Machine Learning in 4D Seismic Inversion


Dramsch, J. S., Corte, G., Amini, H., MacBeth, C., & Lüthje, M. (2019). Including Physics in Deep Learning–An example from 4D seismic pressure saturation inversion. arXiv preprint arXiv:1904.02254.
Github: https://github.com/JesperDramsch/4D-seismic-neural-inversion


Dramsch, J. S., Corte, G., Amini, H., Lüthje, M., & MacBeth, C. (2019, April). Deep Learning Application for 4D Pressure Saturation Inversion Compared to Bayesian Inversion on North Sea Data. In Second EAGE Workshop Practical Reservoir Monitoring 2019.
Github: https://github.com/JesperDramsch/4D-seismic-neural-inversion

This chapter discusses a neural network application to approximate the pressure-saturation inversion of 4D seismic data. It contains two workshop papers that discuss two different aspects of the construction of the neural network architecture. Traditionally, 4D seismic quantitative interpretation (QI) relies on priors to reduce variance in the face of uncertainty. The inversion problem in this chapter is a pressure-saturation inversion from seismic amplitude difference maps in the Schiehallion field. The first paper presents an ablation study of the components in the architecture. The second paper discusses the neural network result and presents a comparison to a classical Bayesian inversion.

Data

The Schiehallion field is a stacked turbidite reservoir in the UK North Sea, which makes it very heterogeneous and compartmentalized. The T31 sandstone reservoir has the largest lateral extent, with the thickness ranging from 5 m to 30 m. The small thickness of the reservoir layer results in the entire reservoir being contained in a single trough of a seismic wavelet (\(\approx\frac{1}{2}\lambda\)), which has historically led to applications using a 2D map view of the data. In order to make the results comparable, we treat the inversion as a 2D map problem instead of a 3D problem.

The data available consists of simulation and field data with several years of collected seismic data. The baseline acquisition is from 1996 with additional time steps acquired in 1999, 2000, 2002, 2004, 2006, 2008, and 2010. There are simulation results and measured amplitude difference maps. The simulated seismic data is based on pore volumes from previous pore volume inversions, pressure changes and saturation changes for water and gas. The ground truth pressure and saturation changes are not available for validation of the field data directly, which would be the ideal validation case.

Specifically, the seismic data consists of near, mid, and far angle stacks. The reflectivity of seismic data can be angle-dependent, especially in the presence of fluids contained in the rock matrix (Castagna and Backus 1993). Angle stacks are constructed by selecting subsets of the full dataset to average data within defined bands of incidence angles. Commonly, angle stacks are constructed by stacking over the offset, hence the term amplitude versus offset (AVO). Stacking the data, even as a partial stack, increases the signal-to-noise ratio (SNR) and is often necessary to obtain reliable results in 4D QI.
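To illustrate how partial stacking trades angular resolution for signal-to-noise ratio, the following minimal sketch averages synthetic traces within angle bands. The angle ranges, array shapes, noise level, and the function name `partial_stack` are illustrative assumptions and not the processing parameters of the Schiehallion data.

```python
import numpy as np


def partial_stack(gather, angles, band):
    """Average all traces whose incidence angle falls in [band[0], band[1])."""
    lo, hi = band
    mask = (angles >= lo) & (angles < hi)
    return gather[mask].mean(axis=0)


rng = np.random.default_rng(0)
angles = np.arange(0, 30)  # one trace per degree (assumed)
signal = np.sin(np.linspace(0, 4 * np.pi, 200))

# The same signal on every trace plus independent noise per trace.
gather = signal + rng.normal(0.0, 1.0, size=(angles.size, signal.size))

# Hypothetical near, mid, and far bands in degrees.
bands = {"near": (0, 10), "mid": (10, 20), "far": (20, 30)}
stacks = {name: partial_stack(gather, angles, b) for name, b in bands.items()}

# Residual noise of a single trace versus the ten-trace near stack.
noise_single = np.std(gather[0] - signal)
noise_stack = np.std(stacks["near"] - signal)
```

In this synthetic setting the residual noise of the near stack is clearly lower than that of a single trace, while the three stacks still retain a coarse angle dependence.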

The simulation results are noise-free calculations with only a single simulation per year available. The recorded field seismic contains significant levels of noise. The seismic field data can therefore diverge from the theoretical prediction based on the pressure and saturation data. These fluctuations are not smooth across individual cells of the map, which can be seen in 6.7.

The validation strategy in this problem setting is using one time step as a hold-out set that is not used during the training of the neural network. The time step used was recorded in the year 2004 and is presented in 6.7. The remaining time steps are used during the training. Results in the paper are presented on the hold-out data.
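The hold-out strategy can be sketched in a few lines. The list of monitor years is taken from the acquisition history above; the variable names are hypothetical.

```python
# Monitor surveys acquired after the 1996 baseline.
monitor_years = [1999, 2000, 2002, 2004, 2006, 2008, 2010]
holdout_year = 2004

# The hold-out time step is never seen during training.
train_years = [year for year in monitor_years if year != holdout_year]
test_years = [holdout_year]
```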

Machine Learning Model

A primary application of machine learning is building regression models. The data available is not particularly abundant, which restricts the choice of model or training strategy. Following a premise of simplicity, a dense neural network was implemented, which treats each cell of a map independently. It is possible that a convolutional neural network would increase the performance, but deep convolutional neural networks would require generating more training data.
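Treating each map cell independently amounts to flattening the maps into a table of samples. The following is a minimal sketch under assumed map dimensions and random placeholder values; it is not the preprocessing code of the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 4, 5  # hypothetical map size
near, mid, far, pore_volume = rng.normal(size=(4, ny, nx))

# Stack the maps along a trailing feature axis, then flatten the grid so
# that each row is one map cell with its four input features.
features = np.stack([near, mid, far, pore_volume], axis=-1)
samples = features.reshape(-1, features.shape[-1])

# Per-cell predictions can be reshaped back to map view afterwards.
restored = samples.reshape(ny, nx, -1)
```

Because no neighbouring cells enter a sample, any spatial coherency in the predicted maps has to come from the data rather than from the model.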

In Jesper Sören Dramsch, Corte, et al. (2019d) we present a novel network structure that explicitly includes the AVO gradient calculation within the network as physical knowledge, shown in 6.6.

The network architecture was chosen to follow an encoder-decoder architecture as a forcing function for information distillation. The encoder decreases in size with each layer, gradually compressing the input data, while the decoder decompresses the data to the designated output (Dony and Haykin 1995). Conventionally, the middle layer is called "bottleneck" or "code layer" as it contains the compressed representation of the input data. Encoder-decoder architectures have found wide application in neural network applications that necessitate data transformation to a different representation (Worrall et al. 2017).

Additionally, the bottleneck layer is implemented as a variational encoding layer to be less susceptible to noisy input. The specific implementation is based on variational auto-encoders (Diederik P. Kingma and Welling 2013). These replace the singular bottleneck layer with a number of layers that represent the parameters of a parametric probability distribution, most commonly the mean and variance of a Gaussian distribution \(\mathcal{N}\left(\mu, \sigma^2\right)\). The encoder then informs the Gaussian distribution at the bottleneck and the decoder samples from the distribution during training. At inference, these networks commonly return the mean of the distribution. Neural networks are conventionally trained using stochastic gradient descent, which cannot propagate a derivative through a random sampling node. Diederik P. Kingma and Welling (2013) popularized the "reparameterization trick", which reformulates

\begin{equation*} z \sim P_\phi (z|x), \end{equation*}

with \(z\) being the bottleneck, \(P\) being the probability of the distribution \(\phi\) to approximate, and \(x\) being the data sample, to

\begin{equation*} z = g(\phi, x, \epsilon) \end{equation*}

where \(g()\) is the functional representation of \(\phi\), parameterized by \(\mu\) and \(\sigma\) for a Gaussian distribution, and \(\epsilon\) is a random sample from \(\mathcal{N} (0,1)\). This sample is the only source of randomness in the bottleneck layer, which is computed as \(z = \mu + \sigma \cdot \epsilon\).
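The reparameterization \(z = \mu + \sigma \cdot \epsilon\) can be sketched in NumPy as follows. The values of `mu` and `sigma` stand in for encoder outputs and are purely illustrative, as is the function name `reparameterize`.

```python
import numpy as np


def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps with eps drawn from N(0, 1)."""
    eps = rng.standard_normal(size=np.shape(mu))
    return mu + sigma * eps


rng = np.random.default_rng(42)
mu = np.full(100_000, 1.5)     # hypothetical encoder output (mean)
sigma = np.full(100_000, 0.5)  # hypothetical encoder output (std. dev.)

# All randomness sits in eps, so z is a deterministic function of mu and sigma.
z = reparameterize(mu, sigma, rng)
```

Since \(\epsilon\) is drawn independently of the network parameters, the partial derivatives \(\partial z / \partial \mu = 1\) and \(\partial z / \partial \sigma = \epsilon\) are well defined, which is what allows gradient descent to update the encoder.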

The pore volume is passed as-is to the network. The estimated pore volume helps the network to decouple the rock matrix from the fluid effects, which is further explored in 6.4. A schematic of the network is shown in 6.6, which shows the connections of the individual operations.

The network explicitly includes the AVO gradient calculation in the network architecture, since it is physical knowledge that we know will stabilize the separation of pressure and saturation changes. Including basic physics knowledge leads to the network learning residual information, essentially defining another forcing function for the network's learning process. The AVO gradient can be calculated explicitly as input to the network. However, performing the AVO gradient calculation within the network enables programmatic augmentation of the input data during training. This implies that instead of learning one pre-computed AVO relation, we can augment the input data and train on a significantly larger number of correctly calculated AVO gradients. This can significantly improve training.
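The difference between pre-computed gradients and gradients calculated after augmentation can be sketched outside of any deep learning framework. The divisors 10 and 20 follow the code listing in the workshop paper later in this chapter; the amplitudes are random placeholders, and the function name `avo_gradients` and the parameter `step` are assumptions made for this illustration.

```python
import numpy as np


def avo_gradients(near, mid, far, step=10.0):
    """Finite-difference gradients between angle stacks.

    `step` is the assumed separation between adjacent stacks.
    Returns (mid-near, far-mid, far-near) gradients.
    """
    return (mid - near) / step, (far - mid) / step, (far - near) / (2 * step)


rng = np.random.default_rng(1)
near, mid, far = rng.normal(size=(3, 1000))  # synthetic amplitudes

# Pre-computed gradients: fixed once and blind to later augmentation.
fixed = avo_gradients(near, mid, far)

# Gradients computed after noise injection: each noisy realisation comes
# with gradients that are consistent with that realisation.
sigma = 0.02
noisy = [s + rng.normal(0.0, sigma, size=s.shape) for s in (near, mid, far)]
augmented = avo_gradients(*noisy)
```

Repeating the last two lines for every training batch yields a new, internally consistent set of amplitudes and gradients each time, which is the effect of placing the calculation inside the network.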

Training the Deep Neural Network for 4D Seismic Inversion

The model training is carried out in multiple phases. The first phase solely trains on un-augmented simulation data to determine an ideal network structure. The second phase trains the fixed architecture with data augmentation to transfer the network to noisy field data. The network is optimized on the standard mean squared error (MSE) while monitoring the R²-score.
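For reference, both metrics can be written out in a few lines of NumPy. This is a minimal sketch of the standard definitions with made-up example values, not the training code of the papers.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.1, 0.9, 2.1, 2.9])

error = mse(y_true, y_pred)
score = r2_score(y_true, y_pred)
```

The MSE depends on the scale of the target, whereas the R²-score is normalized by the variance of the target, which makes it easier to compare across the pressure and saturation outputs.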

The initial phase was carried out on simulation data, with the data split into one part for training and a separate part for validation. The seismic data from 2004 was held out as a test set. Neural architecture search (NAS) was applied to determine the depth and width of the architecture, using a tree-structured Parzen estimator (TPE) hyper-parameter search (J. Bergstra et al. 2015). This yields an architecture, found in a controlled test environment on simulation data, that is matched to the complexity of the data.

In the second phase, to transfer the network to field data, additive Gaussian noise (Chris M. Bishop 1995) was applied to the network input to train the network for noisy field data. The noise level was estimated in a manual process. Because the AVO calculation is included within the network, the network is forced to learn noisy AVO gradients that correspond to the augmented input. This process reduces the R²-score and increases the MSE, which is an expected effect of noisy regression data (Hastie, Tibshirani, and Friedman 2009). Nevertheless, this produces consistent results on field data upon visual inspection.

The paper in 6.4 provides an ablation study, where parts of the neural network architecture are systematically switched off. Ablation studies are commonly used to explore and evaluate the effect of the individual components on the regression result. The paper in 6.5 shows the results of the deep neural network compared to a Bayesian inversion.

Workshop Paper: Including Physics in Deep Learning – An example from 4D seismic pressure saturation inversion

Introduction

Physics in machine learning often relies on transformations of data to beneficial domains and simulating additional data. Karpatne et al. (2017) show a physics-guided approach to model lake temperatures with neural networks. Schütt et al. (2017a) use deep neural networks to model molecule energies and Oliveira, Paganini, and Nachman (2017) employ a special architecture to capture scatter patterns in high-energy physics. When building deep learning pipelines, we can make informed choices in data modeling, but also build neural networks to maximize information gain on the available data. Ulyanov, Vedaldi, and Lempitsky (2018) have shown that the network architecture itself can be used as a prior in machine learning. These approaches translate well to geoscience, where strong priors are often necessary to inform decisions.

Deep learning has revolutionized machine learning by replacing the feature generation and augmentation step with learned internal representations of features that maximize information gain. On image data, analysis of these neural network filters has shown close relations to edge filters and color separators (Grün et al. 2016). Jesper Sören Dramsch and Lüthje (2018b) have shown that these filters translate well to seismic data. Classic feed-forward neural networks do not have the benefit of learning filters. However, they do benefit from recent improvements in regularization (Ioffe and Szegedy 2015), non-saturating and non-vanishing gradients (K. He et al. 2015), and training on GPUs.

Neural networks for inversion of seismic data have a long history (Roeth and Tarantola 1994). In Jesper S. Dramsch et al. (2019) we show the application of a deep multi-layer perceptron for map-based 4D seismic pressure saturation inversion. In this work we show the information gain of feed-forward multi-layer perceptron neural networks by including an explicit calculation of the AVO gradient within the network architecture. This serves as an example of including domain knowledge as a prior in machine learning.

Method

We build a deep feed-forward network to invert seismic amplitude maps for pressure and saturation changes. We use the high-level Python framework Keras with a TensorFlow backend. The neural network was trained on synthetic data to subsequently predict field data. The network takes the seismic input sample-wise with near, mid, and far stacks, and pore volume. Directly after the input layer, we inject 20% Gaussian noise to model the noisier field data. The noisy input is fed to a custom layer that calculates the PP AVO gradient between far-mid, mid-near, and far-near. The main components are as follows:

Gaussian noise injection

The synthetic model is noise-free. While we get good results on the training data and the modelled test data, the network does not transfer well to noisy field data. Although the 4D NRMS is very low in the data set, the sample-wise fluctuations in the field seismic differ significantly from the synthetic data. We apply additive Gaussian noise with \(\sigma = 0.02\) to the seismic inputs separately to simulate independent fluctuations of the seismic maps. This significantly decreases the training and validation performance on noise-free synthetic data. On field data, however, this enables good transfer of the neural network.

from keras.layers import GaussianNoise
noisy_input = GaussianNoise(0.02)(input_data)  # additive zero-mean noise, active only during training

Explicit AVO gradient calculation

The Schiehallion field is a good example of imbalanced learning. We have many samples of pressure changes \(\Delta P\), a good selection of water saturation changes \(\Delta S_w\), and very few gas saturation changes \(\Delta S_g\). Yet, the changes in gas saturation \(\Delta S_g\) produce the strongest changes in seismic P wave amplitudes. Statistically, these can easily be regarded as outliers, and therefore, possibly disregarded by the neural network. From decades of seismic analysis, we know that the AVO gradient is very good for pressure saturation separation. We implement an explicit calculation of AVO gradients in the network.

\begin{equation*} G = \frac{A_{\Theta_1} - A_{\Theta_0}}{x_{\Theta_1} - x_{\Theta_0}}, \end{equation*}

where \(G\) is the PP AVO gradient, \(A\) is the seismic P wave amplitude, \(x\) is the offset, and \(\Theta\) is the angle.

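from keras.layers import Lambda

# Each gradient is the amplitude difference between two noisy angle stacks
# divided by their separation (10 for adjacent stacks, 20 for far minus near).
mid_near = Lambda(
    lambda inputs: (inputs[0] - inputs[1]) / 10
)([noisy_mid, noisy_near])

far_mid = Lambda(
    lambda inputs: (inputs[0] - inputs[1]) / 10
)([noisy_far, noisy_mid])

far_near = Lambda(
    lambda inputs: (inputs[0] - inputs[1]) / 20
)([noisy_far, noisy_near])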

Encoder-decoder architecture

Subsequently, the four input maps and the three gradient maps are concatenated and fed to an encoder architecture that condenses the information to an embedding layer \(z\). This layer learns a collection of Gaussian distributions to represent the noisy input data. The decoder samples this variational embedding layer to calculate the pressure change \(\Delta P\), the change in water saturation \(\Delta S_w\), and the change in gas saturation \(\Delta S_g\).

The full architecture is of the encoder-decoder class. The encoder reduces the number of nodes with each subsequent layer. This forces the network to learn a lossy compression of the input data as the \(z\)-vector. The decoder increases the number of nodes per layer toward the output. The network therefore learns to correlate the low-resolution representation with the desired output.
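The shrinking and growing layer widths can be illustrated with a small forward pass in NumPy. This is a sketch under strong assumptions: the hidden widths are hypothetical, the weights are random and untrained, and the variational sampling of the embedding is omitted. Only the seven inputs (four maps plus three gradients) and the three outputs follow the text.

```python
import numpy as np


def dense_stack(x, widths, rng):
    """Pass x through randomly initialised dense layers with ReLU."""
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        x = np.maximum(x @ w, 0.0)
    return x


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 7))  # 8 map cells, 4 input maps + 3 gradients

encoder_widths = [7, 64, 32, 4]  # hypothetical: dense layers shrink toward z
decoder_widths = [4, 32, 64]     # hypothetical: dense layers grow again

z = dense_stack(x, encoder_widths, rng)  # compressed representation
h = dense_stack(z, decoder_widths, rng)

# Linear output layer with one column each for dP, dSw, and dSg.
w_out = rng.normal(0.0, 0.1, size=(decoder_widths[-1], 3))
y = h @ w_out
```

Every sample has to pass through the four-node embedding, so whatever the decoder predicts must be recoverable from that compressed representation.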

Full Architecture from Jesper S. Dramsch et al. (2019).

Variational Z Vector

The inversion of noisy input benefits from a variational representation of the compressed z-vector. The network learns Gaussian distributions in the embedding layer. Therefore, we have to apply the reparameterization trick outlined in Diederik P. Kingma and Welling (2013), because the sampling process itself cannot be learned by gradient descent. We use the implementation in Chollet and others (2015b) for variational autoencoders.

Results

Figure 6.2 shows the 2004 time step of the Schiehallion 4D data. Figure 6.3 contains the inversion result using the variational encoder-decoder architecture. Some coherency in the maps can be seen, but each map is very noisy and the gas saturation map contains many data points that indicate gas desaturation, which cannot be confirmed by production data.

Variational Encoder Decoder Architecture Inversion

When we add the gradient, we can clean up some of the misfit in the gas saturation maps \(\Delta S_g\). In particular, the event with the strongest softening in the amplitude maps is partially reassigned to the pressure map \(\Delta P\). However, the inversion process is still very prone to noise. In Figure 6.5, we show the inversion results of an AVO-gradient neural network with a noise injection at training of \(\sigma = 0.02\). The inversion maps are very coherent. Noise injection without gradient calculation does not give adequate results.

AVO-Gradient Variational Encoder Decoder Architecture Inversion

Noise-injected AVO-Gradient Variational Encoder Decoder Architecture Inversion

Conclusions

We have shown a neural network architecture that incorporates physical domain knowledge to enable transfer from synthetic to field data. The final inversion result has very good coherency, despite the network not having any spatial context. While further investigation is necessary, this indicates that useful information has been learned. This is one example where bias can be intentionally introduced into the network architecture to include physics in machine learning.

Acknowledgements

The research leading to these results has received funding from the Danish Hydrocarbon Research and Technology Centre under the Advanced Water Flooding program. We thank the sponsors of the Edinburgh Time-Lapse Project, Phase VII (AkerBP, BP, CGG, Chevron, ConocoPhillips, ENI, Equinor, ExxonMobil, Halliburton, Nexen, Norsar, OMV, Petrobras, Shell, Taqa, and Woodside) for supporting this research. We also thank the Brazilian governmental research-funding agency CNPq. We are also grateful to Linda Hodgson and Ross Walder for important discussions on the field and dataset.