Complex-valued Neural Networks
Dramsch, J. S., Lüthje, M., & Christensen, A. N. (2019). Complex-valued neural networks for machine learning on non-stationary physical data. arXiv preprint arXiv:1905.12321.

GitHub: https://github.com/JesperDramsch/ComplexCNNSeismic
In the paper (Jesper Sören Dramsch, Lüthje, and Christensen 2019) I explore complex-valued deep convolutional networks to show that phase content in non-stationary data improves the generalization of convolutional neural networks. This work implements self-supervised autoencoders (AEs) that compress the seismic data and measure the reconstruction error.
Four different deep convolutional AEs are constructed: two real-valued and two complex-valued. The complex-valued convolutional neural network is implemented as two real-valued feature maps, one for the real component \(a\) and one for the imaginary component \(b\), which are combined into a complex-valued number \(a + b\text{i}\). The complex convolution is then implemented explicitly in the calculation to avoid some drawbacks of representing complex numbers on a computer. However, matching the networks in the number of parameters proved to be a complicated task. This led to building four different architectures that get progressively bigger and comparing their results.
This study implements plain AEs to increase the validity of the experiment. While variational autoencoders (VAEs) have shown better performance on reconstruction tasks, they would also introduce more variability in the network to control for. Considering that automatic seismic interpretation (ASI) is a fairly new discipline, it is difficult to disambiguate effects on misclassification. These effects include erroneous labels, the difficulty of the ASI task itself, as well as the choice of architecture.
This leads us to the decision to inspect the reconstructed seismic data numerically. Signal analysis is well explored in seismic data processing. Moreover, this enables analysing the result in the f-k domain, providing additional insight into the denoising effect of the AEs. Overall, the complex-valued networks achieve reconstruction errors comparable to larger real-valued networks while being smaller.
Journal Paper: Complex-valued neural networks for machine learning on non-stationary physical data
Introduction
Seismic data has its caveats due to the complicated nature of bandwidth-limited wave-based imaging. Common problems are cycle-skipping of wavelets and null-spaces in inversion problems (Özdoğan Yilmaz 2001). Automatic seismic interpretation is complicated, as the modelling of seismic data is computationally expensive and often proprietary. Seismic field data is often not available, its interpretation is highly subjective, and ground truth is not available. The lack of training data has delayed the adoption of existing methods and hindered the development of specific geophysical deep learning methods. Incorporating domain knowledge into general deep learning models has been successful in other fields (Paganini, Oliveira, and Nachman 2017).
The state-of-the-art method has long been an iterative windowed Fourier transform for phase reconstruction (Griffin and Lim 1984). Modern neural audio synthesis focuses on methods that do not require explicit reconstruction of the phase (Mehri et al. 2016; Oord et al. 2016, 2017; Prenger, Valle, and Catanzaro 2018). Mehri et al. (2016) introduced a recurrent neural network formulation, whereas Oord et al. (2016) reformulated the network for audio synthesis as a strided convolutional network. The original WaveNet formulation in Oord et al. (2016) is slow due to the autoregressive filter, warranting the parallel formulation in Oord et al. (2017).
We explicitly incorporate phase information in a deep convolutional neural network. Complex-valued networks were heavily explored in the digital signal processing community before the recent renaissance of neural networks and deep learning. Relevant examples for seismic data processing include source separation (Scarpiniti et al. 2008), adaptive noise reduction (Suksmono and Hirose 2002), and optical flow (Miyauchi et al. 1993) with complex-valued neural networks. Sarroff (2018) gives a comprehensive overview of applications of complex-valued neural networks in signal and image processing.
In this work, we evaluate the reconstruction error after compression in an autoencoder to test how reliably information can be retained within a network with and without explicit phase information. This insight can be transferred to the aforementioned applications that benefit from an increase in information recovery. We calculate the complex-valued seismic trace by applying the Hilbert transform to each trace. Phase information has been shown to be valuable in the processing (Liner 2002) and interpretation of seismic data (Rocky Roden and Sepúlveda 1999; Mavko, Mukerji, and Dvorkin 2003). Steve Purves (2014) provides a tutorial that shows the implementation details of Hilbert transforms.
In this paper we give a brief overview of convolutional neural networks and then introduce the extension to complex neural networks and seismic data. We show that including explicit phase information provides superior results to real-valued convolutional neural networks for seismic data. Difficult areas that contain seismic discontinuities due to geologic faulting are resolved better, without leakage of seismic horizons. We train and evaluate several complex-valued and real-valued autoencoders to show and compare these properties. These results can be directly extended to automatic seismic interpretation problems.
Complex Convolutional Neural Networks
Basic principles
Convolutional neural networks (Y. LeCun et al. 1999) use multiple layers of convolution and subsampling to extract relevant information from the data (see Figure [complex:fig:3]).
The input image is repeatedly convolved with filters and subsampled, creating many, progressively smaller images. For a classification task, the final step is then a weighting of these very small images, leading to a decision about what was in the original image. The filters are learned as part of the training process by exposing the network to training images. The salient point is that the convolution kernels are learned from the training data. If the goal is, for example, to classify geological facies, the convolutional kernels will learn to extract information from the input that helps with that task. It is thus a very strong methodology that can be adapted to many tasks.
Real and Complex-valued Convolution
Convolution is an operation on two signals \(f\) and \(g\), or a signal and a filter, that produces a third signal containing information from both inputs. An example is the moving-average filter, which smooths the input, acting as a low-pass filter. Convolution is defined as
\[(f * g)(\tau) = \int_{-\infty}^{\infty} f(t)\, g(\tau - t)\, \mathrm{d}t\]
at the location \(\tau\). While often applied to real-valued signals, convolution can be used on complex signals. For the integral to exist, both \(f\) and \(g\) must decay when approaching infinity. Convolution generalizes directly to N dimensions by multiple integrations and maintains commutativity, distributivity, and associativity. For digital signals this extends to discrete values by replacing the integration with a summation.
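As a minimal NumPy sketch (not code from the paper), the discrete convolution and its commutativity can be checked directly, here with a three-point moving-average filter as the low-pass example above:

```python
import numpy as np

# A signal f and a three-point moving-average filter g (a low-pass filter).
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.ones(3) / 3.0

# Discrete convolution: the integral becomes a summation over samples.
smoothed = np.convolve(f, g)

# Commutativity carries over to the discrete case: f * g == g * f.
assert np.allclose(smoothed, np.convolve(g, f))
```

The output length is `len(f) + len(g) - 1`, the full support of the discrete sum.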
Complex Convolutional Neural Networks
Complex convolutional networks provide the benefit of explicitly modelling the phase space of physical systems (Trabelsi et al. 2017). Unfortunately, it is not possible to feed complex numbers directly to a CNN, as they are not supported by the standard implementations (PyTorch or TensorFlow). Instead, we can represent them in another form. The complex convolution introduced in Section 14.1.2.2 can be explicitly implemented as convolutions of the real and imaginary components of both kernels and data. A complex-valued data matrix in Cartesian notation is defined as \(\textbf{M} = M_\Re + i M_\Im\) and, equally, the complex-valued convolutional kernel is defined as \(\textbf{K} = K_\Re + i K_\Im\). The individual coefficients \((M_\Re, M_\Im, K_\Re, K_\Im)\) are real-valued matrices, considering that vectors are special cases of matrices with one of the two dimensions being one.
Solving the convolution of
\[\textbf{M} * \textbf{K} = (M_\Re + i M_\Im) * (K_\Re + i K_\Im),\]
we can apply the distributivity of convolutions (cf. Section 14.1.2.2) to obtain
\[\textbf{M} * \textbf{K} = (M_\Re * K_\Re - M_\Im * K_\Im) + i\,(M_\Re * K_\Im + M_\Im * K_\Re),\]
where \(\textbf{K}\) is the kernel and \(\textbf{M}\) is a data vector (see Figure 14.1).
We can reformulate this in algebraic notation:
\[\begin{bmatrix} \Re(\textbf{M} * \textbf{K}) \\ \Im(\textbf{M} * \textbf{K}) \end{bmatrix} = \begin{bmatrix} M_\Re & -M_\Im \\ M_\Im & M_\Re \end{bmatrix} * \begin{bmatrix} K_\Re \\ K_\Im \end{bmatrix}\]
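The decomposition can be sanity-checked numerically: writing the complex convolution as four real-valued convolutions reproduces native complex arithmetic exactly. A one-dimensional NumPy sketch with random illustrative data (the networks in this chapter do the same with 2D feature maps):

```python
import numpy as np

rng = np.random.default_rng(0)

# Real and imaginary parts of a complex-valued signal M and kernel K.
M_re, M_im = rng.normal(size=32), rng.normal(size=32)
K_re, K_im = rng.normal(size=5), rng.normal(size=5)

# The complex convolution M * K written out as four real convolutions:
# real part:      M_re * K_re - M_im * K_im
# imaginary part: M_re * K_im + M_im * K_re
out_re = np.convolve(M_re, K_re) - np.convolve(M_im, K_im)
out_im = np.convolve(M_re, K_im) + np.convolve(M_im, K_re)

# Cross-check against NumPy's native complex arithmetic.
ref = np.convolve(M_re + 1j * M_im, K_re + 1j * K_im)
assert np.allclose(out_re + 1j * out_im, ref)
```

This is why a complex-valued layer can be built entirely from standard real-valued convolution primitives.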
Complex convolutional neural networks learn by backpropagation. Sarroff, Shepardson, and Casey (2015) state that the activation functions, as well as the loss function, must be complex differentiable (holomorphic). Trabelsi et al. (2017) suggest that employing complex losses and activation functions is valid for speed; however, they refer to Hirose and Yoshida (2012), who show that complex-valued networks can be optimized with real-valued loss functions and can contain piecewise real-valued activations. We reimplement the code Trabelsi et al. (2017) provide in Keras (Chollet and others 2015a) with TensorFlow (Abadi et al. 2015a), which provides convenience functions implementing a multitude of real-valued loss functions and activations.
While common up- and downsampling functions like MaxPooling, UpSampling, or striding carry over to complex-valued neural networks unchanged, batch normalization (BN) (Ioffe and Szegedy 2015) does not. Real-valued batch normalization normalizes the data to zero mean and a standard deviation of 1, which does not guarantee normalization in the complex plane. Trabelsi et al. (2017) suggest implementing a 2D whitening operation as normalization in the following way:
\[\tilde{x} = \textbf{V}^{-\frac{1}{2}} (x - \mathbb{E}[x]),\]
where \(x\) is the data and \(\textbf{V}\) is the 2x2 covariance matrix of the real and imaginary components:
\[\textbf{V} = \begin{bmatrix} V_{\Re\Re} & V_{\Re\Im} \\ V_{\Im\Re} & V_{\Im\Im} \end{bmatrix} = \begin{bmatrix} \mathrm{Cov}(\Re(x), \Re(x)) & \mathrm{Cov}(\Re(x), \Im(x)) \\ \mathrm{Cov}(\Im(x), \Re(x)) & \mathrm{Cov}(\Im(x), \Im(x)) \end{bmatrix}\]
Effectively, this multiplies the inverse of the square root of the covariance matrix with the zero-centred data. This scales the covariance of the components instead of the variance of the data (Trabelsi et al. 2017).
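The whitening step can be sketched in NumPy, here on a batch of activations stacked as two real channels (real, imaginary); the trainable scale and shift parameters of the full BN formulation are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# A batch of "complex" activations as two correlated real channels (re, im).
x = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])

# Zero-centre, then multiply by the inverse square root of the 2x2
# covariance matrix V of the real and imaginary components.
x0 = x - x.mean(axis=0)
V = np.cov(x0, rowvar=False)

# Inverse matrix square root via eigendecomposition of the symmetric V.
w, E = np.linalg.eigh(V)
V_inv_sqrt = E @ np.diag(1.0 / np.sqrt(w)) @ E.T
x_white = x0 @ V_inv_sqrt
```

After this operation the two components have identity covariance, i.e. the whitening normalizes the joint distribution of real and imaginary parts rather than each channel's variance separately.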
Autoencoders
Autoencoders (Hinton and Salakhutdinov 2006) are a special configuration of the encoder-decoder network that maps data to a low-level representation and back to the original data. This low-level representation, the latent space, is often called the bottleneck or code layer. Autoencoder networks learn the mapping \(f(x) = x\), where \(x\) is the data and \(f\) is an arbitrary network. The autoencoder architecture is an example of lossy compression and recovery from the lossy representation. Commonly, recovered data is blurred by this process.
The principle is illustrated in Figure 14.2. The input is transformed to a low-dimensional representation, called a code or latent space, and then reconstructed again from this low-dimensional representation. The intuition is that the network has to extract the most salient parts of the data to be able to perform a reconstruction. As opposed to other methods for dimensionality reduction, e.g. principal component analysis, an autoencoder can find a non-linear representation of the data. The low-dimensional representation can then be used for anomaly detection or classification.
Aliasing in Patch-based Training
Mean-Shift in Neural Networks
A single neuron in a neural network can be described by \(\sigma ( w \cdot x + b )\), where \(w\) is the network weight, \(x\) is the input data, \(b\) is the network bias, and \(\sigma\) is a non-linear activation function. During training, the network weights \(w\) and biases \(b\) are adjusted towards a training minimum. Learning on a mean-shift \(q\) of an arbitrary distribution over \(x\) leads to \(\sigma( w \cdot (x + q) + b )\), which increases the neuron response by \(q\), weighted by \(w\). During inference, both \(w\) and \(b\) are fixed, and by extension the learned mean-shift \(q\) is fixed as well. If the mean-shift is absent from the larger inference data, it remains baked in as an additional bias of \(w \cdot q\) before the non-linear activation. This training bias may lead to prediction errors of the neuron and, consequently, of the full neural network.
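A small NumPy demonstration of this identity, with illustrative values for \(w\), \(b\), and \(q\) (not taken from any trained network):

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)  # the non-linear activation sigma

w, b = 1.5, -0.2           # fixed weight and bias of a single neuron
x = rng.normal(size=1000)  # inference inputs without the training shift
q = 0.3                    # mean-shift present in the training data

# Training on shifted inputs x + q is indistinguishable from folding an
# extra bias w * q into the neuron: sigma(w(x + q) + b) = sigma(wx + (b + wq)).
shifted = relu(w * (x + q) + b)
extra_bias = relu(w * x + (b + w * q))
assert np.allclose(shifted, extra_bias)
```

The neuron cannot distinguish a data mean-shift from a bias term, which is exactly why a shift learned from patches persists at inference time.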
Windowed Aliasing
Non-stationary data such as seismic data can contain sections with spurious offsets from the mean. Figure 14.3 shows varying sizes of cut-outs, with 101 and 256 samples respectively. In the middle, the full normalised amplitude spectra are presented. On the right, the corresponding phase spectra are presented. On the left, we focus on the frequency content of the amplitude spectra around 0 Hz. Although the cut-outs were Hanning-tapered, a mean shift appears for any patch size.
This mean-shift corresponds to a DC offset in spectral data, which can be audio, seismic, or electrical data. In images this corresponds to a non-zero alpha channel. While batch normalization can correct the mean shift in individual mini-batches (Ioffe and Szegedy 2015), this may shift the entire spectrum by the aliased offset. Additionally, batch normalization may not be feasible in some physical applications pertaining to regression tasks.
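A short NumPy sketch of the effect, using a synthetic sinusoidal patch with an artificial offset of 0.1 (illustrative values, not the data from Figure 14.3):

```python
import numpy as np

n = 256
t = np.arange(n)
window = np.hanning(n)

# A patch cut from a longer section can carry a local mean offset
# (here 0.1) that is absent from the data at large.
base = np.sin(2 * np.pi * t / 32)
offset_patch = base + 0.1

# Magnitude of the 0 Hz (DC) bin, with and without the offset.
dc_without = np.abs(np.fft.rfft(base * window))[0]
dc_with = np.abs(np.fft.rfft(offset_patch * window))[0]

# Even with Hanning tapering, the offset leaks into the DC bin.
assert dc_with > dc_without + 10.0
```

The taper attenuates edge discontinuities but does not remove the patch mean, so the spurious offset survives as DC energy.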
Complex Seismic Data
Complex seismic traces are calculated by applying the Hilbert transform to the real-valued signal. The Hilbert transform convolves the signal with \(\frac{1}{\pi t}\), which is equivalent to a 90-degree phase rotation. It is essential that the signal does not contain a DC component, as a DC component has no phase rotation.
The Hilbert transform is defined as
\[H(u)(t) = \frac{1}{\pi}\, \operatorname{p.v.}\! \int_{-\infty}^{\infty} \frac{u(\tau)}{t - \tau}\, \mathrm{d}\tau\]
of a real-valued time series \(u(t)\), where the improper integral has to be interpreted as the Cauchy principal value. In the Fourier domain, the Hilbert transform has a convenient formulation, where the negative frequencies are set to zero and the remaining frequencies are multiplied by 2. This can be written as
\[x_a = F^{-1}\!\left(F(x) \cdot 2U\right) = x + iy,\]
where \(x_a\) is the analytic signal, \(x\) is the real signal, \(F\) is the Fourier transform, and \(U\) is the step function. The imaginary component \(y\) is simultaneously the quadrature of the real-valued trace. This provides local, explicit phase information, which the Fourier transform by itself does not resolve in the time domain. In conventional seismic trace analysis, the complex data is used to calculate the instantaneous amplitude and instantaneous frequency. These are beneficial seismic attributes for interpretation (Barnes 2007).
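The Fourier-domain construction of the analytic signal can be sketched directly in NumPy (equivalent in spirit to `scipy.signal.hilbert`); the trace here is a synthetic zero-mean cosine, so the expected quadrature is the corresponding sine:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x_a = F^-1(F(x) * 2U): zero the negative
    frequencies and double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    U = np.zeros(n)
    U[0] = 1.0                  # DC bin passes unscaled
    U[1:(n + 1) // 2] = 2.0     # positive frequencies doubled
    if n % 2 == 0:
        U[n // 2] = 1.0         # Nyquist bin unscaled for even n
    return np.fft.ifft(X * U)   # negative frequencies stay zeroed

t = np.linspace(0.0, 1.0, 512, endpoint=False)
trace = np.cos(2 * np.pi * 10 * t)   # zero-mean synthetic "trace"
x_a = analytic_signal(trace)

envelope = np.abs(x_a)               # instantaneous amplitude
quadrature = x_a.imag                # 90-degree phase-rotated trace
```

For the pure cosine the quadrature is the sine and the envelope is constant, matching the complex trace attributes described above.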
Experiments
Data
The data is the F3 seismic data, acquired in the Dutch North Sea in 1987 over an area of 375.31 km^{2}. The sampling rate is 4 ms in time, with inline/crossline bins of 25 m. The extent is 650 inline traces and 950 crossline traces with a total trace length of 1.848 s. The data contains faulted reflector packets, of which the lowest one overlies a salt diapir. The data contains some noise that masks lower-amplitude events.
We generate 2D patches of size 64x64 in the inline and crossline direction from the 3D volume to train our network. The inline and crossline 64x64 patches are taken overlapping with a stride of 8 samples. The total amount of data is 188,736 patches, with 141,552 for training and 47,184 for validation in a 75/25 train-validation split. The test data is the hold-out set of Alaudah et al. (2019) stored in test_once. The seismic data is normalized to values in the range of [-1, 1]. To obtain complex-valued seismic data we apply a Hilbert transform to every trace of the data and subtract the real-valued seismic from the real component, as laid out in Taner, Koehler, and Sheriff (1979).
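A minimal sketch of the overlapping patch extraction (the actual data pipeline in the repository may differ in ordering and bookkeeping; the section size here is illustrative):

```python
import numpy as np

def extract_patches(section, patch=64, stride=8):
    """Cut overlapping patch x patch windows from a 2D seismic section."""
    h, w = section.shape
    out = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            out.append(section[i:i + patch, j:j + patch])
    return np.stack(out)

# A toy 128 x 128 "section": 9 window positions per axis with stride 8.
section = np.random.default_rng(3).normal(size=(128, 128))
patches = extract_patches(section)
assert patches.shape == (81, 64, 64)
```

With a stride of 8 against a patch size of 64, neighbouring patches overlap by 87.5 %, which is what inflates the modest survey into hundreds of thousands of training samples.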
Architecture
Table 14.1: Autoencoder architectures.

| Layer (Size) | Spatial X | Spatial Y | Complex Small | Real Small | Complex Large | Real Large |
|---|---|---|---|---|---|---|
| Input | 64 | 64 | 2 | 1 | 2 | 1 |
| CConv2D | 64 | 64 | 8 | 8 | 16 | 16 |
| CConv2D + BN | 64 | 64 | 8 | 8 | 16 | 16 |
| Pool + CConv2D + BN | 32 | 32 | 16 | 16 | 32 | 32 |
| Pool + CConv2D + BN | 16 | 16 | 32 | 32 | 64 | 64 |
| Pool + CConv2D + BN | 8 | 8 | 64 | 64 | 128 | 128 |
| Pool + CConv2D | 4 | 4 | 128 | 128 | 256 | 256 |
| Up + CConv2D + BN | 8 | 8 | 64 | 64 | 128 | 128 |
| Up + CConv2D + BN | 16 | 16 | 32 | 32 | 64 | 64 |
| Up + CConv2D + BN | 32 | 32 | 16 | 16 | 32 | 32 |
| Up + CConv2D | 64 | 64 | 8 | 8 | 16 | 16 |
| CConv2D | 64 | 64 | 8 | 8 | 16 | 16 |
| CConv2D | 64 | 64 | 2 | 1 | 2 | 1 |
| Parameters on Graph | | | 100,226 | 198,001 | 397,442 | 790,945 |
| Compression Ratio | | | 4:1 | 2:1 | 2:1 | 1:1 |
| Size on Disk [MB] | | | 1.4 | 2.5 | 4.8 | 9.2 |
The autoencoder architecture compresses the input data to a lower-dimensional representation, i.e. a bottleneck (cf. Figure 14.2), with an encoder network and reconstructs the input data from the bottleneck with a decoder network. It is common for the encoder and decoder networks to be formulated symmetrically, as we have done in this paper. We reduce a 64x64 input four times by a factor of two spatially to encode a 4x4 code layer. We define varying amounts of filters during the downsampling steps and in the code layer to achieve the varying amounts of compression shown in Table 14.1. The architecture of the complex convolutional network is identical to the real network, except for replacing the real-valued 2D convolutions with complex-valued convolutions represented by two feature maps instead of one. The layers for each network are shown in Table 14.1 with additional values, including learnable parameters counted on the computational graph compiled by TensorFlow, compression ratio, and size on disk. In total, four network architectures are presented: two real-valued and two complex-valued networks, matched pairwise in the number of feature maps and hence differing in the number of parameters and compression ratio.
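The compression ratios in Table 14.1 follow from simple bookkeeping: input elements (with the complex input counted as two real channels) divided by bottleneck elements. A short sketch reproducing the table's ratios, under that channel-counting assumption:

```python
# Spatial sizes from Table 14.1: 64 x 64 input, 4 x 4 bottleneck.
spatial_in, spatial_code = 64 * 64, 4 * 4

# (input channels, bottleneck feature maps) per architecture.
configs = {
    "complex_small": (2, 128),
    "real_small":    (1, 128),
    "complex_large": (2, 256),
    "real_large":    (1, 256),
}

# Compression ratio = input elements / bottleneck elements.
ratios = {name: (spatial_in * ch) / (spatial_code * maps)
          for name, (ch, maps) in configs.items()}

# Reproduces the 4:1, 2:1, 2:1, 1:1 ratios listed in Table 14.1.
assert ratios == {"complex_small": 4.0, "real_small": 2.0,
                  "complex_large": 2.0, "real_large": 1.0}
```

This makes explicit why the complex networks compress twice as hard as their real-valued counterparts with the same bottleneck width: the input carries two channels instead of one.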
The neural networks use 2D convolutions with 3x3 kernels. We employ batch normalization to regularize and speed up training (Ioffe and Szegedy 2015). Down- and upsampling are achieved by the MaxPooling and UpSampling operations, respectively.
Complex-valued neural networks contain two feature maps for every feature map in a real-valued network. Conceptually, this is equivalent to \(a + \text{i}b\), with \(b=0\) for the real-valued network. The information in the complex complement for these two feature maps is derived from the input data using the Hilbert transform. Following the usual deep learning argument, this input could be derived by a neural network directly and should therefore not improve the network's reconstruction error. We define a complex-valued network that has the same number of filters as the real-valued network in both the "small" and "large" formulation in Table 14.1. This network effectively has half the available feature maps for the real-valued seismic input, as the other half is used for the complex-valued information. That means the small real-valued network contains as many feature maps for the real-valued seismic as the large complex network, while the large real-valued network contains an additional feature map for every feature map the complex network assigns to the complex component.
Training
We train the networks with the Adam optimizer (Diederik P. Kingma and Ba 2014) and a learning rate of \(10^{-3}\) without decay, for 100 epochs. The loss function is the mean squared error, as the seismic data contains values in the range of [-1, 1]. All networks reach stable convergence without overfitting, as shown in Figure 14.4.
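For illustration, a single-parameter sketch of the Adam update and the MSE loss as configured above (a didactic re-implementation with toy values, not the Keras optimizer itself):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2014) on a single parameter."""
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def mse(y, y_hat):
    """Mean squared error, matching data normalised to [-1, 1]."""
    return np.mean((y - y_hat) ** 2)

w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=2.0, m=m, v=v, t=1)
# The first step moves the parameter by roughly the learning rate,
# regardless of the gradient's magnitude.
assert abs(w - (1.0 - 1e-3)) < 1e-6
```

The magnitude-invariant first step is one reason Adam with \(10^{-3}\) is a robust default across the four architectures.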
Evaluation
We compare the complex autoencoders with the real-valued autoencoders through the reconstruction error on unseen test data, evaluated on seven individual realizations of each of the four networks, and through qualitative analysis of reconstructed images. We focus on evaluating the real-valued reconstruction of the seismic data for both network types.
Results
We trained the four neural network autoencoders with seven random initializations each, to allow for error bars on the estimates in Figure 14.4. The mean squared error and the mean absolute error for each parameter configuration during training are given in Table 14.2. There is a clear correspondence between the reconstruction error of the autoencoder and the size of the network. The real-valued networks outperform the complex-valued networks in both the mean squared error and the mean absolute error; however, a real-valued network needs around twice as many parameters as a complex-valued network to attain comparable reconstruction errors.
Table 14.2: Training reconstruction errors of the four networks.

| Network | Compression | Parameters | MSE [x10^{-2}] | MAE [x10^{-2}] |
|---|---|---|---|---|
| C_small | 4:1 | 100,226 | 0.484 ± 0.013 | 4.695 ± 0.058 |
| R_small | 2:1 | 198,001 | 0.436 ± 0.006 | 4.500 ± 0.028 |
| C_big | 2:1 | 397,442 | 0.227 ± 0.003 | 3.247 ± 0.025 |
| R_big | 1:1 | 790,945 | 0.196 ± 0.002 | 3.050 ± 0.013 |
The seismic sections in Figure 14.5 show the unseen test seismic section. We perform a closer inspection of the regions "top" and "bottom" to focus on geologically relevant sections in the reconstruction process. The noisy segment without strong reflectors is a good baseline to evaluate the noise reduction of the autoencoders and the behaviour of the different networks on low-amplitude data. Overall, all networks denoise the original seismic, with the lowest reconstruction errors being a root-mean-square (RMS) error of 0.1187 and an MAE of 0.0947 (cf. Table 14.3). Figure [complex:fig:silent_fk] shows the frequency-wavenumber (FK) spectrum of the ground truth ([complex:fig:silent_fk] (a)) and the large complex network reconstruction ([complex:fig:silent_fk] (b)). These show a decrease in the 0-60 Hz band for larger absolute wavenumbers.
Table 14.3: Test reconstruction errors (RMS and MAE) on the full section and the "silent", "top", and "bottom" segments.

| Network | Full RMS | Full MAE | Silent RMS | Silent MAE | Top RMS | Top MAE | Bottom RMS | Bottom MAE |
|---|---|---|---|---|---|---|---|---|
| C_small | 0.1549 | 0.1145 | 0.1265 | 0.1010 | 0.2315 | 0.1759 | 0.1588 | 0.1200 |
| R_small | 0.1581 | 0.1153 | 0.1247 | 0.0994 | 0.2395 | 0.1810 | 0.1612 | 0.1205 |
| C_big | 0.1508 | 0.1101 | 0.1187 | 0.0947 | 0.2301 | 0.1747 | 0.1514 | 0.1135 |
| R_big | 0.1469 | 0.1072 | 0.1214 | 0.0967 | 0.2222 | 0.1679 | 0.1459 | 0.1088 |
"Top" seismic section
The "top" segment contains strong, heavily faulted reflectors. Figure [complex:fig:top] shows the top segment and the reconstructions of the four networks. All networks display various amounts of smoothing. The quantitative results show that the complex networks perform very similarly regardless of size. The large real-valued network outperforms the complex networks by 2.5 % on RMS, while the small real-valued network underperforms by 2.5 % on RMS. The panel in Figure [complex:fig:top_sr] shows a very smooth result. Despite the close scores of the complex networks, the larger complex-valued network appears to restore more high-frequency content. We can also see less smearing of discontinuities in the larger complex network, particularly visible in the lower part (1.2 s) at 6000 m offset, which is smeared to appear like a diffraction in the smaller network. The large real-valued network shows good reconstruction with minor smearing and higher amplitude fidelity in areas like 1.1 s at 2000 m; however, some steeply dipping artifacts are visible below the reflector packet between 0 m and 2000 m offset.
"Bottom" seismic section
The data marked as "bottom" in Figure 14.5 contains a faulted anticline and relatively strong noise levels. The small complex network in Figure [complex:fig:bottom_sc] reconstructs a denoised image with good reconstruction of the visible discontinuities. Some leakage of the reflector starting at 1.5 s across discontinuities is visible. The small real network in Figure [complex:fig:bottom_sr] reconstructs a strongly smoothed image, with some ringing below the main reflector that is not visible in the other reconstructions. The dipping reflector at an offset of 16000 m is well reconstructed; however, the reconstruction appears to introduce ringing noise over the vertical extent of the image. The large real-valued network in Figure [complex:fig:bottom_br] performs best quantitatively (cf. Table 14.3). The large complex-valued network in Figure [complex:fig:bottom_bc] does a fairly good job of reconstructing the image, similar to the large real-valued network; however, the amplitude reconstruction of high-amplitude events, particularly in the main reflector around 1.5 s, falls short.
Full seismic test data
It is evident that the small real-valued network does not match the performance of the small complex-valued network, even less so when compared to the large complex-valued network. We therefore compare the large networks on the full seismic data.
Overall, both networks return a smoothed image. The findings for the strongly faulted sections in the "top" panel hold across the entire faulted area around 1.1 s in Figure [complex:fig:full]. The complex-valued network does a better job of reconstructing faults and discontinuities. The real-valued network is better at reconstructing high-amplitude regions, which appear dimmer in the complex-valued reconstruction. The reconstructions of both networks are adequately close to the ground truth, with differences in the details. Quantitatively, the real-valued network achieves the better reconstruction in Table 14.3, with an improvement of 2.5 % over the large complex-valued network. The FK domain shows a very similar reduction in noise in the sub-50 Hz band in Figure [complex:fig:full_fk]. All networks introduce an increase of energy across all frequencies at wavenumber \(k=0~km^{-1}\). Additionally, a dimming of the frequencies around \(k=2.5~km^{-1}\) appears in all reconstructions, but is more prominent in the large complex-valued network. The ground truth seismic contains some scattered energy in the high-frequency mid-wavenumber region, visible as "diagonal stripes". These were attenuated by the complex-valued network in Figure [complex:fig:full_bc_fk], but are partially present in the real-valued reconstruction in Figure [complex:fig:full_br_fk].
Discussion
We evaluated the outputs of the real-valued and complex-valued neural networks. All autoencoder outputs are denoised and blurred to different degrees. The denoising effect on the seismic was most visible in the frequency band below 50 Hz. Additionally, some scattered high-frequency energy was attenuated by the networks.
The largest differences between the outputs of real-valued and complex-valued networks can be observed in discontinuous areas. Particularly, the faulted blocks in the top quarter and in the bottom center of the seismic section show inconsistencies. The real-valued network smooths over discontinuities and steep reflectors. Fault lines are imaged better in the complex-valued network output.
In seismic data processing, including phase information stabilizes discontinuities and disambiguates cycle-skipping in horizons. This could be observed in the network performance and reconstruction. Going from the small to the large architecture, the gain of the real-valued networks was significant (7.0 % RMS), while the complex-valued networks already performed acceptably with the smaller architecture (2.6 % RMS). We provide the complex-valued networks with a bias towards learning phase information by providing the Hilbert-transformed analytic trace, while the real-valued network needs to learn this information implicitly from the data itself. Considering that during training the complex network evaluates both the real-valued seismic, which we primarily care about, and the complex-valued component, we can see why the losses in Figure 14.4 differ from those of the real-valued networks.
The largest network, with 790,945 trainable parameters, quantitatively performed best on the reconstruction of the data. However, analysis of the reconstructed seismic shows that, while the high-amplitude regions are reconstructed with higher fidelity, discontinuous sections may be smeared by the real-valued network. The real-valued network that was matched to contain as many filters for the real-valued component of the seismic as the large complex-valued network did not perform well. Furthermore, the smaller complex-valued network with 100,226 parameters, which contains as many feature maps in total as the small real-valued network but half the trainable parameters, outperformed the smaller real-valued network across all test cases.
Conclusion
The inclusion of phase information leads to a better representation of seismic data in convolutional neural networks. Complex-valued networks perform consistently, whereas real-valued networks have to learn phase representations through implicit correlation, which requires larger networks. We show that complex trace information in deep neural networks improves the imaging of discontinuities as well as steep reflectors, particularly in chaotic seismic textures that are smoothed by real-valued neural networks of the same size and level of compression.
We show that convolutional neural networks can perform lossy compression on seismic data, where the reconstruction error depends on both the network architecture and implementation details, such as providing explicit phase information. During this compression, noise and scattered energy get attenuated. The real-valued network is prone to introducing steeply dipping artifacts in the reconstruction and is matched by complex-valued networks half the size with twice the amount of compression. This is particularly interesting in light of the complex complement of the data being derived from the real-valued data through a Hilbert transform, which a neural network should have been able to approximate.
The stabilization of the reconstruction can be useful in other seismic applications. While automatic seismic interpretation may benefit from the inclusion of information on discontinuities, we see the main application to be lossy seismic compression. The open source tool developed to make this research possible, enables further research and development of complexvalued solutions to nonstationary physics problems that benefit from explicit phase information.
This research also shows that a change as small as 2.5 % in RMS can change a reconstruction from acceptable to visibly smeared in the eyes of a geoscientist. This touches on the fact that better metrics for evaluating computer vision tasks in geoscience are necessary. Such metrics have to be noise-robust and, while amplitude-preserving, robust against outliers. Moreover, more research into the dimming of frequency bands in the network reconstructions is necessary.
Overall, the computational memory footprint of the complex convolution is higher than that of real-valued convolutions when comparing single convolutional operations. However, a significant increase in the depth and width of a real-valued network is necessary to implicitly learn the phase information and obtain an acceptable result. A complex-valued network an eighth of the size already performs well, suggesting that domains where a significant part of the information lies in the phase of signals could benefit from applying complex convolutional networks.
Acknowledgments
We thank Andrew Ferlitsch for his valuable insights. The research leading to these results has received funding from the Danish Hydrocarbon Research and Technology Centre under the Advanced Water Flooding program. We thank DTU Compute for access to the GPU Cluster.
Contributions of this Study
This chapter and Jesper Sören Dramsch, Lüthje, and Christensen (2019) investigate the application of complex trace analysis to convolutional neural networks. It uses lossy compression to measure the reconstruction error and, therefore, the informational content of complex-valued neural networks. We were able to show that networks containing phase information in the complex complement of the data reduce the error compared to real-valued networks. Moreover, the code to reproduce the findings in this paper (Jesper Sören Dramsch 2019b), as well as a standalone Python library for complex-valued convolutional neural networks in TensorFlow, has been made available as free and open-source software (Jesper Sören Dramsch and Contributors 2019).