Facies classification using a neural network algorithm


Machine learning (ML) is a field of artificial intelligence (AI) that has experienced rapid growth in the last ten years across diverse industries, including communications, financial services, security, and transportation. Applications of machine learning have produced dramatic results, enabling new opportunities and business models. Machine learning facilitates an understanding of complex relationships among a large and diverse set of variables, which is valuable for generating and validating models and answering scientific questions. Machine learning can also enable fast, high-quality decisions in the oil and gas industry, an essential component of viability given the industry's long-term outlook. Geoscience data sets are among the largest volumes of data in the industry, with a wide spectrum of properties and scales varying over many orders of magnitude. This tutorial touches on the challenges, opportunities, and trends related to the adoption of machine learning in geoscience research and industrial workflows.

Many free and open-source packages now exist that provide powerful additions to the geoscientist's toolbox, much of which used to be available only in proprietary (and expensive) software platforms. One of the best examples is scikit-learn (http://scikit-learn.org/), a collection of tools for machine learning in Python. In this tutorial, we will demonstrate how to use a classification algorithm known as a neural network to identify lithofacies based on well-log measurements. A neural network is a type of supervised-learning algorithm, which needs to be supplied with training data to learn the relationships between the measurements (or features) and the classes to be assigned. In our case, the features will be well-log data from nine gas wells. These wells have already had lithofacies classes assigned based on core descriptions. Once we have trained a classifier, we will use it to assign facies to wells that have not been described. At the end, we will also compare the neural network results with those from a support vector machine trained on the same data set.

What is a neural network?

A basic introduction to the neural network algorithm can be found in the Wikipedia article on artificial neural networks. The main purpose of this tutorial is to focus on the application of neural networks to facies classification, so we won't go into much detail about the algorithm itself.

Exploring the data set

The data set we will use comes from a University of Kansas class exercise on the Hugoton and Panoma gas fields. For more on the origin of the data, see Dubois et al. (2007) and the Jupyter notebook that accompanies this tutorial at http://github.com/seg.[1]

Figure 1. Study area.

Credit for most of the code that follows goes to the author of the SEG Wiki tutorial at http://wiki.seg.org/wiki/Facies_classification_using_machine_learning. There are, however, two major changes. First, the sns.pairplot interface has changed, so the crossplot matrix is now generated with sns.pairplot(..., hue='Facies'), where the hue argument colors each point by facies (the dataframe passed in must therefore contain the Facies column). Second, since we use a neural network, all of the code that relied on a support vector machine is replaced with its neural network equivalent.

The data set consists of seven features (five wireline log measurements and two indicator variables) and a facies label at half-foot depth intervals. In machine learning terminology, the set of measurements at each depth interval comprises a feature vector, each of which is associated with a class (the facies type). We will use the pandas library to load the data into a dataframe, which provides a convenient data structure to work with well-log data.

>>> import pandas as pd

>>> data = pd.read_csv('training_data.csv')

We can use data.describe() to provide a quick overview of the statistical distribution of the training data (Table 1).
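For reference, the summary in Table 1 can be reproduced with a single call (the rounding is our own addition, purely for readability):

>>> # Summary statistics for every column; round for a compact view
>>> data.describe().round(2)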

We can see from the count row in Table 1 that we have a total of 3232 feature vectors in the data set. The feature vectors consist of the following variables:

  1. Gamma ray (GR)
  2. Resistivity (ILD_log10)
  3. Photoelectric effect (PE)
  4. Neutron-density porosity difference (DeltaPHI)
  5. Average neutron-density porosity (PHIND)
  6. Nonmarine/marine indicator (NM_M)
  7. Relative position (RELPOS)
Table 1. Statistical distribution of the training data set.
        Facies   Depth     GR       ILD_log10   DeltaPHI   PHIND   PE      NM_M   RELPOS
count   3232     3232      3232     3232        3232       3232    3232    3232   3232
mean    4.42     2875.82   66.14    0.64        3.55       13.48   3.73    1.50   0.52
std     2.50     131.00    30.85    0.24        5.23       7.70    0.89    0.50   0.29
min     1        2573.50   13.25    -0.03       -21.83     0.55    0.20    1      0.01
25%     2        2791.00   46.92    0.49        1.16       8.35    3.10    1      0.27
50%     4        2932.50   65.72    0.62        3.50       12.15   3.55    2      0.53
75%     6        2980.00   79.63    0.81        6.43       16.45   4.30    2      0.77
max     9        3122.50   361.15   1.48        18.60      84.40   8.09    2      1.00

There are nine facies classes (numbered 1–9) identified in the data set. Table 2 contains the descriptions associated with these classes. Note that not all of these facies are completely discrete; some gradually blend in to one another. Misclassification of these neighboring facies can be expected to occur. The Adjacent facies column in Table 2 lists these related classes.

Table 2. Facies labels with their descriptions.
Facies   Description                   Label   Adjacent facies
1        Nonmarine sandstone           SS      2
2        Nonmarine coarse siltstone    CSiS    1, 3
3        Nonmarine fine siltstone      FSiS    2
4        Marine siltstone and shale    SiSh    5
5        Mudstone                      MS      4, 6
6        Wackestone                    WS      5, 7, 8
7        Dolomite                      D       6, 8
8        Packstone-grainstone          PS      6, 7, 9
9        Phylloid-algal bafflestone    BS      7, 8
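For later use, the adjacent-facies relationships in Table 2 can be encoded as a small Python dictionary. The name adjacent_facies is our own; we will use it below when scoring predictions that fall within an adjacent facies.

>>> # Facies number -> list of adjacent facies numbers (from Table 2)
>>> adjacent_facies = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4, 6],
...                    6: [5, 7, 8], 7: [6, 8], 8: [6, 7, 9], 9: [7, 8]}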

To evaluate the accuracy of the classifier, we will remove one well from the training set so that we can compare the predicted and actual facies labels.

>>> test_well = data[data['Well Name'] == 'NEWBY']

>>> data = data[data['Well Name'] != 'NEWBY']

Let's extract the feature vectors and the associated facies labels from the training data set:

>>> features = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS']

>>> feature_vectors = data[features]

>>> facies_labels = data['Facies']

Crossplots are a familiar tool to visualize how two properties vary with rock type. This data set contains five log measurements, and we can employ the very useful seaborn library (https://seaborn.pydata.org/; Waskom et al., 2016) to create a matrix of crossplots to visualize the variation between the log measurements in the data set.[2]

>>> import seaborn as sns

>>> # The dataframe passed to pairplot must include the 'Facies' column for hue to work
>>> sns.pairplot(data[features + ['Facies']], hue='Facies')

Each pane in Figure 2 shows the relationship between two of the variables on the x- and y-axes, with a stacked bar plot showing the distribution of each point along the diagonal. Each point is colored according to its facies (see the Jupyter notebook associated with this tutorial for more details on how to generate colors for this plot). It is not clear from these crossplots what relationships exist between the measurements and the facies labels. This is where machine learning will prove useful.
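As a minimal sketch (not the exact color scheme used in the accompanying notebook), a facies-to-color mapping can be built from one of seaborn's built-in palettes and passed to pairplot:

>>> # Hypothetical palette: one color per facies class, keyed by facies number
>>> facies_palette = dict(zip(sorted(facies_labels.unique()), sns.color_palette('Paired', 9)))
>>> sns.pairplot(data[features + ['Facies']], hue='Facies', palette=facies_palette)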

Conditioning the data set

Many machine-learning algorithms assume the feature data are normally distributed (i.e., Gaussian with zero mean and unit variance). Table 1 shows us that this is not the case with our training data. We will condition, or standardize, the training data so that it has this property. The same factors used to standardize the training set must be applied to any subsequent data set that will be classified. scikit-learn includes a handy StandardScaler class that can be applied to the training set and later used to standardize any input data.

>>> from sklearn.preprocessing import StandardScaler

>>> scaler = StandardScaler().fit(feature_vectors)

>>> scaled_features = scaler.transform(feature_vectors)

A standard practice when training supervised-learning algorithms is to separate some data from the training set to evaluate the accuracy of the classifier. We have already removed data from a single well for this purpose. It is also useful to have a cross-validation data set we can use to tune the parameters of the model. scikit-learn includes a handy function to randomly split the training data into subsets. Let's use 5% of the data for the cross-validation set.

>>> from sklearn.model_selection import train_test_split

>>> X_train, X_cv, y_train, y_cv = train_test_split(scaled_features, facies_labels, test_size=0.05, random_state=42)

Figure 2. Crossplot matrix generated with the seaborn library.

Training the classifier

Now we can use the conditioned data set to train a neural network to classify facies.

The neural network implementation in scikit-learn (http://scikit-learn.org/stable/modules/neural_networks_supervised.html) takes a number of important parameters. These can be used to control the learning rate, the regularization strength, and the architecture of the network (the number of hidden layers and the number of neurons in each). The choice of parameters can affect the accuracy of the classifier, and finding optimal parameter choices is an important step known as model selection. A succession of models is created with different parameter values, and the combination with the lowest cross-validation error is used for the classifier. See the notebook accompanying this article for more details on the model selection procedure used to obtain the parameter choices used here.
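As an illustration only, such a model-selection loop might look like the following; the candidate architectures and alpha values here are placeholders, not the grid used in the accompanying notebook.

>>> from sklearn.neural_network import MLPClassifier
>>> # Try a few candidate architectures and regularization strengths and keep the
>>> # combination with the highest accuracy on the cross-validation split
>>> best_score, best_params = 0.0, None
>>> for sizes in [(10,), (20, 10), (50, 20)]:
...     for alpha in [1e-5, 1e-3, 1e-1]:
...         model = MLPClassifier(solver='lbfgs', alpha=alpha,
...                               hidden_layer_sizes=sizes, random_state=1)
...         model.fit(X_train, y_train)
...         score = model.score(X_cv, y_cv)
...         if score > best_score:
...             best_score, best_params = score, (sizes, alpha)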

>>> from sklearn.neural_network import MLPClassifier

>>> clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(20, 10), random_state=1)

Now we can train the classifier using the training set we created above. Note that in this case the network has two hidden layers: one with 20 neurons and one with 10. You can always change the number of layers and the number of neurons in each layer.
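For example, a hypothetical single-hidden-layer alternative (not the configuration used in this tutorial) would be created like this:

>>> # Hypothetical alternative: one hidden layer with 50 neurons
>>> # clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(50,), random_state=1)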

>>> clf.fit(X_train, y_train)

That's it! Now that the model has been trained on our data, we can use it to predict the facies of any well with the same set of features as our training set. We set aside a well for exactly this purpose.
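Before moving to the blind well, it is easy to check the accuracy on the cross-validation split we held back earlier (a small addition of our own; the exact number depends on the training run):

>>> # Fraction of correctly classified samples in the cross-validation split
>>> clf.score(X_cv, y_cv)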

Evaluating the classifier

To evaluate the accuracy of our classifier we will use the well we kept for a blind test and compare the predicted facies with the actual ones. We need to extract the facies labels and features of this data set and rescale the features using the same parameters used to rescale the training set.

>>> y_test = test_well['Facies']

>>> well_features = test_well.drop(['Facies', 'Formation', 'Well Name', 'Depth'], axis=1)

>>> X_test = scaler.transform(well_features)

Now we can use our trained classifier to predict facies labels for this well, and store the results in the Prediction column of the test_well dataframe.

>>> y_pred = clf.predict(X_test)

>>> test_well['Prediction'] = y_pred

Because we know the true facies labels of the vectors in the test data set, we can use the results to evaluate the accuracy of the classifier on this well.

>>> from sklearn.metrics import classification_report

>>> target_names = ['SS', 'CSiS', 'FSiS', 'SiSh', 'MS', 'WS', 'D', 'PS', 'BS']

>>> print(classification_report(y_test, y_pred, target_names=target_names))
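A confusion matrix gives a complementary view of which facies are being confused with which; this is a small addition to the original report, not part of it:

>>> from sklearn.metrics import confusion_matrix
>>> # Rows are true facies, columns are predicted facies (classes 1-9)
>>> print(confusion_matrix(y_test, y_pred))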


Figure 3. Well logs from a single well.
Figure 4. Facies classification results from a single well.

Results comparison

Figure 5. Statistical results from the neural network.
Figure 6. Statistical results from the support vector machine.


Our classifier achieved an overall F1 score of 0.52 on the test well, so there is room for improvement. It is interesting to note that if we count misclassification within adjacent facies as correct, the classifier has an overall F1 score of 0.97.
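A sketch of how the within-adjacent-facies score can be approximated, using the adjacent_facies dictionary defined after Table 2 (this computes a simple accuracy rather than the F1 score quoted above):

>>> import numpy as np
>>> # Count a prediction as correct if it matches the true facies or one of its
>>> # adjacent facies (see Table 2)
>>> correct = [pred == true or pred in adjacent_facies[true]
...            for true, pred in zip(y_test, y_pred)]
>>> adjacent_accuracy = np.mean(correct)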

Let's look at the classifier results in log-plot form. Figure 4 is based on the plots described in Alessandro Amato del Monte's excellent June 2015 tutorial in this series.[3] The five logs used as features are plotted in Figure 3.

In addition to the neural network results, I have also repeated the work Brendon Hall did, applying another machine-learning algorithm, the support vector machine (SVM), to the same data set. Based on the statistical results from the neural network and the SVM (Figures 5 and 6), the true facies, the SVM-predicted facies, and the neural-network-predicted facies are plotted in Figure 4. We can easily tell that the neural network did a better job of facies classification.
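For readers who want to reproduce the comparison, a minimal SVM baseline on the same standardized features looks like the following; the C and gamma values are placeholders rather than the parameters tuned in Brendon Hall's original tutorial, and the column name SVM_Prediction is our own.

>>> from sklearn import svm
>>> # Hypothetical SVM baseline trained on the same standardized training data
>>> svm_clf = svm.SVC(C=10, gamma=1)
>>> svm_clf.fit(X_train, y_train)
>>> svm_pred = svm_clf.predict(X_test)
>>> test_well['SVM_Prediction'] = svm_pred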


This tutorial has provided a brief overview of a typical machine learning workflow: preparing a data set, training a classifier, and evaluating the model. Libraries such as scikit-learn provide powerful algorithms that can be applied to problems in the geosciences with just a few lines of code.


References

  1. Dubois, M. K., G. C. Bohling, and S. Chakrabarti, 2007, Comparison of four approaches to a rock facies classification problem: Computers & Geosciences, 33, no. 5, 599–617, http://dx.doi.org/10.1016/j.cageo.2006.08.011
  2. Waskom, M., et al., 2016, Seaborn, v0.7, January 2016: Zenodo, http://dx.doi.org/10.5281/zenodo.45133
  3. Amato del Monte, A., 2015, Seismic petrophysics: Part 1: The Leading Edge, 34, no. 4, 440–442, http://dx.doi.org/10.1190/tle34040440.1

External links

You can find all the data and code for this tutorial at https://github.com/seg.

If you want to learn more about machine learning, here is a good online course for beginners: https://www.coursera.org/learn/machine-learning
