Heat Maps in R – Bakken Edition

Nothing seems more daunting than facing a wall of words. Being a visual person, I prefer graphs, charts, and maps. One of the more powerful things I’ve discovered with R is the ability to auto-generate Heat Maps using well data.

You can plot up just about anything, though you have to be careful with outliers. Regardless, it lets me do quick spot-checks, like whether an operator actually has an acreage position in a good area. In this workflow, we will build a Heat Map using the Bakken data we’ve collected in other insights. So let’s get to painting.


Data

I will not rehash what we have done in other insights. For this one, I have aggregated well data from the North Dakota Department of Mineral Resources. The data set includes only wells identified on the site as Middle Bakken, and covers:

  • Well Identifiers (API, Operator (current), Well Name, section, township, range). This comes from production data.
  • 9-month cumulative oil and gas for wells brought online after May 2015 (the first month for which Excel data is available).
  • Surface and Bottom Hole Lat Longs and TVD information from horizontal shapefiles.
  • Proppant and Fluid completion information from Frac Focus.

To read in the data:

df1 <- read.csv('https://github.com/xbrl-data/class/raw/master/bakkenStats.csv')

#Data Investigation
str(df1)
summary(df1)

The investigation lines give you an idea of the data’s structure, summary statistics, and any missing values.
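One extra check I find handy, since summary() gets long with this many columns, is a per-column count of missing values:

#Count missing values in each column (base R; no packages needed yet)
colSums(is.na(df1))

With that overview in hand, we can do a bit of housekeeping/cleanup.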


Cleanup

The first thing I like to do is convert API to a character column, since it’s an identifier rather than a number.

df1$API <- as.character(df1$API)
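One hedged caveat: if the APIs were read in as numbers, any leading zeros are already gone. North Dakota’s state code is 33, so it usually doesn’t bite here, but a generic fix looks something like this (the 14-digit width is my assumption; check the actual API format in your file first):

#Pad APIs back to a fixed width in case leading zeros were dropped
#(14 digits is an assumption about the API format in this data set)
df1$API <- sprintf('%014.0f', as.numeric(df1$API))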

Packages

There are quite a few packages I need in this workflow, so I will just list them all here. I have used most of these in other insights.

  • tidyverse – The go-to package for all things data in R
  • sf – Spatial data in R
  • lubridate – Time data in R
  • akima – Interpolation
  • geosphere – Used for things like calculating distances
  • tigris – County/State shapefiles from US Census
  • glue – Easier string concatenation
  • leaflet – Interactive maps in R
  • rgdal – Bindings for the ‘Geospatial’ Data Abstraction Library (whatever that means)
  • raster – Raster objects in R
  • RColorBrewer – Used to create color scale for heat map

Remember, if you don’t have these packages installed, do so with install.packages('tidyverse') (for the tidyverse).
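If you’re starting fresh, you can knock them all out in one shot:

#One-time install of every package used in this workflow
install.packages(c('tidyverse', 'sf', 'lubridate', 'akima', 'geosphere',
                   'tigris', 'glue', 'leaflet', 'rgdal', 'raster',
                   'RColorBrewer'))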

#Load Packages
library(tidyverse)
library(sf)
library(lubridate)
library(akima)
library(geosphere)
library(tigris)
library(glue)
library(leaflet)
library(rgdal)
library(raster)
library(RColorBrewer)

Now I want to do some mapping prep.


Location Data

We have the surface and bottom hole lat/longs from the state website. Strictly speaking, it’s not really the surface location: if you dig through the state’s shapefile, it basically includes the whole wellbore survey, so the “surface” point here is really where the lateral section starts (there is an id column for it in the file).

#Convert surface and bottom hole to spatial points
df1$x1 <- st_as_sfc(paste0("POINT(", df1$surfLong, " ", df1$surfLat,")"))
df1$x2 <- st_as_sfc(paste0("POINT(", df1$bhLong, " ", df1$bhLat,")"))

#Combine them into a line for mapping purposes
df1 <- df1 %>% rowwise() %>% mutate(geom = c(x1, x2) %>% st_combine() %>% st_cast("LINESTRING")) %>% ungroup()

#The North Dakota data uses NAD27 (EPSG:4267)
st_crs(df1$geom) <- 4267

#Convert dataframe to shapefile dataframe
df1 <- df1 %>% st_as_sf()
st_crs(df1) <- 4267

#I prefer to work in WGS84, which is EPSG:4326
df1 <- df1 %>% st_transform(4326)

#There is a weird quirk that sometimes messes up mapping, so just do this step
names(df1$geom) <- NULL

So now I have a spatial dataframe I can use for mapping in leaflet. I also want to grab the county data to make it a little easier to place locations (I haven’t found a good basemap yet that shows counties and county names).

#Load ND counties with tigris
counties1 <- st_as_sf(tigris::counties(state = 'ND'))

#Transform to our coordinate system and change NAME column to county
counties1 <- counties1 %>% st_transform(4326) %>% mutate(county = NAME) %>% subset(select = -c(NAME))

#Join county name to our well shapefile
#(dplyr::select is spelled out because raster masks select once it's loaded)
df1 <- df1 %>% st_join(counties1 %>% dplyr::select(county)) %>% filter(!duplicated(API))

#Only include counties within our dataset
counties1 <- counties1 %>% filter(county %in% df1$county)

#Find the centroid of each shape to place County name on map
counties1$pt1 <- st_centroid(counties1$geometry)
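As a side note, sf will warn that centroids of longitude/latitude geometries are only approximate. For dropping a label on a map that’s fine, but if you want to silence it, one sketch is to project first (UTM zone 13N, EPSG:32613, is my assumption for western North Dakota) and transform back:

#Hedged alternative: compute centroids in a planar CRS, then return to WGS84
counties1$pt1 <- counties1$geometry %>%
  st_transform(32613) %>%
  st_centroid() %>%
  st_transform(4326)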

Now that I have this data, let’s plot it up and look at the wells.

leaflet(options = leafletOptions(zoomControl = FALSE)) %>%
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>% #I think this is my favorite basemap these days
  addPolygons(data = counties1$geometry, color = 'black', fillOpacity = 0.05, weight = 1) %>% #Load county shapefiles
  leaflet::addScaleBar(position = 'topleft', options = scaleBarOptions(metric = FALSE)) %>% #Add Scale Bar to top
  addLabelOnlyMarkers(data = counties1$pt1, label = counties1$county,
                      labelOptions = labelOptions(noHide = T, direction = 'top', textOnly = T)) %>% #County Name
  addPolylines(data = df1$geom, color = 'green', popup = df1$operator, weight = 1) #Well Locations

Figure 1: Middle Bakken Horizontal Locations

A nice bit of added functionality is attaching a popup to each well that displays various information. Above I just used the operator (popup = df1$operator); later on I’ll get a bit more advanced. So go ahead and click on a well to test it out.


Well Design

One thing I noticed when I ran the summary stats was that I had quite a few missing proppant values. While most were from older wells that predate Frac Focus data collection, there are still some newer wells too. So here I am going to make an assumption to fill in the missing data, which is not as crazy as it sounds.

One of the jobs of a data scientist/analyst is to decide which data is important to keep, which is important to exclude, and which data can be estimated knowing general rules of the data. This isn’t an insight on data exploration, but I’ll show one of those techniques here.

In general, I know that an operator during a given year will have a similar well design across their acreage position, at least within the same reservoir. So that is the only assumption I will make here: I am going to group the data by production year and operator, and then fill in the missing proppant loading data with the average of that “bin”. You can do this in myriad ways (like further segmenting by county; I’ll sketch that variant right after the code), but I will just use this simple one for now.

df1 <- df1 %>% group_by(fpYear = year(firstProd), operator) %>%
  mutate(ppf = replace(ppf, is.na(ppf), mean(ppf, na.rm=TRUE))) %>% ungroup()
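And here is a sketch of the county-level variant mentioned above; same idea, just one more grouping key, run instead of the chunk above (it relies on the county column we joined earlier):

#Hedged alternative: segment by county as well before averaging
df1 <- df1 %>% group_by(fpYear = year(firstProd), operator, county) %>%
  mutate(ppf = replace(ppf, is.na(ppf), mean(ppf, na.rm=TRUE))) %>% ungroup()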

#Also want to create a midpoint lat long column
df1 <- df1 %>% mutate(avLong = (surfLong + bhLong)/2, avLat = (surfLat + bhLat)/2)

So now my proppant loading data is very nearly complete, and that should be enough cleanup for our purposes.


Analysis

Here we will get into the good stuff.


Binning

Because there is a lot of data, and inherently some error in said data (public data isn’t exactly perfect), my preference is to subset the data into “bins”. This smooths out the errors and makes the maps a bit prettier and more informative. There are multiple ways to do so. For the heat map, I will just use Township/Range, though if the data is dense enough you can go even finer.

I have a lot of TVD data, so I will go more granular for that one. To do so, I need to create a new id column, which I will call quarter. Taking advantage of the Township/Range/Section numbering system, I will bin into 2×2-section squares: basically four 640-acre sections each, or nine bins per 36-section township.

df1$quarter <- 1
df1$quarter[df1$section %in% c(3,4,9,10)] <- 2
df1$quarter[df1$section %in% c(5,6,7,8)] <- 3
df1$quarter[df1$section %in% c(17,18,19,20)] <- 4

df1$quarter[df1$section %in% c(21,22,15,16)] <- 5
df1$quarter[df1$section %in% c(23,24,13,14)] <- 6
df1$quarter[df1$section %in% c(35,36,25,26)] <- 7
df1$quarter[df1$section %in% c(27,28,33,34)] <- 8
df1$quarter[df1$section %in% c(29,30,31,32)] <- 9

df1$twoSecLocation <- paste0(df1$twn,'-', df1$rng, '-', df1$quarter)
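A quick sanity check that the quarters landed where expected:

#Every quarter 1-9 should show up with a sensible well count
table(df1$quarter)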

TVD

First, I will take a look at True Vertical Depth to see how it trends out across the area.

#Create a new data frame that takes the median of TVD by each Two Section Square
chkpts <- df1 %>% data.frame() %>% 
  group_by(twoSecLocation) %>% 
  summarise(lon = mean(avLong), lat = mean(avLat), z = median(TVD, na.rm=TRUE)) %>%
  filter(!is.na(z))

#Define Grid Corners
yo1 <- min(chkpts$lat) - 0.15
yo2 <- max(chkpts$lat) + 0.15
xo1 <- min(chkpts$lon) - 0.15
xo2 <- max(chkpts$lon) + 0.15

#distHaversine returns meters; convert to miles to get the grid dimensions
length1 <- geosphere::distHaversine(c(xo1, yo1), c(xo1, yo2))*3.28084/5280
length2 <- geosphere::distHaversine(c(xo1, yo1), c(xo2, yo1))*3.28084/5280
length1 <- max(length1, length2)

#To make the eventual map smoother, I'm going to make the grid a lot more granular.
#A lot of points will make this really slow though, so just remember that for the future.
length1 <- as.integer(length1*10)

#Use the akima package to interpolate the data across the grid
akima.li <- akima::interp(chkpts$lon, chkpts$lat, chkpts$z,
                   yo = seq(yo1, yo2, length = length1),
                   xo = seq(xo1, xo2, length = length1),
                   linear = TRUE,
                   extrap=TRUE,
                   duplicate = 'median')


#Convert grid to a raster object for our map
KernelDensityRaster <- raster(list(x=akima.li$x ,y=akima.li$y ,z = akima.li$z))

#Define the color function for our map
palRaster <- colorBin(rev(brewer.pal(n=11, name='Spectral')),bins = 5, 
                      domain = KernelDensityRaster@data@values, na.color = "transparent")

There’s a lot going on here, but basically we:

  1. Create a new dataframe with the median TVD value and average lat/long for each 2×2-section square.
  2. Determine our overall grid extent and how granular to make it (how many points to generate).
  3. Interpolate our dataset onto the grid.
  4. Convert the grid to a raster layer and define its color scale for the map.
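Before wiring the raster into leaflet, a quick base plot is a cheap sanity check that the interpolation looks sane:

#Quick look at the interpolated surface before styling it in leaflet
raster::plot(KernelDensityRaster, main = 'Middle Bakken TVD (ft)')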

I also want to create a popup for my well data, and for this example I’ll just look at Oasis wells since they’ve been in the news for all the wrong reasons.

#Create our cool popup value for each well on the map
#First build a one-section location id to display (mirrors twoSecLocation above)
df1$oneSecLocation <- paste0(df1$twn, '-', df1$rng, '-', df1$section)

df1$popup <- glue("<b>Operator: </b>{df1$operator}<br>
                   <b>Well: </b>{df1$well}<br>
                   <b>TWN-RNG-SEC: </b>{df1$oneSecLocation}<br>
                   <b>API: </b>{df1$API}<br>
                   <b>County: </b>{df1$county}<br>
                   <b>TVD (ft): </b>{as.integer(df1$TVD)}<br>
                   <b>Lateral (ft): </b>{df1$perf}<br>
                   <b>Proppant (lb/ft): </b>{as.integer(df1$ppf)}")

#Filter to Oasis
df2 <- df1 %>% filter(grepl('OASIS', operator))

Now let’s map it up.

leaflet(options = leafletOptions(zoomControl = FALSE)) %>% 
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>% 
  addRasterImage(KernelDensityRaster, 
                 colors = palRaster, 
                 opacity = 1) %>%
  addLegend(pal = palRaster, 
            values = KernelDensityRaster@data@values, 
            title = "Middle Bakken<br>TVD<br>Ft")%>%
  addPolygons(data = counties1, fillColor = 'transparent', color = 'black', weight = 2) %>%
  addLabelOnlyMarkers(data = counties1$pt1, label = as.character(counties1$county),
                      labelOptions = labelOptions(noHide = T, direction = 'top', textOnly = T)) %>%
  addPolylines(data = df2$geom, color = 'black', popup = df2$popup, weight = 1)

Figure 2: Middle Bakken True Vertical Depth

So we can see that the deepest portion of the reservoir runs through McKenzie, and shallows up getting closer to our neighbors to the north.


Gas-Oil Ratio

When I did the original data collection, I also grabbed the 9-month gas-oil ratio for each well, so I can use the same methodology as above to create a GOR map (binned by full Township/Range this time, since only the post-2015 wells have production data). Rather than rehash the steps, I’ll just show all the code.
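One hedged aside before the code: if your pull doesn’t already carry a gor column, it’s just cumulative gas over cumulative oil. A sketch, assuming columns named cumGas (in Mcf) and cumOil (in bbl), which are my assumptions about the source, giving scf/bbl:

#Hedged sketch: derive GOR if the data doesn't already include it
#Assumes cumGas (Mcf) and cumOil (bbl) columns, yielding scf/bbl
df1 <- df1 %>% mutate(gor = ifelse(cumOil > 0, cumGas * 1000 / cumOil, NA))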

chkpts <- df1 %>% data.frame() %>% filter(!is.na(cumOil)) %>% 
  group_by(twn, rng) %>% 
  summarise(lon = mean(avLong), lat = mean(avLat), z = median(gor, na.rm=TRUE)) %>%
  filter(!is.na(z))


yo1 <- min(chkpts$lat) - 0.15
yo2 <- max(chkpts$lat) + 0.15
xo1 <- min(chkpts$lon) - 0.15
xo2 <- max(chkpts$lon) + 0.15

length1 <- distHaversine(c(xo1, yo1), c(xo1, yo2))*3.28084/5280
length2 <- distHaversine(c(xo1, yo1), c(xo2, yo1))*3.28084/5280
length1 <- max(length1, length2)
length1 <- as.integer(length1*10)


akima.li <- interp(chkpts$lon, chkpts$lat, chkpts$z,
                   yo = seq(yo1, yo2, length = length1),
                   xo = seq(xo1, xo2, length = length1),
                   linear = TRUE,
                   extrap=TRUE,
                   duplicate = 'median')



KernelDensityRaster <- raster(list(x=akima.li$x ,y=akima.li$y ,z = akima.li$z))

palRaster <- colorBin(rev(brewer.pal(n=11, name='Spectral')),bins = 6, 
                      domain = KernelDensityRaster@data@values, na.color = "transparent")


#Create the leaflet Plot

leaflet(options = leafletOptions(zoomControl = FALSE)) %>% 
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>% 
  addRasterImage(KernelDensityRaster, 
                 colors = palRaster, 
                 opacity = 1) %>%
  addLegend(pal = palRaster, 
            values = KernelDensityRaster@data@values, 
            title = "Middle Bakken<br>GOR<br>Ft")%>%
  addPolygons(data = counties1, fillColor = 'transparent', color = 'black', weight = 2) %>%
  addLabelOnlyMarkers(data = counties1$pt1, label = as.character(counties1$county),
                      labelOptions = labelOptions(noHide = T, direction = 'top', textOnly = T)) %>%
  addPolylines(data = df2$geom, color = 'black', popup = df2$popup, weight = 1)

Figure 3: Middle Bakken Gas Oil Ratio

It does appear that gas-oil ratio tends to be highest where TVD is highest, though, oddly, it also seems to increase as you move north. I don’t know enough about the basin to say why, but it’s worth a closer look at some point.


Productivity

Grand finale: let’s see where the most productive parts of the play are. We need one extra step here. I want to subset the well data so that completion design isn’t too far out of whack, so we can compare somewhat apples-to-apples. To do that, I will just look at proppant loading bins for the production sample set.

#Variable I'm mapping
df1$cumOilPerFt <- df1$cumOil/df1$perf

#Create Bins
df1$ppfBin <- 250
df1$ppfBin[df1$ppf > 250] <- 500
df1$ppfBin[df1$ppf > 500] <- 750
df1$ppfBin[df1$ppf > 750] <- 1000
df1$ppfBin[df1$ppf > 1000] <- 1250
df1$ppfBin[df1$ppf > 1250] <- 1500
df1$ppfBin[df1$ppf > 1500] <- 1750
df1$ppfBin[df1$ppf > 1750] <- 2000
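As an aside, cut() collapses the same binning into one call; the breaks below mirror the thresholds above (one behavioral difference: any remaining NA proppant values stay NA rather than defaulting to 250):

#Equivalent binning with cut(); labels are the upper edge of each bin
df1$ppfBin <- as.numeric(as.character(cut(df1$ppf,
                breaks = c(-Inf, 250, 500, 750, 1000, 1250, 1500, 1750, Inf),
                labels = c(250, 500, 750, 1000, 1250, 1500, 1750, 2000))))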

ppfBins <- df1 %>% data.frame() %>%filter(!is.na(cumOil)) %>%
  group_by(ppfBin) %>% summarise(count = n(), ppf = mean(ppf, na.rm=TRUE)) %>%
  ungroup()

Figure 4: Well Counts by Proppant Loading Bin

So for this workflow, I’ll just select the bin with the highest well count plus the two on either side (750, 1000, and 1250 lb/ft). That is a pretty decent sample size, and it removes what would be outlier completion designs.

Now the code looks much like the other maps, but this time I am binning by Township/Range because the sample set is not as robust as it was for TVD.

chkpts <- df1 %>% data.frame() %>% filter(!is.na(cumOil)) %>%
  filter(ppfBin %in% c(750, 1000, 1250)) %>% 
  group_by(twn, rng) %>% 
  summarise(lon = mean(avLong), lat = mean(avLat), z = median(cumOilPerFt, na.rm=TRUE)) %>%
  filter(!is.na(z))

yo1 <- min(chkpts$lat) - 0.15
yo2 <- max(chkpts$lat) + 0.15
xo1 <- min(chkpts$lon) - 0.15
xo2 <- max(chkpts$lon) + 0.15

length1 <- distHaversine(c(xo1, yo1), c(xo1, yo2))*3.28084/5280
length2 <- distHaversine(c(xo1, yo1), c(xo2, yo1))*3.28084/5280
length1 <- max(length1, length2)
length1 <- as.integer(length1*10)

akima.li <- interp(chkpts$lon, chkpts$lat, chkpts$z,
                   yo = seq(yo1, yo2, length = length1),
                   xo = seq(xo1, xo2, length = length1),
                   linear = TRUE,
                   extrap=TRUE,
                   duplicate = 'median')

KernelDensityRaster <- raster(list(x=akima.li$x ,y=akima.li$y ,z = akima.li$z))

#This time I am going to cap everything above 25 at 25, just to clean up the map a bit
KernelDensityRaster@data@values[which(KernelDensityRaster@data@values>25)] <- 25
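#(Aside: raster::clamp(KernelDensityRaster, upper = 25) should do the same
# thing, if you'd rather not reach into the S4 slots directly)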

palRaster <- colorBin(rev(brewer.pal(n=11, name='Spectral')),bins = 6, 
                      domain = KernelDensityRaster@data@values, na.color = "transparent")

#Create the leaflet Plot

leaflet(options = leafletOptions(zoomControl = FALSE)) %>% 
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>% 
  addRasterImage(KernelDensityRaster, 
                 colors = palRaster, 
                 opacity = 1) %>%
  addLegend(pal = palRaster, 
            values = KernelDensityRaster@data@values, 
            title = "Middle Bakken<br>Nine Month Oil<br>Per Ft")%>%
  addPolygons(data = counties1, fillColor = 'transparent', color = 'black', weight = 2) %>%
  addLabelOnlyMarkers(data = counties1$pt1, label = as.character(counties1$county),
                      labelOptions = labelOptions(noHide = T, direction = 'top', textOnly = T)) %>%
  addPolylines(data = df2$geom, color = 'black', popup = df2$popup, weight = 2)

And here we go; this maybe explains a bit of why Oasis is struggling.

Figure 5: Bakken 9-month Productivity Post 2015-05-01


Summary

That is it for this one. Feel free to play around with other operators on the map, or heck, come up with other maps. While I didn’t include it here, water-oil ratio is a fun one and correlates very well with EUR.

I hope this helps you realize that you can build heat maps for just about anything in R, not just Bakken productivity: things like IRR, F&D, capex, and so on. You’re limited only by your imagination and your data.

If you liked this, take a look at my other insights on mapping functionality in R.

And if you want to follow along as I teach you how to code in R within an oil and gas framework, follow my blog!

