# Shut-ins in the Bakken (Oil & Gas Coding Series)

Bakken operators shut in huge volumes when WTI went negative. Let’s investigate it using R.

## Data

The North Dakota Department of Mineral Resources provides production data monthly going back to 2003, but has only been doing so via excel since 2015. A PDF scraper for the other years is actually quite easy to pull off but that is not the focus of this insight.

Oh, so I have to download 5+ years of monthly excel data to do this? Hell yeah you do, but I can show you how to run a code that does this for you.

#### Site/Link Exploration

If you go to the DMR production page, you see the latest year of data. Hover on the last excel file and right-click to copy the link address. As of today that one is:

```https://www.dmr.nd.gov/oilgas/mpr/2020_06.xlsx
```

Websites don’t explicitly tell you how to scrape their stuff so sometimes it is some trial and error. My first check is usually to look at the link name to see if I can figure out a quick logical way to build a loop. Luckily for us, seems pretty intuitive. If 2020-06 is not a reference to Year-Month then someone is playing a cruel joke. Just to confirm, I’ll go over to 2015 and take a gander at the first one.

```https://www.dmr.nd.gov/oilgas/mpr/2015_05.xlsx
```

Yup, seems to confirm my suspicion, so let’s build out a loop.

#### Loop Design

##### Months-Years

So essentially, I need to build monthly increments from 2015 to 2020. Quick way to do so is just merge a couple of vectors (sequence of numbers in this case) together.

```library(tidyverse)

list1 <- merge(seq(1, 12, 1), seq(2015, 2020, 1)) %>%
select(months = x, years = y) %>%
mutate(months = replace(months,
months < 10,
paste0('0', months[months<10])
)
)
```

Quick runthrough of what I did here:

1. Merge a sequence of months and years together, this creates 72 rows of data.
2. Rename the variables to months and year.
3. Replace any months less than 10 with that number and a 0 in front (ie 1 to 01).

And here is what our data frame looks like.

```head(list1, 4)
```

##### Adding the URL

I also want to add in the URL of the excel file we just located earlier. I am going to use the glue package, which came with our tidyverse install.

```list1 <- list1 %>%
mutate(url1 =
glue::glue('https://www.dmr.nd.gov/oilgas/mpr/{years}_{months}.xlsx')
)
```

What we are doing is:

1. mutate – add a new column to dataframe
2. glue – glue allows us to combine a string with our data easier than trying to build a lot of concatentation. Basically, the url from previous looks we did is pasted in, but instead of 2020, we write {years} and instead of an 01 we write {months}. This looks at our data and pastes in automatically.
```head(list1, 2)
```

We now have the links to all of the excel files. There are actually some extraneous files that don’t actually exist (2020-12 doesn’t exist yet), but don’t worry, we build the loop so as to just throw back a NULL dataframe if it encounters an error.

##### Final Loop

R’s version of loops are the apply/sapply/lapply family of functions, and that’s what we will use here. These packages come installed in base R, but we will also need to install readxl.

```install.packages('readxl')
library(readxl)
```

And we will build our scraper. Basically, we want the function to go through each row of our dataframe, download the data to a new dataframe, and then move on to the next row. Actually simple, but I will break it down after.

```options(scipen = 999)

data1 <- lapply(split(list1, list1[,'url1']), function (well) tryCatch({

temp = tempfile(fileext = ".xlsx")

download.file(well\$url1, destfile=temp, mode='wb')

df1 <- readxl::read_excel(temp, sheet =1)
df1\$Oil <- as.numeric(df1\$Oil)
df1\$Gas <- as.numeric(df1\$Gas)
df1\$GasSold <- as.numeric(df1\$GasSold)
df1\$Flared <- as.numeric(df1\$Flared)
df1\$Days <- as.numeric(df1\$Days)
df1\$Wtr <- as.numeric(df1\$Wtr)
df1\$Section <- as.character(df1\$Section)
df1\$Township <- as.character(df1\$Township)
df1\$Range <- as.character(df1\$Range)
df1\$API_WELLNO <- as.character(df1\$API_WELLNO)
df1\$FileNo <- as.character(df1\$FileNo)
df1

},
error = function(e) {
e
NULL
}))

```
1. lapply -> Does operations on lists and returns and equal sized list. But our dataframe isn’t a list? Well, that’s what the next line is for
2. split(list1, list1[,’url1′]) -> This turns our dataframe into a list of 72 objects (ie we have 72 rows of data). The second part of the split just determines how the list is split. If it was production data, I could split it by API and then it would create a list for each API with a lot of rows in each list for that data. When we get to auto-casting it comes in super handy.
3. function(well) -> We are actually now creating a function for each item in the list. Each dataframe row that gets pulled into the function will essentially be a new dataframe called well.
4. tryCatch -> Error handling in R, at the very bottom you see the error function defined, which basically just says return NULL if it happens.
5. Next three lines are actually performing all of the operations for each row of data. First I’ll create a temporary file with .xlsx extension, download the data to that temp file, and then read in the first sheet with the readxl package.

Press run, and then go get a beer or coffee or do whatever. This will probably take like ten minutes or so.

##### And done!

Now, we can pull the data into a new dataframe.

```df <- dplyr::bind_rows(data1)
```

And there we go, we have monthly production data back to 2015 for North Dakota. What I REALLY hope just happened is a light bulb went off in your head when you remember a time you mind-numbingly downloaded a ton of excel files manually.

## Data Cleanup

#### Initial Exploration

Ok, ran it on my machine and took 6-7 minutes. This is without things like parallel computing. If I just type in:

```head(df)
```

Here is what we get.

Sweet. We have date, operator, API number, well information and location information. Wish there were bottomhole locations here but sadly not. We CAN get those somewhere else but it’s not the focus of this insight.

The real good stuff is that we get oil, water, gas, sold gas, and flared volumes. Essentially we have what we need to estimate shrink and flaring. Ignore the Pool column, as that is not analogous to the Bakken reservoir, it’s more a combo of Bakken/Three Foks so we’d have to go somewhere else for that.

One quick nifty thing I like to do is look at individual columns when I’m doing exploration.

```summary(as.factor(df\$Company))
```

This will return the Companies and how many times they occur in the dataset. In this example, Continental, Whiting, Hess, and Oasis are at the top of the list. One thing I notice is that there’s two XTO’s, so we have to do a little bit of cleanup.

#### Cleanup

##### Company Names

This is usually the most annoying one to clean up. First thing I’ll do is get rid of commas and periods as half the time that is what makes two versions of a company name appear. I also am going to capitalize all of them.

```df <- df %>%
mutate(Company = gsub('.', '', Company, fixed = TRUE))
df <- df %>%
mutate(Company = gsub(',', '', Company, fixed = TRUE))
df\$Company <- toupper(df\$Company)
```

Next I will get rid of common company identifiers and then eliminate some of the resulting spaces in the strings.

```df <- df %>% mutate(Company = gsub('LLC', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('CORPORATION', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('CORP', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('LP', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('LTD', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('COMPANY', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('INC', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('ENERGY', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('OPERATING', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('EXPLORATION &amp; PRODUCTION', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('E&amp;P', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('PETROLEUM', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('OIL AND GAS', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('RESOURCES', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('USA', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('MANAGEMENT', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('PRODUCTION', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('NORTH AMERICA', '', Company, fixed = TRUE))
df <- df %>% mutate(Company = gsub('OIL &amp; GAS', '', Company, fixed = TRUE))

df <- df %>% mutate(Company = gsub('\\s+', ' ', Company))
df\$Company <- trimws(df\$Company, which = 'both')
```

And that should be enough for company names.

##### Production

I also need to clean up the production data a little. If I use:

```str(df)
```

I see that most of the production data is actually in character format, so we need to change those to numeric, and then I want to replace na’s with 0.

I also like to change POSIXCT format to just simple Date format, so I’ll do that as well with a new column.

```df <- df %>%
mutate(Oil = as.numeric(Oil),
Gas = as.numeric(Gas),
Wtr = as.numeric(Wtr),
Flared = as.numeric(Flared),
Days = as.numeric(Days))

df[is.na(df)] <- 0
df\$Date <- as.Date(df\$ReportDate)
```

You may notice there are some months with negative volumes, which I’m guessing is due to correcting an allocation error, so I’ll leave it unchanged for now.

That should be enough cleanup for our purposes.

## Data Exploration

#### Mapping the Bakken

Now it’s time to have a little fun. First I will do a quick map of the Bakken Pool well locations using the leaflet, tigris, and sf packages.

```mapDf = st_as_sf(df %>% filter(Pool == 'BAKKEN') %>%
arrange(desc(Date)) %>%
select(API_WELLNO,Company, Long, Lat) %>% distinct() %>%
filter(!duplicated(API_WELLNO)), coords = c("Long", "Lat"),
crs = 4269)

mapDf\$geometry1 <- mapDf\$geometry

mapDf <- mapDf %>% rowwise() %>% mutate(line = c(geometry, geometry1) %>% st_combine() %>% st_cast("LINESTRING")) %>% ungroup()

countyList <- tigris::counties(state = 'ND')
countyList <- st_as_sf(countyList)
countyList <- st_transform(countyList, crs = 4269)
countyList\$pt1 <- st_centroid(countyList\$geometry)

leaflet(options = leafletOptions(zoomControl = FALSE)) %>%
addProviderTiles(providers\$OpenStreetMap) %>%
addPolygons(data = countyList\$geometry, color = 'black', weight = 1) %>%
addPolylines(data = mapDf\$line, popup = mapDf\$Company) %>%
leaflet::addScaleBar(position = 'topleft',scaleBarOptions(metric = FALSE)) %>%
addLabelOnlyMarkers(data = countyList\$pt1, label = countyList\$NAME,
labelOptions = labelOptions(noHide = T, direction = 'top', textOnly = T))
```

Steps:

1. Convert our Bakken production into single rows and then create a sf spatial dataframe with the lat/long data. This allows us to map it. I also convert it to EPSG:4269 to represent NAD83 coordinate system.
2. Create a bottom-hole clone of the surface location.
3. Create a fake line that is just the same point twice so I can map it easier in leaflet. You could just use markers but that is A LOT slower.
4. Download county data from US Census with tigris package.
5. Create a centroid point for each county so I can plot county name too.
6. Build the map. I include a popup that shows the opeator for each well.

Figure 1: Map of Bakken Wells with production post 2015

#### Shut-ins

##### Days

Let’s look real quick into shut-in data, given the craziness in WTI action that occurred earlier this year. First, I’ll just look at total days.

```df1 <- df %>%
group_by(Date) %>%
summarise(days = sum(Days)) %>%
ungroup()
```

Figure 2: North Dakota Producing Days, 2018+

Total producing days in May was down around 38.5% from the peak. What does this mean for total oil volumes?

##### Oil
```library(lubridate)
df1 <- df %>%
group_by(Date) %>%
summarise(oil = sum(Oil)) %>%
ungroup() %>%
mutate(oil = oil/days_in_month(Date))
```

Figure 3: North Dakota Oil Production, bbls/d

While days were down 38.5%, production was down a bit more (~42%). It appears that May was the low though, and production was starting to recover in June.

##### Operators

I will now dig a bit deeper at various operators (top 10). We are just going to select February and May 2020, and then look at how production changed among each operator in that time period. Should give us a good idea of the scale of shut-ins.

```library(tidyverse)
df1 <- df %>% filter(year(Date) == 2020) %>%
filter(month(Date) == 2|month(Date) == 5) %>%
group_by(Company, Date) %>% summarise(oil = sum(Oil)) %>%
ungroup() %>% mutate(oil = oil/days_in_month(Date), Month = month(Date)) %>%
mutate(Month = paste0('M', Month)) %>%
subset(select = -c(Date)) %>% spread(Month, oil) %>%
arrange(desc(M2)) %>% top_n(10) %>%
mutate(totalDrop = M2-M5, percentDrop = 1-M5/M2) %>%
arrange(desc(totalDrop))
```

This one is a bit more complicated.

1. Filter to Year 2020 and then either Month 2 or 5.
2. Group by that company and date and add up the oil production.
3. Divide by days in the month to get bbls/d.
4. Create a new column called either M2 or M5.
5. Remove Date column and then pivot the volumes to new columns.
6. Arrange by descending February production totals and select top 10 operators.
7. Calculate the total production drop, and then the percentage fraction drop.
8. Sort production drop in descending order (for my plot).

Figure 4: Production Drop by Operator, North Dakota

By far Continental and Conoco (Burlington) were the most impacted from a volume perspective. How does it look from a total percentage perspective?

Figure 5: Percentage Production Drop by Operator, North Dakota

Oof, that puts it into perspective. The top three were shutting in around 60% of total volumes. Scatter view is informative as well.

Figure 6: Febuary Production vs Percentage Drop, North Dakota

## Summary

And that is it for this version. In the next North Dakota one, I’ll dig a bit deeper into flaring volumes.

### 7 thoughts on “Shut-ins in the Bakken (Oil & Gas Coding Series)”

1. I get the following error when trying to pull everything into the df data frame: “Error: Can’t combine `..1\$Section` and `..28\$Section` .” Looks like there are some NULL values in the data (like in March of 2020) causing some of the Section columns to get pulled in as characters and some as numbers. Did you not have that problem? I didn’t have much luck figuring out how to find and fix these. Do you have any tips?

I just happened to stumble on your Twitter last week. Glad I found it! I am enjoying your blog.

1. Hey Brett, problem was that some of the columns were in character, and some were in numeric format, which don’t play well together. So I made an update to the post to fix. Testing it now but it should work.

1. What would have worked, I believe, is to install the data.table package. Then, instead of dplyr::bind_rows() you can do data.table::rbindlist(). When I ran it on my machine I believe that’s what I did the first time, but didn’t want to have users install yet another package so I showed the alternate way without testing. My Bad!

This site uses Akismet to reduce spam. Learn how your comment data is processed.