Oil & Gas Coding with R (Part 1)

This will be the start of a continuing series on how to code in R with the assumption that you have never coded in R before and that you work in oil and gas (so hopefully you know the lingo). In this post, I will go through general procedural things, like how to actually install what you need.

Some background: I’m not a computer science guy. I was a reservoir engineer who went into finance and then decided to start picking this up. It made me hugely faster and more efficient. The key advantage is that I’m used to working with the same data sources you are, I know general oil and gas lingo, and I think like you. And domain knowledge trumps generalist coding skill every time. So my syntax probably sucks, and there are probably faster ways to do things. But I will get you from point A to point B.


Getting Started

Please god don’t do this on your work computer if you can help it. The firewalls and permissions are a pain. I am going to do this using Windows, though the program can be installed on whichever system you like. The easiest is probably Linux, but the process of setting up an AWS account and learning Bash coding language might be too much of a brain hurt this early in the process. Sorry Apple brethren; I’ve never installed it on there but there are plenty of online resources. If you have a computer with a lot of Cores (look it up), that would be ideal.

Install R/RStudio/Rtools

This process is somewhat straightforward, but I’m betting a lot of issues will crop up. Just follow these instructions. They were written for an older release (R is currently at v 4.0.2), but the process is still correct. FOLLOW THEM EXACTLY, including the step that edits the PATH in your environment, or none of your crap will work right.

When everything is done, you should have a working version of RStudio up and ready to go. What is RStudio? It’s a Graphical User Interface for R; think of it as basically a big program like ARIES or something, but far less painful. It allows you to do all kinds of cool stuff with R. When you load it up, create a new project and save it in some directory. Your screen should look something like this. My current project is saved in Documents/R/class.

Figure 1: My RStudio Screen

The white actually hurts my eyes quite a bit, so I go to Tools at the top, then Global Options, and I change my Appearance to Cobalt, which is Dark Purple background and white text.

Install some packages

R is Open Source, which means there are A LOT of packages built by people to do really cool analysis. There are even oil and gas specific packages. However, pound for pound, the best thing to install is the tidyverse. It opens you up to a whole range of packages that help you to manipulate and use data. Code snippet is below.

#Install Libraries

#Load Libraries
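The original snippet did not survive here, so what follows is my best guess at it rather than the exact code. The package list is an assumption based on what gets used later in this post: the tidyverse for data wrangling, lubridate for the year() function, and highcharter for the chart at the end. Loading lubridate explicitly is harmless if your tidyverse version already attaches it.

```r
# Install once per machine (assumed package list, adjust to taste)
install.packages(c("tidyverse", "lubridate", "highcharter"))

# Load at the top of every session
library(tidyverse)   # dplyr, readr, the %>% pipe, and friends
library(lubridate)   # year(), used later for first production year
library(highcharter) # interactive charts
```

You only need the install.packages line the first time. The library lines have to run every time you open a fresh R session.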

By the way, there are two places to run code, either in the Console, which is where default R will show up, or in an R document. Underneath File there is a picture of a piece of paper with a plus symbol on it.

Click it and press R Script. Save it as something like working.R. This is the first step towards REPRODUCIBLE CODE. If you just use the console, everything works fine, but it won’t actually save anything once you shut it all down. The R document allows you to keep a record of all of it.

There are three ways to run this.

  1. Highlight it and press run on the top right. Slow and inefficient.
  2. For a single line of code, just keep the cursor on that line and push CTRL+Enter.
  3. For multi-line code, highlight it all and press CTRL+Enter.

This will push that section of the code to the console and run it automatically. These are big packages so might take you a few minutes to install.

Download Data

I have downloaded this data myself, so I can share this stuff with you problem free. For this example, I’ll share monthly production data for all of Antero’s wells in the Northeast that I pulled from their sites.

#Remove Scientific Notation and make strings Character type
options(scipen = 999)
options(stringsAsFactors = FALSE)

#Load Data
prod <- readr::read_csv('https://github.com/xbrl-data/class/raw/master/prodAntero.csv')
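If you are wondering what the scipen option actually buys you, here is a minimal sketch using a made-up number. It flips the setting back and forth so you can compare how the same value gets formatted, then restores whatever you had before.

```r
# Remember the current setting and switch to R's default (scipen = 0)
old <- options(scipen = 0)
default_fmt <- format(1e10)   # R prefers scientific notation here

# Now penalize scientific notation heavily, as in the post
options(scipen = 999)
plain_fmt <- format(1e10)     # the same number written out in full

# Put the original setting back
options(old)

default_fmt
plain_fmt
```

With big cumulative volumes, the written-out version is a lot easier to read in tables and tooltips.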

Investigate Data

This should run through Q1 2020. If you want to just open the table and look at it, the top right “box” in RStudio lists everything you have loaded. If you click prod, it will bring up the table in something similar to Excel for you to investigate. With this many rows (125,657), that’s probably a bit too much to glean anything useful. That’s why there are two handy little functions available to investigate your data: str and summary.
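Here is a rough sketch of the first of those helpers on a tiny made-up table, so you can see the shape of the output without downloading anything. The table name toy and its values are invented for illustration; on the real data you would just call str(prod).

```r
# Toy stand-in for the production table (made-up values)
toy <- data.frame(
  API  = c(4700100001, 4700100001, 4700100002),
  Date = as.Date(c("2019-01-01", "2019-02-01", "2019-01-01")),
  gas  = c(120000, 95000, NA)
)

# One line per column: name, type, and the first few values
str(toy)

# The same information, one column at a time
class(toy$API)   # numeric, because it was read as a number
class(toy$Date)  # Date
```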



str will tell you the type of each column of data: basically whether it is a number, integer, date, character, or one of a few other random things. Pretty useful. For example, I don’t really want to carry API as a numeric column for various reasons, so I am going to make it a character. Luckily the date column is already in date format, or we would need to convert it. A few extra things to be aware of:

  1. The $ sign is used to reference a column within a table. So I would write prod$API to specify the API column of prod.
  2. <- is the assignment operator and is equivalent to = here. = also works, but that’s not how R is usually taught.
  3. %>%, or “pipe” as it’s referred to. It takes the result of whatever is on its left and passes it in as the first argument of the function on its right. It allows you to chain a lot of commands together and reduces how much you have to code. We will use A LOT of piping.
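The three items above are easier to see than to read about, so here is a small sketch on a made-up table (wells and its values are invented for illustration). The last two statements do exactly the same thing, once with nested function calls and once with the pipe.

```r
library(dplyr)  # provides arrange(), desc(), and the %>% pipe

# Made-up table of three wells
wells <- data.frame(API = c("A", "B", "C"), gas = c(10, 30, 20))

# $ pulls a single column out of the table
wells$gas

# <- assigns the result to a name
total <- sum(wells$gas)

# Nested calls: read from the inside out
nested <- head(arrange(wells, desc(gas)), 2)

# Piped: read left to right, each result feeds the next function
piped <- wells %>% arrange(desc(gas)) %>% head(2)
```

Both versions return the two highest-gas wells. The piped one just reads in the order the steps happen.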

The other thing I notice when running str is that there is a stray column called X1. I probably screwed up the save, but I’m guessing it’s the row number. We can get rid of it using either the select function from dplyr (part of the tidyverse) or the subset function from base R; the code below uses subset.

#Convert API column to character
prod$API <- as.character(prod$API)

#Basically select everything but column X1
prod <- prod %>% subset(select = -c(X1))
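For comparison, here is a minimal sketch of the base R and dplyr ways of dropping a column, run on a made-up table. One caveat that is my assumption rather than something from the original post: I believe newer readr releases label an unnamed first column ...1 instead of X1, so check names(prod) before you drop anything. The any_of version below quietly skips names that are not present, which covers both cases.

```r
library(dplyr)

# Made-up table with a stray row-number column
toy <- data.frame(X1 = 1:3, API = c("A", "B", "C"), gas = c(10, 20, 30))

# Base R: subset with a negative selection
base_way <- subset(toy, select = -c(X1))

# dplyr: select with a negative selection
dplyr_way <- toy %>% select(-X1)

# dplyr, tolerant of whichever name the stray column ended up with
tolerant_way <- toy %>% select(-any_of(c("X1", "...1")))

names(base_way)
```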


summary tells us various things about each column, but is really only useful for numeric, date, and factors (which I haven’t covered but are basically big grouping categories). I prefer to steer clear of factors as it can slow down my machine with these big datasets.


It actually gives you some good data (number of NA’s, distributions, etc.). A lot of noise right now but particularly valuable when looking for outliers.
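As a rough illustration of the outlier point, here is summary run on a made-up gas column with one missing month and one suspicious spike. The numbers are invented. On the real data you would call summary(prod) to get this for every column at once.

```r
# Made-up monthly gas volumes: one NA and one value that looks off
gas <- c(120000, 95000, NA, 88000, 9900000)

# Min, quartiles, mean, max, and the NA count in one shot
s <- summary(gas)
s

# The same pieces pulled out individually
n_missing <- sum(is.na(gas))
biggest   <- max(gas, na.rm = TRUE)
typical   <- median(gas, na.rm = TRUE)
```

A max that sits far above the median, like it does here, is usually the first hint that something needs a closer look.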

I notice that I have Date going back to 1985 and that I have data for oil, gas, and water in monthly increments.

Let’s do something cool!

All right, we’ve learned a little bit, but time to show some real value real quick. The two main things you will be using from tidyverse are:

  1. mutate – Basically creates a new variable (column)
  2. group_by (and ungroup) – This allows you to apply analysis to each thing you group by.

prod <- prod %>% arrange(API, Date) %>% #Arrange is just sort in ascending order
  group_by(API) %>% mutate(monthsOn = seq(1, n(), 1), #Add a cumulative months column
                           fpYear = lubridate::year(min(Date)), #Create a first production year column (year comes from lubridate)
                           cumOil = cumsum(oil), #Create a cumulative oil column
                           cumGas = cumsum(gas), #Create a cumulative gas column
                           cumMCFE = cumOil*6 + cumGas) %>% #Create a cumulative mcfe column (6 mcf per bbl)
  ungroup() #We always want to ungroup after a grouping or some weird stuff happens

Essentially, we are just taking each well (API) and then adding a cumulative month, a first production year, and then cumulative volumes.
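To make the per-well logic concrete, here is the same pattern on a made-up table with two wells. The well names and volumes are invented, and I use base R’s format to pull the year so the sketch only needs dplyr. Notice that the month counter and the cumulative sums restart for well B, which is the whole point of group_by.

```r
library(dplyr)

# Made-up monthly production for two wells
toy <- data.frame(
  API  = c("A", "A", "A", "B", "B"),
  Date = as.Date(c("2019-01-01", "2019-02-01", "2019-03-01",
                   "2020-06-01", "2020-07-01")),
  oil  = c(100, 80, 60, 0, 0),
  gas  = c(1000, 900, 800, 5000, 4000)
)

toy <- toy %>%
  arrange(API, Date) %>%                                # sort within each well
  group_by(API) %>%                                     # everything below runs per well
  mutate(monthsOn = row_number(),                       # 1, 2, 3, ... for each well
         fpYear   = as.integer(format(min(Date), "%Y")), # first production year
         cumOil   = cumsum(oil),
         cumGas   = cumsum(gas),
         cumMCFE  = cumOil * 6 + cumGas) %>%            # 6 mcf per bbl
  ungroup()

toy
```

If you drop the group_by line, cumsum keeps running straight through from well A into well B, which is almost never what you want.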

Another library I love, and you should too, is highcharter. You can make some really cool interactive graphs (free for personal use, but commercial use requires a license). We are going to use it here to visualize our data. I want to look at cumulative production over time, binned by first production year. I’m also going to filter out any wells that came online before 2010.

prod1 <- prod %>%
  filter(fpYear >= 2010) %>% #Filter Production
  group_by(fpYear, monthsOn) %>% #Group By Year and Cumulative Month
  summarise(cumMCFE = mean(cumMCFE), count = n()) %>% #Take averages and total count
  ungroup() %>%
  group_by(fpYear) %>%
  filter(count >= max(count)*0.4) %>% #You want to do this to remove months at the end that have lower well samples
  mutate(cumMCFE = as.integer(cumMCFE)) #Convert to integer
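The 40% filter is the least obvious step, so here is a small sketch of just that piece on made-up well counts for a single vintage. The idea is that late months are only reached by the oldest wells in a year, so the average there rests on a thin sample. Keeping only months with at least 40% of the peak well count trims that tail.

```r
library(dplyr)

# Made-up well counts by month on production for one vintage
avg <- data.frame(
  fpYear   = c(2018, 2018, 2018, 2018),
  monthsOn = 1:4,
  count    = c(10, 10, 5, 3)
)

kept <- avg %>%
  group_by(fpYear) %>%
  filter(count >= max(count) * 0.4) %>%  # threshold here is 10 * 0.4 = 4 wells
  ungroup()

kept
```

Month 4 only has 3 wells behind it, so it gets dropped. The 0.4 is a judgment call, and you can tighten or loosen it depending on how noisy the tail looks.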

The next part might be a bit advanced, but it shows how to create these very customized, interactive plots in highcharts. Code is below.


highchart() %>%
  hc_add_series(prod1, type = 'spline',
                hcaes(x = monthsOn, y = cumMCFE, group = fpYear),
                marker = list(enabled = FALSE)) %>%
  hc_title(text = 'Antero Cumulative MCFE Over Time', align = 'left') %>%
  hc_subtitle(text = 'By First Production Year', align = 'left')  %>%
  hc_tooltip(pointFormat = "<span style=\"color:{series.color}\">{series.name}</span>:
             <b> {point.y}</b><br/>", shared = FALSE) %>% 
  hc_xAxis(title = list(text = '<b>Cumulative Months</b>', style = list(fontSize = '18px')),
    labels = list(style = list(
      color = '#0D1540',
      fontSize = '14px', 
      fontWeight = 'bold'))) %>% 
  hc_yAxis(title = list(text = '<b>Cumulative MCFE</b>', style = list(fontSize = '18px')),
           labels = list(style = list(fontSize = '12px', fontWeight = 'bold'))) %>%
  hc_credits(enabled = TRUE, text = 'Powered by Highcharts', href = "https://www.highcharts.com/")

And here is the result!

Figure 2: Antero Cumulative Production over Time, By Year


And that’s it. Stay tuned for the next one, where we get into some production forecasting.
