Scraping Strip with R (Oil & Gas Coding Series)

Often I find myself going to various websites whenever I’m doing a valuation to track down either historical or future WTI/Henry Hub prices. At least before I started coding. So let me show you how to automate it with R!


Follow My Blog


Packages

I’m going to use a few packages in this analysis.

  1. tidyverse, but more specifically rvest, xml2, dplyr, stingr, tidyr, lubridate, and httr
  2. zoo

And that’s pretty much it.


Historical Prices

Usually I just want to get monthly prices, but the EIA provides daily settlement so I’ll use that.


WTI

First, I’ll go to the EIA website for WTI (Oil). Basically, I need to locate the webpage and use xml2/rvest to read the website and grab the table that I want.

#Tidyverse packages
library(rvest)
library(xml2)
library(httr)
library(dplyr)

#EIA website for Daily Oil Production
crude <- 'https://www.eia.gov/dnav/pet/hist/RWTCD.htm'

#Read the website structure/content with xml2
webpage <- xml2::read_html(crude)

#Look for table nodes, locate the table we want (6) and put it in a table
wti1 <- webpage %>%
  html_nodes('table') %>%
  .[6] %>%
  html_table(fill = TRUE)

wti1 <- wti1[[1]] %>% filter(!is.na(Mon))

Our resulting dataframe looks like this.

Week ofMonTueWedThuFri
1985 Dec-30 to Jan- 3NANANA25.5626
1986 Jan- 6 to Jan-1026.5325.8525.8726.0325.65
Table 1: EIA WTI Table Structure

I don’t like the layout (really not usable), so I change the table structure and then aggregate at the monthly level.

library(stringr)
library(lubridate)
library(tidyr)

#Create Date Column from first day of week
wti1$Date <- paste0(substr(wti1$`Week Of`, 1, 4), '-',
                    substr(wti1$`Week Of`, 6, 8), '-',
                    substr(wti1$`Week Of`, 10, 11))

#Replace spaces with zeros (Day of Week)
wti1$Date <- gsub(' ', '0', wti1$Date)

#Convert to date format
wti1$Date <- as.Date(wti1$Date, format = '%Y-%b-%d')

#Drop fist column, pivot from wide format to long format, calculate day, and then average by month
wti1 <- wti1 %>% subset(select = -c(`Week Of`)) %>% 
  gather(Day, WTI, -c(Date)) %>%
  arrange(Date) %>% group_by(Date) %>% 
  mutate(daysOn = seq(0, 4, 1)) %>% ungroup() %>% #Create a day of week from 0 to 4
  mutate(Date = Date %m+% days(daysOn)) %>% #Add to first date by day in week
  mutate(DATE = as.Date(paste0(year(Date), '-', month(Date), '-01'))) %>% #Create new date for month/year
  filter(!is.na(WTI)) %>% group_by(DATE) %>% #Remove NA's
  summarise(WTI = mean(WTI, na.rm=TRUE)) %>% 
  ungroup()

Figure 1: Historical WTI Pricing


Henry Hub

Next, I’ll get Henry Hub (Gas). The process is essentially the same for gas.

hh <- 'https://www.eia.gov/dnav/ng/hist/rngwhhdD.htm'
webpage <- xml2::read_html(hh)
hh1 <- webpage %>%
  rvest::html_nodes('table') %>%
  .[6] %>%
  rvest::html_table(fill = TRUE)

hh1 <- hh1[[1]] %>% filter(!is.na(Mon))

hh1$Date <- paste0(substr(hh1$`Week Of`, 1, 4), 
                    '-',
                    substr(hh1$`Week Of`, 6, 8),
                    '-',
                    substr(hh1$`Week Of`, 10, 11))
hh1$Date <- gsub(' ', '0', hh1$Date)

hh1$Date <- as.Date(hh1$Date, format = '%Y-%b-%d')

hh1 <- hh1 %>% subset(select = -c(`Week Of`)) %>% 
  gather(Day, HH, -c(Date)) %>%
  arrange(Date) %>% group_by(Date) %>% 
  mutate(daysOn = seq(0, 4, 1)) %>% ungroup() %>%
  mutate(Date = Date %m+% days(daysOn)) %>%
  mutate(DATE = as.Date(paste0(year(Date), '-', month(Date), '-01'))) %>%
  filter(!is.na(HH)) %>% group_by(DATE) %>%
  summarise(HH = mean(HH, na.rm=TRUE)) %>% 
  ungroup()

Figure 2: Historical Henry Hub Pricing

God I remember when gas prices jumped above $12. We were running through the halls giving high fives to each other.


Strip Forecast

For this one, I will be borrowing from the Wall Street Journal.


Oil

This one is a little easier. It downloads in the format we need, but I will just use month and year to create Date, and then use the settlement price as WTI.

crude = 'https://quotes.wsj.com/futures/CRUDE%20OIL%20-%20ELECTRONIC/contracts'
webpage <- read_html(crude)
tbls <- rvest::html_nodes(webpage, 'table')

tbls_ls <- webpage %>%
  rvest::html_nodes('table') %>%
  .[1] %>%
  rvest::html_table(fill = TRUE)

wti <- tbls_ls[[1]] %>% select(MONTH, SETTLEMENT)
wti <- wti %>% filter(MONTH != 'Front Month')
wti$MONTH <- gsub('Crude Oil', '', wti$MONTH)

wti$MONTH <- trimws(wti$MONTH, which = 'both')

wti$Year <- substr(wti$MONTH, 5, 8)
wti$Month <- substr(wti$MONTH, 1, 3)
wti$Date <- as.Date(paste0(wti$Month, '/01/', wti$Year), format = '%b/%d/%Y')
wti <- wti %>% select(DATE = Date, WTI = SETTLEMENT)


Gas

And same for Gas Price Forecast.

gas = 'https://quotes.wsj.com/futures/NATURAL%20GAS/contracts'
webpage <- read_html(gas)
tbls <- rvest::html_nodes(webpage, 'table')

tbls_ls <- webpage %>%
  rvest::html_nodes('table') %>%
  .[1] %>%
  rvest::html_table(fill = TRUE)

hh <- tbls_ls[[1]]%>% select(MONTH, SETTLEMENT) %>% filter(MONTH != 'Front Month')
hh$MONTH <- gsub('Natural Gas', '', hh$MONTH)
hh$MONTH <- trimws(hh$MONTH, which = 'both')
hh$Year <- substr(hh$MONTH, 5, 8)
hh$Month <- substr(hh$MONTH, 1, 3)
hh$Date <- as.Date(paste0(hh$Month, '/01/', hh$Year), format = '%b/%d/%Y')

hh <- hh %>% select(DATE = Date, HH = SETTLEMENT)

Combine it all together

Now we can just link up all the data and create a combined historical + forecast view.

wti <- rbind(wti1, wti)
hh <- rbind(hh1, hh)

library(zoo)
prices <- merge(seq(min(year(hh$DATE)), max(year(hh$DATE)), 1),
               seq(1, 12, 1)) %>%
  mutate(DATE = as.Date(paste0(x, '-', y, '-01'))) %>%
  arrange(DATE) %>%
  select(-c(x, y)) %>% left_join(wti) %>% left_join(hh) %>%
  mutate(WTI = na.locf(WTI), HH = na.locf(HH))
  

I use na.locf (from zoo) to just fill in missing values with the last reported value. I also create a total sequence of dates because there are random missing dates within the dataset. When I’m all done, I have a price file. It’s simple enough just to turn this into a function that can be run whenever you like.

Figure 3: Combined Oil & Gas Price Forecast

Summary

Thanks for tuning in for this version. Eventually I will build all of this into a cohesive valuation methodology.

1 thought on “Scraping Strip with R (Oil & Gas Coding Series)”

  1. Pingback: EnergyNet Override Valuation (Oil & Gas Coding with R) - Shale Insights

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: