Oil & Gas Coding with Python (part 3)

Per usual, we follow in the footsteps of @fauxright with a Python spin. For reference, I’ll be recreating the work from this post.

First, I’ll grab some code from the previous Python post that contained a function for the modified hyperbolic decline. If the code below is not familiar from part 2, take a moment to review that post in more detail.

Decline Curve

``````...
# Set up the initial decline curve parameters to be used
# [Qi, B, Di, Dlim, flat months, well life (years), name]
initialParamsGas =   [ 10000,  0.75,  0.7,   0.1,  2,  30,   'Gas']
initialParamsOil =   [ 25000,   0.6,  0.7,  0.09,  1,  30,   'Oil']
initialParamsWater = [ 22000,     1, 0.75,  0.08,  0,  30, 'Water']
# Create a variable to hold all of our products (Oil, Gas, and Water)
initialParams = []
initialParams.append(initialParamsGas)
initialParams.append(initialParamsOil)
initialParams.append(initialParamsWater)
# Modified hyperbolic decline function used to generate the decline forecast
def modified_hyperbolic_decline(T, Qi, B, Di, Dlim):
    # Some minor logic here to determine which D factor to use
    usedD = 1/((1/Di) + (B*T))
    usedD = usedD.where(usedD > Dlim, Dlim)
    return Qi/np.power((1 + B*usedD*T), 1./B)
...``````
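As a quick sanity check, the decline function can be exercised on a short time vector. The sketch below re-states the function so it runs standalone; the parameters match the gas stream above, but the run itself is only illustrative:

```python
import numpy as np
import pandas as pd

# Re-state the modified hyperbolic decline for a standalone run
def modified_hyperbolic_decline(T, Qi, B, Di, Dlim):
    usedD = 1 / ((1 / Di) + (B * T))          # hyperbolic decline rate
    usedD = usedD.where(usedD > Dlim, Dlim)   # floor it at the terminal decline
    return Qi / np.power(1 + B * usedD * T, 1.0 / B)

T = pd.Series(np.arange(0, 6))  # first six months, starting at zero
rates = modified_hyperbolic_decline(T, 10000, 0.75, 0.7, 0.1)
# Month 0 returns Qi exactly; every later month declines from there
```

The `.where` call is what makes this "modified": once the decline rate computed from the hyperbolic form would drop below `Dlim`, the curve holds at that terminal value instead.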

Forward Strip Price

Next, I will pull in the forward strip pricing for West Texas Intermediate and Henry Hub. Similar to @fauxright, I am borrowing my data from the Wall Street Journal. Wrapping this code in a function will let us reuse it and keep duplication to a minimum.

Selenium

Unless you pay for a data service, gathering our futures strip will require some web scraping. That can be achieved several ways in Python, but today I am going to leverage the Selenium library. Selenium allows us to control a Chrome browser instance programmatically with Python. With that said, you will need to have Chrome installed on your machine. Additionally, you will need to download the Chrome driver binary and place the chromedriver.exe file in your project folder or your Python PATH. If you have issues getting Selenium and the Chrome driver binary set up, reach out and I can help.

Define Pulling Strip Function

Let’s get to some code.

``````...
# Function to read a web page and pull a <table> element into a pandas dataframe
def pullCommodityPricing(inputLink):
    # Selenium options
    options = Options()
    options.headless = True
    options.add_argument("--window-size=1920,1200")
    driver = webdriver.Chrome(options=options)
    # Call the Chrome browser driver, and pass it the web page
    driver.get(inputLink)
    # Added a sleep function to 'pause' and let the webpage load
    time.sleep(2)
    # This returns the full HTML source of the web page
    fullPageHTML = driver.page_source
    # Shut down the instance of the Chrome driver
    driver.quit()
    # Start working with the HTML we scraped from the site above
    soup = BeautifulSoup(fullPageHTML, 'html.parser')
    # Select the first <table> element, which will be holding our data
    webTable = soup.select_one("table")
    table = pd.read_html(str(webTable))
    # Return a single data table; a few parts of this function may
    # need to be modified based on the input webpage
    return table[0]
...``````

The “pullCommodityPricing” function shown above is generic, yet extremely useful. Any website containing at least one <table> element can be scraped for data with it.
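If you want to see the table-to-dataframe idea without launching a browser, here is a minimal stand-in that parses an inline HTML snippet using only the standard library's html.parser. The table contents are invented for illustration and are not WSJ data:

```python
from html.parser import HTMLParser
import pandas as pd

class TableGrabber(HTMLParser):
    """Collects the text of every <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

html = """<table>
<tr><th>MONTH</th><th>SETTLEMENT</th></tr>
<tr><td>Dec 2025</td><td>68.10</td></tr>
<tr><td>Jan 2026</td><td>67.55</td></tr>
</table>"""

grabber = TableGrabber()
grabber.feed(html)
# First row becomes the header, the rest become data rows
table = pd.DataFrame(grabber.rows[1:], columns=grabber.rows[0])
```

In the real function, BeautifulSoup and pd.read_html do this heavy lifting for us; the sketch just shows there is no magic in turning a `<table>` into a dataframe.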


Generate Production Data

With those two core functions defined, I will start to progress through the actual work of generating decline curves, compiling strip pricing, and additional information necessary for the financial model.

``````...
# Generate a pandas dataframe to hold our time column, assuming the
# well life from the gas parameters is usable for everything
modelTime = pd.DataFrame({'Month':
    np.arange(1, 12*initialParamsGas[5]+1, 1)})
# Process each product stream from the well (Gas, Oil, Water)
for product in initialParams:
    # Run the modified hyperbolic decline function to generate production rate estimates
    ratePrediction = modified_hyperbolic_decline(
        modelTime['Month']-1, product[0], product[1], product[2], product[3])
    # Add the modified Arps rates to our time dataframe and shift based on the 'flat months' input
    modelTime[product[6]] = ratePrediction.shift(
        product[4], fill_value=product[0])
...``````

Similar to Python blog post 2, a dataframe is generated with the timing component for the well. What has evolved since the last post is that we now have a list of products (oil, gas, and water), each with its own list of decline curve parameters. To keep my code to a minimum, I use a 'for' loop to iterate through the products being produced from our well. Each product is appended to the 'modelTime' dataframe for future use.
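The shift-and-fill trick at the end of the loop is easiest to see on a toy series (made-up rates, and two 'flat months' as with the gas stream):

```python
import pandas as pd

rates = pd.Series([10000.0, 9000.0, 8100.0, 7290.0])
# Push the decline out by two months and hold Qi flat in the gap,
# mirroring ratePrediction.shift(product[4], fill_value=product[0])
shifted = rates.shift(2, fill_value=10000.0)
# → [10000.0, 10000.0, 10000.0, 9000.0]
```

Note that `shift` keeps the series the same length, so the last values fall off the end; for a long well life that tail loss is immaterial.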

Append price file to production

``````...
# List of commodities we want to pull
commodities = [
    'https://www.wsj.com/market-data/quotes/futures/CRUDE%20OIL%20-%20ELECTRONIC/contracts',
    'https://quotes.wsj.com/futures/NATURAL%20GAS/contracts'
]
# Grab our commodity futures
for x in commodities:
    # Grab pricing
    Futures = pullCommodityPricing(x)
    # Cut out the row containing 'Front Month'
    Futures = Futures.drop(Futures.index[1])
    # Reset the index of the dataframe so that things line up with our modelTime dataframe
    Futures = Futures.reset_index(drop=True)
    # Filter the string to just capture the Month/Year
    Futures['MONTH'] = Futures['MONTH'].str[-8:]
    # Convert our MONTH column to a Python-recognized datetime
    Futures['MONTH'] = pd.to_datetime(Futures['MONTH'], format='%b %Y')
    # Append to our original dataframe, modelTime
    if 'oil' in x.lower():
        modelTime['Date'] = Futures['MONTH']
        modelTime['WTI'] = Futures['SETTLEMENT']
    elif 'natural' in x.lower():
        modelTime['HH'] = Futures['SETTLEMENT']
# Fill any blank values (at the end of our dataframe) with the last values of the prices
modelTime = modelTime.fillna(method='ffill')
...``````

The links to the futures strip pricing for both oil and gas are placed into a list called 'commodities'. Just below that, a for loop iterates through the website links stored in 'commodities' and scrapes the data we need to move forward. Some of the processing in this section is specific to the Wall Street Journal site, so if you have an alternate data source, you may need to clean the data in a slightly different manner.
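The string cleanup is worth a closer look. The WSJ month labels end in a 'Mon YYYY' pattern, so slicing the last eight characters and parsing with the matching format string does the trick. The labels below are invented examples, not live quotes:

```python
import pandas as pd

labels = pd.Series(['Crude Oil Dec 2025', 'Crude Oil Jan 2026'])
# Keep only the trailing 'Mon YYYY' (8 characters), then parse it
months = pd.to_datetime(labels.str[-8:], format='%b %Y')
# Each entry becomes a Timestamp on the first of that month
```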

Ownership Model

True to the previous work of @fauxright, I lay out the working interest and revenue interest model for expenses and revenue calculations.

``````...
# == Ownership Section ==

# Variable for our revenue model inputs
# [0] = Working Interest
# [1] = Royalty Interest
# [2] = Shrink
# [3] = NGL Yield (per Mcf)
# [4] = Oil Differential
# [5] = Gas Differential
# [6] = NGL Differential (from WTI)
RevenueModel = [1, 0.75, 0.75, 10, 0.95, 0.8, 0.3]
modelTime['NGL'] = modelTime['Gas']*RevenueModel[3]/1000
modelTime['Oil Revenue'] = modelTime['Oil']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[4]
modelTime['Gas Revenue'] = modelTime['Gas']*RevenueModel[0]*RevenueModel[1]*modelTime['HH']*RevenueModel[5]*RevenueModel[2]
modelTime['NGL Revenue'] = modelTime['NGL']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[6]
modelTime['Total Revenue'] = modelTime['Oil Revenue'] + modelTime['Gas Revenue'] + modelTime['NGL Revenue']
# == Expenses Section ==
# $10,000/month until fluid rate <30,000 bbls/month, then $5,000/month
fixedCost = [10000, 5000]
# $0.50/bbl oil, $0.20/mcf gas, $1.00/bbl water
variableCost = [0.5, 0.2, 1]
conditions = [modelTime['Oil']+modelTime['Water'] >= 30000,
              modelTime['Oil']+modelTime['Water'] < 30000]
modelTime['Fixed Cost'] = np.select(conditions, fixedCost)
modelTime['Variable Cost'] = modelTime['Oil']*variableCost[0]+modelTime['Gas']*variableCost[1]+modelTime['Water']*variableCost[2]
modelTime['Total Cost'] = modelTime['Fixed Cost'] + modelTime['Variable Cost']
...``````

The assumptions here mostly track the Oil & Gas Coding with R (Part 3) post. I utilize a special function in numpy, ‘np.select’, to insert the correct fixed cost through time.
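For anyone new to 'np.select', it walks a list of boolean conditions and, element by element, picks the matching value from a parallel list of choices. A tiny version of the fixed-cost logic, with invented fluid volumes:

```python
import numpy as np
import pandas as pd

fluid = pd.Series([45000, 31000, 25000, 12000])  # oil + water, bbls/month
conditions = [fluid >= 30000, fluid < 30000]
monthly_fixed = np.select(conditions, [10000, 5000])
# → [10000, 10000, 5000, 5000]
```

The first condition that is True for a given element wins, so the order of the lists matters when conditions overlap.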

Taxes/Cash Flow

``````...
# == Taxes Section ==
# Assuming Texas severance taxes, oil well status
# [0] - Texas Oil Severance
# [1] - Texas Gas/NGL Severance
# [2] - Ad Valorem
taxes = [0.046, 0.075, 0.025]
modelTime['Production Tax'] = modelTime['Oil Revenue']*taxes[0] + (modelTime['Gas Revenue'] + modelTime['NGL Revenue'])*taxes[1] + modelTime['Total Revenue']*taxes[2]
# == Cash Flow Section ==
modelTime['Cash Flow'] = modelTime['Total Revenue'] - modelTime['Total Cost'] - modelTime['Production Tax']
...``````

The tax section mirrors the previous work as well.

A more complex version of this workflow could be constructed to handle various tax rates based on the location of the well. For example, horizontal gas wells in Texas are eligible for a severance tax deduction. Interestingly, that data is also publicly available and can be converted to actual capex; it can be visualized at our sister site, AFE Leaks. Reach out if you are interested in the dataset of 26,000+ wells in Texas and Louisiana covering plays such as the Delaware, Eagle Ford, Haynesville, and Barnett.

I calculate cash flow by taking the calculated cost and taxes out of the revenue deck.

CAPEX/P&A

``````...
# == CAPEX/P&A Section ==
# [0] = Drilling Cost
# [1] = Completion Cost
# [2] = P&A Cost
capex = [4000000, 6000000, 50000]
spudToProductionDelay = 6
# Create a new dataframe to shift the timing variables for CAPEX vs production
fullModel = pd.DataFrame(
    {'Month': np.arange(1, 12*initialParamsGas[5]+1+spudToProductionDelay, 1)})
# Generate conditions to insert specific CAPEX events
conditions = [fullModel['Month'] == 1,
              fullModel['Month'] == spudToProductionDelay,
              fullModel['Month'] == initialParamsGas[5]*12]
fullModel['CAPEX'] = np.select(conditions, capex, default=0)
...``````

Notice here that we are using a new dataframe, ‘fullModel‘, for this portion of the code. I implement CAPEX timing by using the ‘np.select‘ function in a similar fashion to the costs in the ownership section above. I assume the P&A cost occurs at the previously declared well life from the decline curve section.

``````...
# Shift the month column to time out with CAPEX timeline
modelTime['Month'] = modelTime['Month'] + spudToProductionDelay
# Rearrange our columns in case we want to output the data to a file
modelTime = modelTime[['Month', 'Gas', 'Oil', 'NGL', 'Water', 'Date', 'WTI', 'HH', 'Oil Revenue', 'Gas Revenue', 'NGL Revenue', 'Total Revenue', 'Fixed Cost', 'Variable Cost', 'Total Cost', 'Production Tax', 'Cash Flow']]
fullModel = pd.merge(modelTime, fullModel, how='right', on='Month')
fullModel = fullModel.sort_values(by='Month',ignore_index=True)
fullModel = fullModel.fillna(0)
...``````

Next, we do some housekeeping. The ‘modelTime‘ dataframe is shifted to account for the delay between CAPEX timing and initial production. After that, I merge the dataframes to create one source of truth for all of the work to this point.
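The right merge is doing real work in that housekeeping: it keeps every month of the 'fullModel' timeline and leaves NaNs in the pre-production months, which the fillna(0) then zeroes out. A compressed version with toy numbers and a 2-month delay instead of 6:

```python
import pandas as pd

# Production months start after the delay
prod = pd.DataFrame({'Month': [3, 4], 'Cash Flow': [500.0, 450.0]})
# The CAPEX timeline covers the whole life, including the drilling months
cap = pd.DataFrame({'Month': [1, 2, 3, 4], 'CAPEX': [4.0e6, 0.0, 0.0, 0.0]})
full = (pd.merge(prod, cap, how='right', on='Month')
          .sort_values(by='Month', ignore_index=True)
          .fillna(0))
# Months 1-2 now carry zero cash flow alongside the CAPEX spend
```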

Net Present Value

``````...
# == NPV Section ==
econ = [0.1, 1]
fullModel['Present Value'] = (fullModel['Cash Flow']-fullModel['CAPEX'])/(1.0+econ[0])**(fullModel['Month']/12)
fullModel['Cumulative Present Value'] = fullModel['Present Value'].cumsum()
# Export model to CSV
fullModel.to_csv('FullModel.csv')
...``````

Finally, we have a full data set to generate financial metrics to your heart's desire. The present value and cumulative NPV are appended to the data set to show the methodology. If you want to export all of this work into a CSV file for storage, follow the last line and check the folder where you are building this script.
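The discounting itself is a one-liner, but worth unpacking: each month's net cash (cash flow minus CAPEX) is divided by (1 + rate) raised to the month expressed in years. A minimal check with invented numbers:

```python
import pandas as pd

months = pd.Series([1, 2, 3])
net_cash = pd.Series([-1000.0, 600.0, 600.0])  # cash flow minus CAPEX
rate = 0.10                                    # annual discount rate
pv = net_cash / (1.0 + rate) ** (months / 12)
cum_pv = pv.cumsum()
# Each later payment is discounted slightly harder than the one before it
```

The monthly exponent (Month/12) is a mid-stream convention choice; some shops discount annually or at mid-period instead, which shifts NPV slightly.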

Code

``````# -*- coding: utf-8 -*-
"""
Created on Thu Sep 10 22:25:01 2020
@author: @FracLost
"""
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
# Set up the initial decline curve parameters to be used
# [Qi, B, Di, Dlim, flat months, well life (years), name]
initialParamsGas =   [ 10000,  0.75,  0.7,   0.1,  2,  30,   'Gas']
initialParamsOil =   [ 25000,   0.6,  0.7,  0.09,  1,  30,   'Oil']
initialParamsWater = [ 22000,     1, 0.75,  0.08,  0,  30, 'Water']
# Create a variable to hold all of our products (Oil, Gas, and Water)
initialParams = []
initialParams.append(initialParamsGas)
initialParams.append(initialParamsOil)
initialParams.append(initialParamsWater)
# Modified hyperbolic decline function used to generate the decline forecast
def modified_hyperbolic_decline(T, Qi, B, Di, Dlim):
    # Some minor logic here to determine which D factor to use
    usedD = 1/((1/Di) + (B*T))
    usedD = usedD.where(usedD > Dlim, Dlim)
    return Qi/np.power((1 + B*usedD*T), 1./B)
# Function to read a web page and pull a <table> element into a pandas dataframe
def pullCommodityPricing(inputLink):
    # Selenium options
    options = Options()
    options.headless = True
    options.add_argument("--window-size=1920,1200")
    driver = webdriver.Chrome(options=options)
    # Call the Chrome browser driver, and pass it the web page
    driver.get(inputLink)
    # Added a sleep function to 'pause' and let the webpage load
    time.sleep(2)
    # This returns the full HTML source of the web page
    fullPageHTML = driver.page_source
    # Shut down the instance of the Chrome driver
    driver.quit()
    # Start working with the HTML we scraped from the site above
    soup = BeautifulSoup(fullPageHTML, 'html.parser')
    # Select the first <table> element, which will be holding our data
    webTable = soup.select_one("table")
    table = pd.read_html(str(webTable))
    # Return a single data table; a few parts of this function may
    # need to be modified based on the input webpage
    return table[0]
# Generate a pandas dataframe to hold our time column, assuming the
# well life from the gas parameters is usable for everything
modelTime = pd.DataFrame({'Month':
    np.arange(1, 12*initialParamsGas[5]+1, 1)})
# Process each product stream from the well (Gas, Oil, Water)
for product in initialParams:
    # Run the modified hyperbolic decline function to generate production rate estimates
    ratePrediction = modified_hyperbolic_decline(
        modelTime['Month']-1, product[0], product[1], product[2], product[3])
    # Add the modified Arps rates to our time dataframe and shift based on the 'flat months' input
    modelTime[product[6]] = ratePrediction.shift(
        product[4], fill_value=product[0])
# List of commodities we want to pull
commodities = [
    'https://www.wsj.com/market-data/quotes/futures/CRUDE%20OIL%20-%20ELECTRONIC/contracts',
    'https://quotes.wsj.com/futures/NATURAL%20GAS/contracts'
]
# Grab our commodity futures
for x in commodities:
    # Grab pricing
    Futures = pullCommodityPricing(x)
    # Cut out the row containing 'Front Month'
    Futures = Futures.drop(Futures.index[1])
    # Reset the index of the dataframe so that things line up with our modelTime dataframe
    Futures = Futures.reset_index(drop=True)
    # Filter the string to just capture the Month/Year
    Futures['MONTH'] = Futures['MONTH'].str[-8:]
    # Convert our MONTH column to a Python-recognized datetime
    Futures['MONTH'] = pd.to_datetime(Futures['MONTH'], format='%b %Y')
    # Append to our original dataframe, modelTime
    if 'oil' in x.lower():
        modelTime['Date'] = Futures['MONTH']
        modelTime['WTI'] = Futures['SETTLEMENT']
    elif 'natural' in x.lower():
        modelTime['HH'] = Futures['SETTLEMENT']
# Fill any blank values (at the end of our dataframe) with the last values of the prices
modelTime = modelTime.fillna(method='ffill')
# == Ownership Section ==

# Variable for our revenue model inputs
# [0] = Working Interest
# [1] = Royalty Interest
# [2] = Shrink
# [3] = NGL Yield (per Mcf)
# [4] = Oil Differential
# [5] = Gas Differential
# [6] = NGL Differential (from WTI)
RevenueModel = [1, 0.75, 0.75, 10, 0.95, 0.8, 0.3]
modelTime['NGL'] = modelTime['Gas']*RevenueModel[3]/1000
modelTime['Oil Revenue'] = modelTime['Oil']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[4]
modelTime['Gas Revenue'] = modelTime['Gas']*RevenueModel[0]*RevenueModel[1]*modelTime['HH']*RevenueModel[5]*RevenueModel[2]
modelTime['NGL Revenue'] = modelTime['NGL']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[6]
modelTime['Total Revenue'] = modelTime['Oil Revenue'] + modelTime['Gas Revenue'] + modelTime['NGL Revenue']
# == Expenses Section ==
# $10,000/month until fluid rate <30,000 bbls/month, then $5,000/month
fixedCost = [10000, 5000]
# $0.50/bbl oil, $0.20/mcf gas, $1.00/bbl water
variableCost = [0.5, 0.2, 1]
conditions = [modelTime['Oil']+modelTime['Water'] >= 30000,
              modelTime['Oil']+modelTime['Water'] < 30000]
modelTime['Fixed Cost'] = np.select(conditions, fixedCost)
modelTime['Variable Cost'] = modelTime['Oil']*variableCost[0]+modelTime['Gas']*variableCost[1]+modelTime['Water']*variableCost[2]
modelTime['Total Cost'] = modelTime['Fixed Cost'] + modelTime['Variable Cost']
# == Taxes Section ==
# Assuming Texas severance taxes, oil well status
# [0] - Texas Oil Severance
# [1] - Texas Gas/NGL Severance
# [2] - Ad Valorem
taxes = [0.046, 0.075, 0.025]
modelTime['Production Tax'] = modelTime['Oil Revenue']*taxes[0] + (modelTime['Gas Revenue'] + modelTime['NGL Revenue'])*taxes[1] + modelTime['Total Revenue']*taxes[2]
# == Cash Flow Section ==
modelTime['Cash Flow'] = modelTime['Total Revenue'] - modelTime['Total Cost'] - modelTime['Production Tax']
# == CAPEX/P&A Section ==
# [0] = Drilling Cost
# [1] = Completion Cost
# [2] = P&A Cost
capex = [4000000, 6000000, 50000]
spudToProductionDelay = 6
# Create a new dataframe to shift the timing variables for CAPEX vs production
fullModel = pd.DataFrame(
    {'Month': np.arange(1, 12*initialParamsGas[5]+1+spudToProductionDelay, 1)})
# Generate conditions to insert specific CAPEX events
conditions = [fullModel['Month'] == 1,
              fullModel['Month'] == spudToProductionDelay,
              fullModel['Month'] == initialParamsGas[5]*12]
fullModel['CAPEX'] = np.select(conditions, capex, default=0)
# Shift the month column to time out with the CAPEX timeline
modelTime['Month'] = modelTime['Month'] + spudToProductionDelay
# Rearrange our columns in case we want to output the data to a file
modelTime = modelTime[['Month', 'Gas', 'Oil', 'NGL', 'Water', 'Date', 'WTI', 'HH', 'Oil Revenue', 'Gas Revenue', 'NGL Revenue', 'Total Revenue', 'Fixed Cost', 'Variable Cost', 'Total Cost', 'Production Tax', 'Cash Flow']]
fullModel = pd.merge(modelTime, fullModel, how='right', on='Month')
fullModel = fullModel.sort_values(by='Month', ignore_index=True)
fullModel = fullModel.fillna(0)
# == NPV Section ==
econ = [0.1, 1]
fullModel['Present Value'] = (fullModel['Cash Flow']-fullModel['CAPEX'])/(1.0+econ[0])**(fullModel['Month']/12)
fullModel['Cumulative Present Value'] = fullModel['Present Value'].cumsum()
# Generate a chart of Cumulative PV
fig, ax = plt.subplots()
rects1 = ax.bar(fullModel['Month'], fullModel['Cumulative Present Value'])
# Styling of chart
ax.set_ylabel('Cumulative PV')
ax.set_xlabel('Months')
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
# Export model to CSV and display the chart
fullModel.to_csv('FullModel.csv')
plt.show()``````

3 thoughts on “Oil & Gas Coding with Python (part 3)”

1. Hey, what’s up with the

   ] = ratePrediction.shift(
       product[4], fill_value=product[0])

   block?

   1. Hey Ben W, here we are shifting the data in the dataframe by the number of flat months (item 4 in the product array). We are also filling in those initial months with the Qi (initial rate) value, which is item 0 in the product array. Think of it like inserting cells above our data and prefilling them with a set value at the same time.
