
Oil & Gas Coding with Python (part 3)

Per usual, we follow in the footsteps of @fauxright with a Python spin. For reference, I’ll be recreating the work from this post.

First, I’ll grab some code from the previous Python post that contained a function for the modified hyperbolic decline. If the code below is not familiar from part 2, take a moment to review that post in more detail.




Decline Curve

...
# set up the initial decline curve parameters to be used
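# Parameter order, inferred from how each list is consumed below:
# [0] = Qi (initial rate), [1] = B (hyperbolic exponent), [2] = Di (initial decline),
# [3] = Dlim (terminal decline), [4] = months held flat before decline,
# [5] = well life in years, [6] = product name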
initialParamsGas =   [ 10000,  0.75,  0.7,   0.1,  2,  30,   'Gas']
initialParamsOil =   [ 25000,   0.6,  0.7,  0.09,  1,  30,   'Oil']
initialParamsWater = [ 22000,     1, 0.75,  0.08,  0,  30, 'Water']
# Create a variable to hold all of our products (Oil, Gas, and Water)
initialParams = []
initialParams.append(initialParamsGas)
initialParams.append(initialParamsOil)
initialParams.append(initialParamsWater)
# Modified Hyperbolic Decline function utilized to generate the decline forecast
def modified_hyperbolic_decline(T,Qi,B,Di,Dlim):
    # Instantaneous decline rate implied by the hyperbolic curve at time T
    usedD = 1/((1/Di)+(B*T))
    # Hold the decline at the terminal value Dlim once the hyperbolic rate drops below it
    usedD = usedD.where(usedD > Dlim, Dlim)
    return Qi/np.power((1+B*usedD*T),1./B)
...
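As a quick sanity check, here is a hypothetical call to the function above. Note that T needs to be a pandas Series rather than a plain numpy array, since the function leans on Series.where to hold the decline at Dlim.

...
import numpy as np
import pandas as pd
# Hypothetical usage: the first six months of the gas stream defined above
t = pd.Series(np.arange(0, 6))
rates = modified_hyperbolic_decline(t, 10000, 0.75, 0.7, 0.1)
print(rates.round(1))
...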

Forward Strip Price

Next, I will pull in the forward strip pricing for West Texas Intermediate and Henry Hub. Similar to @fauxright, I am borrowing my data from the Wall Street Journal. Wrapping this step in a function will let us minimize duplication and reuse the code as much as possible.


Selenium

Unless you pay for a data service, gathering our futures strip will require some web scraping. That can be achieved several ways in Python, but today I am going to leverage the Selenium library. Selenium allows us to control a Chrome browser instance programmatically with Python. With that said, you will need Chrome installed on your machine. Additionally, you will need to download the Chrome driver binary and place the chromedriver.exe file in your project folder or on your Python PATH. If you have issues getting Selenium and the Chrome driver binary set up, reach out and I can help.
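If the driver binary is not on your PATH, you can also point Selenium at it directly. Here is a minimal sketch, assuming the Selenium 3-era API used throughout this post (the path below is hypothetical; in Selenium 4 the executable_path argument is replaced by a Service object).

...
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
# Hypothetical location; point this at wherever you placed chromedriver.exe
driver = webdriver.Chrome(executable_path=r'C:\my_project\chromedriver.exe',
                          options=options)
driver.get('https://www.example.com')
print(driver.title)
driver.quit()
...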


Define Pulling Strip Function

Let’s get to some code.

...
# Function to read a web page and pull a <table> element into a pandas dataframe
def pullCommodityPricing(inputLink):
    # Selenium options
    options = Options()
    options.headless = True
    options.add_argument("--window-size=1920,1200")
    driver = webdriver.Chrome(options=options)
    # Call the Chrome browser driver, and pass it the web page
    driver.get(inputLink)
    # Added a sleep function to 'pause' and let the webpage load
    time.sleep(2)
    # this returns the full HTML source of the web page
    fullPageHTML = driver.page_source
    # shutdown the instance of the Chrome driver
    driver.quit()
    # Start working with the HTML we scraped from the site above
    soup = BeautifulSoup(fullPageHTML, 'html.parser')
    # Select the first <table> element, which holds our data
    webTable = soup.select_one("table")
    table = pd.read_html(str(webTable))
    # return a single data table; a few parts of this function may
    # need to be modified based on the input webpage
    return table[0]
...

The “pullCommodityPricing” function shown above is generic, yet extremely useful. Any website containing at least one <table> element can be scraped for data.
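As a quick illustration, the same function should work on any page whose first <table> holds the data of interest. The Wikipedia link below is purely illustrative and not part of the original workflow.

...
# Illustrative only; any page with a meaningful first <table> will do
df = pullCommodityPricing(
    'https://en.wikipedia.org/wiki/List_of_countries_by_oil_production')
print(df.head())
...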


Generate Production Data

With those two core functions defined, I will start to progress through the actual work: generating decline curves, compiling strip pricing, and gathering the additional information necessary for the financial model.

...
# generate a pandas dataframe to hold our time column, assuming the 
# well life from the gas parameters is usable for everything
modelTime = pd.DataFrame({'Month': 
                          np.arange(1, 12*initialParamsGas[5]+1,1)})
# Process each product stream from the well (Gas, Oil, Water)
for product in initialParams:
    # run the modified hyperbolic decline function to generate production rate estimates
    ratePrediction = modified_hyperbolic_decline(
        modelTime['Month']-1,product[0],product[1],product[2], product[3])
    
    # Add the modified Arps rates to our time dataframe, shifted by the 'flat months' input
    modelTime[product[6]] = ratePrediction.shift(product[4], fill_value=product[0])
...

As in Python blog post 2, a dataframe is generated with the timing component for the well. What has evolved since the last post is that we now have a list of products (gas, oil, and water), each carrying its own list of decline curve parameters. To keep my code to a minimum, I use a ‘for loop’ to iterate through the products being produced from our well. Each product’s rate column is appended to the ‘modelTime’ dataframe for future use.
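To see what the shift accomplishes, here is a toy example with made-up rates. Shifting by the ‘flat months’ input while filling with Qi holds the first months of the forecast flat at the initial rate before the decline kicks in.

...
import pandas as pd
# Made-up declining rates; a shift of 2 with fill_value=100 holds the
# first two months flat at the initial rate
rates = pd.Series([100, 90, 81, 73])
print(rates.shift(2, fill_value=100).tolist())  # [100, 100, 100, 90]
...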


Append price file to production

...
# List of commodities we want to pull
commodities = [
    'https://www.wsj.com/market-data/quotes/futures/CRUDE%20OIL%20-%20ELECTRONIC/contracts',
    'https://quotes.wsj.com/futures/NATURAL%20GAS/contracts'
    ]
# Grab our commodity futures
for x in commodities:
    # Grab pricing
    Futures = pullCommodityPricing(x)
    # Cut out the row containing 'Front Month'
    Futures = Futures.drop(Futures.index[1])
    # reset the index of the dataframe so that things line up with our modelTime dataframe
    Futures = Futures.reset_index(drop=True)
    # Filter the string to just capture the Month/Year
    Futures['MONTH'] = Futures['MONTH'].str[-8:]
    # Convert our MONTH column to a python recognized Datetime
    Futures['MONTH'] = pd.to_datetime(Futures['MONTH'], format='%b %Y')
    # Append to our original dataframe, modelTime
    if 'oil' in x.lower():
        modelTime['Date'] = Futures['MONTH']
        modelTime['WTI'] = Futures['SETTLEMENT']
    elif 'natural' in x.lower():
        modelTime['HH'] = Futures['SETTLEMENT']
# Fill any blank values (at the end of our dataframe) with the last values of the prices
modelTime = modelTime.fillna(method='ffill')
...

The links to the futures strip pricing for both oil and gas are placed into a list called ‘commodities’. Just below that, a for loop iterates through the website links stored in ‘commodities’ and scrapes the data we need to move forward. Some of the processing in this section is specific to the Wall Street Journal site, so if you have an alternate data source, you may need to clean the data in a slightly different manner.
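To make the string handling concrete: WSJ ends each contract label with the month and year, so grabbing the last eight characters isolates something like 'Oct 2020'. The label below is hypothetical.

...
import pandas as pd
# Hypothetical WSJ-style contract label
label = 'Crude Oil Continuous Contract Oct 2020'
print(label[-8:])                                   # 'Oct 2020'
print(pd.to_datetime(label[-8:], format='%b %Y'))   # 2020-10-01 00:00:00
...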


Ownership Model

True to the previous work of @fauxright, I lay out the working interest and revenue interest model for expenses and revenue calculations.

...
# == Ownership Section ==
        
# Variable for our revenue model inputs
# [0] = Working Interest
# [1] = Net Revenue Interest (1 - royalty burden)
# [2] = Shrink
# [3] = NGL Yield (bbl per MMcf)
# [4] = Oil Differential
# [5] = Gas Differential
# [6] = NGL Differential (from WTI)
RevenueModel = [1,0.75,0.75,10,0.95,0.8,0.3]
modelTime['NGL'] = modelTime['Gas']*RevenueModel[3]/1000
modelTime['Oil Revenue'] = modelTime['Oil']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[4]
modelTime['Gas Revenue'] = modelTime['Gas']*RevenueModel[0]*RevenueModel[1]*modelTime['HH']*RevenueModel[5]*RevenueModel[2]
modelTime['NGL Revenue'] = modelTime['NGL']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[6]
modelTime['Total Revenue'] = modelTime['Oil Revenue'] + modelTime['Gas Revenue'] + modelTime['NGL Revenue']
# == Expenses Section ==
# $10,000/month until fluid rate <30,000 bbls/month, then $5,000/month
fixedCost = [10000, 5000]
# $0.50/bbl oil, $0.20/mcf gas, $1.00/bbl water
variableCost = [0.5, 0.2, 1]
conditions = [modelTime['Oil']+modelTime['Water'] >= 30000,
              modelTime['Oil']+modelTime['Water'] < 30000]
modelTime['Fixed Cost'] = np.select(conditions, fixedCost)
modelTime['Variable Cost'] = modelTime['Oil']*variableCost[0]+modelTime['Gas']*variableCost[1]+modelTime['Water']*variableCost[2]
modelTime['Total Cost'] = modelTime['Fixed Cost'] + modelTime['Variable Cost']
...

The assumptions here mostly track the Oil & Gas Coding with R (Part 3) post. I utilize a special function in numpy, ‘np.select’, to insert the correct fixed cost through time.
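If ‘np.select’ is new to you, it pairs a list of boolean conditions with a list of choices and returns, element by element, the choice for the first condition that evaluates true. A toy example with made-up fluid rates:

...
import numpy as np
fluid = np.array([45000, 31000, 12000])  # made-up monthly bbl totals
fee = np.select([fluid >= 30000, fluid < 30000], [10000, 5000])
print(fee)  # [10000 10000  5000]
...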


Taxes/Cash Flow

...
# == Taxes Section ==
# Assuming Texas severance taxes, oil well status
# [0] - Texas Oil Severance
# [1] - Texas Gas/NGL Severance
# [2] - Ad Valorem
taxes = [0.046, 0.075, 0.025]
modelTime['Production Tax'] = modelTime['Oil Revenue']*taxes[0] + (modelTime['Gas Revenue'] + modelTime['NGL Revenue'])*taxes[1] + modelTime['Total Revenue']*taxes[2]
# == Cash Flow Section == 
modelTime['Cash Flow'] = modelTime['Total Revenue'] - modelTime['Total Cost'] - modelTime['Production Tax']
...

The tax section mirrors the previous work as well.

A more complex version of this workflow could be constructed to apply different tax rates based on the location of the well. For example, horizontal gas wells in Texas are eligible for a severance tax deduction. Interestingly, this data is also publicly available and can be converted to actual capex, and it can be visualized at our sister site, AFE Leaks. Reach out if you are interested in the dataset of 26,000+ wells in Texas and Louisiana covering plays such as the Delaware, Eagle Ford, Haynesville, and Barnett.
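Back to the location-based tax idea, here is a minimal sketch of what a rate lookup might look like. Only the Texas rates below come from this post; the dictionary structure and helper function are hypothetical, and any other jurisdiction would need its own verified entries.

...
# Hypothetical structure; only the TX rates come from the model above
severanceRates = {
    'TX': {'oil': 0.046, 'gas': 0.075},
}
def production_tax(oilRev, gasNglRev, totalRev, state='TX', adValorem=0.025):
    rates = severanceRates[state]
    return oilRev*rates['oil'] + gasNglRev*rates['gas'] + totalRev*adValorem
...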

I calculate cash flow by netting the costs and taxes out of the revenue deck.


CAPEX/P&A

...
# == CAPEX/P&A Section ==
# [0] = Drilling Cost
# [1] = Completion Cost
# [2] = P&A Cost
capex = [4000000, 6000000, 50000]
spudToProductionDelay = 6
# Create a new Dataframe to shift the timing variables for CAPEX vs production
fullModel = pd.DataFrame(
    {'Month': np.arange(1, 12*initialParamsGas[5]+1+spudToProductionDelay,1)})
# Generate conditions to insert specific CAPEX events
conditions = [fullModel['Month'] == 1, 
              fullModel['Month'] == spudToProductionDelay,
              fullModel['Month'] == initialParamsGas[5]*12]
fullModel['CAPEX'] = np.select(conditions,capex,default=0)
...

Notice here that we are using a new dataframe, ‘fullModel’, for this portion of the code. I implement CAPEX timing by using the ‘np.select’ function in a similar fashion to the costs in the ownership section above. I assume the P&A cost occurs at the previously declared well life from the decline curve section.
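A quick way to confirm the three CAPEX events landed where intended is to print the nonzero rows:

...
# Should show month 1 (drilling), month 6 (completion), and month 360 (P&A)
print(fullModel.loc[fullModel['CAPEX'] > 0, ['Month', 'CAPEX']])
...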

...
# Shift the month column to time out with CAPEX timeline
modelTime['Month'] = modelTime['Month'] + spudToProductionDelay
# Rearrange our columns in case we want to output the data to a file
modelTime = modelTime[['Month', 'Gas', 'Oil', 'NGL', 'Water', 'Date', 'WTI', 'HH', 'Oil Revenue', 'Gas Revenue', 'NGL Revenue', 'Total Revenue', 'Fixed Cost', 'Variable Cost', 'Total Cost', 'Production Tax', 'Cash Flow']]
fullModel = pd.merge(modelTime, fullModel, how='right', on='Month')
fullModel = fullModel.sort_values(by='Month',ignore_index=True)
fullModel = fullModel.fillna(0)
...

Next, we do some housekeeping. The ‘modelTime‘ dataframe is shifted to account for the delay between CAPEX timing and initial production. After that, I merge the dataframes to create one source of truth for all of the work to this point.
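The right merge keeps every month in ‘fullModel’, so the pre-production months, where only CAPEX exists, survive with NaN production values that the final ‘fillna(0)’ cleans up. Here is a miniature version with made-up frames:

...
import pandas as pd
left = pd.DataFrame({'Month': [7, 8], 'Cash Flow': [50.0, 45.0]})
right = pd.DataFrame({'Month': [1, 6, 7, 8], 'CAPEX': [4e6, 6e6, 0.0, 0.0]})
print(pd.merge(left, right, how='right', on='Month').fillna(0))
#    Month  Cash Flow      CAPEX
# 0      1        0.0  4000000.0
# 1      6        0.0  6000000.0
# 2      7       50.0        0.0
# 3      8       45.0        0.0
...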


Net Present Value

...
# == NPV Section ==
econ = [0.1,1]
fullModel['Present Value'] = (fullModel['Cash Flow']-fullModel['CAPEX'])/(1.0+econ[0])**(fullModel['Month']/12)
fullModel['Cumulative Present Value'] = fullModel['Present Value'].cumsum()
#Export model to CSV
fullModel.to_csv('FullModel.csv')
...
[Chart: cumulative present value by month. Caption: “Not another SCOOP/STACK well…”]

Finally, we have a full data set from which to generate financial metrics to your heart's desire. The present value and cumulative NPV are appended to the data set to show the methodology. If you want to export all of this work into a CSV file for storage, follow the last line and check the folder where you are building this script.
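For example, one metric that falls straight out of this table is a simple payout month: the first month where cumulative undiscounted cash flow net of CAPEX turns positive. This is an illustration on top of the post's dataframe, not part of the original script.

...
# Cumulative undiscounted net cash flow; payout is the first positive month
net = (fullModel['Cash Flow'] - fullModel['CAPEX']).cumsum()
payoutMonth = fullModel.loc[net > 0, 'Month'].min()
print(f'Payout month: {payoutMonth}')
...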


Code

# -*- coding: utf-8 -*-
"""
Created on Thu Sep 10 22:25:01 2020
@author: @FracLost
"""
import time
import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
# set up the initial decline curve parameters to be used
initialParamsGas =   [ 10000,  0.75,  0.7,   0.1,  2,  30,   'Gas']
initialParamsOil =   [ 25000,   0.6,  0.7,  0.09,  1,  30,   'Oil']
initialParamsWater = [ 22000,     1, 0.75,  0.08,  0,  30, 'Water']
# Create a variable to hold all of our products (Oil, Gas, and Water)
initialParams = []
initialParams.append(initialParamsGas)
initialParams.append(initialParamsOil)
initialParams.append(initialParamsWater)
# Modified Hyperbolic Decline function utilized to generate the decline forecast
def modified_hyperbolic_decline(T,Qi,B,Di,Dlim):
    # Instantaneous decline rate implied by the hyperbolic curve at time T
    usedD = 1/((1/Di)+(B*T))
    # Hold the decline at the terminal value Dlim once the hyperbolic rate drops below it
    usedD = usedD.where(usedD > Dlim, Dlim)
    return Qi/np.power((1+B*usedD*T),1./B)
# Function to read a web page and pull a <table> element into a pandas dataframe
def pullCommodityPricing(inputLink):
    # Selenium options
    options = Options()
    options.headless = True
    options.add_argument("--window-size=1920,1200")
    driver = webdriver.Chrome(options=options)
    # Call the Chrome browser driver, and pass it the web page
    driver.get(inputLink)
    # Added a sleep function to 'pause' and let the webpage load
    time.sleep(2)
    # this returns the full HTML source of the web page
    fullPageHTML = driver.page_source
    # shutdown the instance of the Chrome driver
    driver.quit()
    # Start working with the HTML we scraped from the site above
    soup = BeautifulSoup(fullPageHTML, 'html.parser')
    # Select the first <table> element, which holds our data
    webTable = soup.select_one("table")
    table = pd.read_html(str(webTable))
    # return a single data table; a few parts of this function may
    # need to be modified based on the input webpage
    return table[0]
# generate a pandas dataframe to hold our time column, assuming the 
# well life from the gas parameters is usable for everything
modelTime = pd.DataFrame({'Month': 
                          np.arange(1, 12*initialParamsGas[5]+1,1)})
# Process each product stream from the well (Gas, Oil, Water)
for product in initialParams:
    # run the modified hyperbolic decline function to generate production rate estimates
    ratePrediction = modified_hyperbolic_decline(
        modelTime['Month']-1,product[0],product[1],product[2], product[3])
    
    # Add the modified Arps rates to our time dataframe, shifted by the 'flat months' input
    modelTime[product[6]] = ratePrediction.shift(product[4], fill_value=product[0])
# List of commodities we want to pull
commodities = [
    'https://www.wsj.com/market-data/quotes/futures/CRUDE%20OIL%20-%20ELECTRONIC/contracts',
    'https://quotes.wsj.com/futures/NATURAL%20GAS/contracts'
    ]
# Grab our commodity futures
for x in commodities:
    # Grab pricing
    Futures = pullCommodityPricing(x)
    # Cut out the row containing 'Front Month'
    Futures = Futures.drop(Futures.index[1])
    # reset the index of the dataframe so that things line up with our modelTime dataframe
    Futures = Futures.reset_index(drop=True)
    # Filter the string to just capture the Month/Year
    Futures['MONTH'] = Futures['MONTH'].str[-8:]
    # Convert our MONTH column to a python recognized Datetime
    Futures['MONTH'] = pd.to_datetime(Futures['MONTH'], format='%b %Y')
    # Append to our original dataframe, modelTime
    if 'oil' in x.lower():
        modelTime['Date'] = Futures['MONTH']
        modelTime['WTI'] = Futures['SETTLEMENT']
    elif 'natural' in x.lower():
        modelTime['HH'] = Futures['SETTLEMENT']
# Fill any blank values (at the end of our dataframe) with the last values of the prices
modelTime = modelTime.fillna(method='ffill')
# == Ownership Section ==
# Variable for our revenue model inputs
# [0] = Working Interest
# [1] = Net Revenue Interest (1 - royalty burden)
# [2] = Shrink
# [3] = NGL Yield (bbl per MMcf)
# [4] = Oil Differential
# [5] = Gas Differential
# [6] = NGL Differential (from WTI)
RevenueModel = [1, 0.75, 0.75, 10, 0.95, 0.8, 0.3]
modelTime['NGL'] = modelTime['Gas']*RevenueModel[3]/1000
modelTime['Oil Revenue'] = modelTime['Oil']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[4]
modelTime['Gas Revenue'] = modelTime['Gas']*RevenueModel[0]*RevenueModel[1]*modelTime['HH']*RevenueModel[5]*RevenueModel[2]
modelTime['NGL Revenue'] = modelTime['NGL']*RevenueModel[0]*RevenueModel[1]*modelTime['WTI']*RevenueModel[6]
modelTime['Total Revenue'] = modelTime['Oil Revenue'] + modelTime['Gas Revenue'] + modelTime['NGL Revenue']
# == Expenses Section ==
# $10,000/month until fluid rate <30,000 bbls/month, then $5,000/month
fixedCost = [10000, 5000]
# $0.50/bbl oil, $0.20/mcf gas, $1.00/bbl water
variableCost = [0.5, 0.2, 1]
conditions = [modelTime['Oil']+modelTime['Water'] >= 30000,
              modelTime['Oil']+modelTime['Water'] < 30000]
modelTime['Fixed Cost'] = np.select(conditions, fixedCost)
modelTime['Variable Cost'] = modelTime['Oil']*variableCost[0]+modelTime['Gas']*variableCost[1]+modelTime['Water']*variableCost[2]
modelTime['Total Cost'] = modelTime['Fixed Cost'] + modelTime['Variable Cost']
# == Taxes Section ==
# Assuming Texas severance taxes, oil well status
# [0] - Texas Oil Severance
# [1] - Texas Gas/NGL Severance
# [2] - Ad Valorem
taxes = [0.046, 0.075, 0.025]
modelTime['Production Tax'] = modelTime['Oil Revenue']*taxes[0] + (modelTime['Gas Revenue'] + modelTime['NGL Revenue'])*taxes[1] + modelTime['Total Revenue']*taxes[2]
# == Cash Flow Section ==
modelTime['Cash Flow'] = modelTime['Total Revenue'] - modelTime['Total Cost'] - modelTime['Production Tax']
# == CAPEX/P&A Section ==
# [0] = Drilling Cost
# [1] = Completion Cost
# [2] = P&A Cost
capex = [4000000, 6000000, 50000]
spudToProductionDelay = 6
# Create a new Dataframe to shift the timing variables for CAPEX vs production
fullModel = pd.DataFrame(
    {'Month': np.arange(1, 12*initialParamsGas[5]+1+spudToProductionDelay, 1)})
# Generate conditions to insert specific CAPEX events
conditions = [fullModel['Month'] == 1,
              fullModel['Month'] == spudToProductionDelay,
              fullModel['Month'] == initialParamsGas[5]*12]
fullModel['CAPEX'] = np.select(conditions, capex, default=0)
# Shift the month column to time out with the CAPEX timeline
modelTime['Month'] = modelTime['Month'] + spudToProductionDelay
# Rearrange our columns in case we want to output the data to a file
modelTime = modelTime[['Month', 'Gas', 'Oil', 'NGL', 'Water', 'Date', 'WTI', 'HH', 'Oil Revenue', 'Gas Revenue', 'NGL Revenue', 'Total Revenue', 'Fixed Cost', 'Variable Cost', 'Total Cost', 'Production Tax', 'Cash Flow']]
fullModel = pd.merge(modelTime, fullModel, how='right', on='Month')
fullModel = fullModel.sort_values(by='Month', ignore_index=True)
fullModel = fullModel.fillna(0)
# == NPV Section ==
econ = [0.1, 1]
fullModel['Present Value'] = (fullModel['Cash Flow']-fullModel['CAPEX'])/(1.0+econ[0])**(fullModel['Month']/12)
fullModel['Cumulative Present Value'] = fullModel['Present Value'].cumsum()
# Generate a chart of Cumulative PV
fig, ax = plt.subplots()
rects1 = ax.bar(fullModel['Month'], fullModel['Cumulative Present Value'])
# Styling of chart
ax.set_ylabel('Cumulative PV')
ax.set_xlabel('Months')
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
fullModel.to_csv('FullModel.csv')
