Parsley and Pioneer Maturing Debt – Scraping SEC Debt Tables in R

As I build out various R scrapers for financial data, I thought I would take a moment to focus on the debt maturity tables in SEC financial filings. And what better place to start than with the Father-Son mash-up of Parsley and Pioneer? So let’s get after it.


Packages

As always, there are a few R packages (mostly tidyverse) I'm using to scrape and manipulate the data. Remember, if any of them aren't installed, please type install.packages('xxxx') into the console, with xxxx being the package name (or use the one-shot installer sketched after this list).

  • rvest – HTML scraping library
  • httr – Tools for working with HTTP and URLs
  • dplyr – General data manipulation
  • stringr – Manipulating text data
  • readr – Parsing numbers and other data-import tasks
  • edgarWebR – Finding SEC data in R. This package may be failing on CRAN, by the way, so you'll need to install it via devtools:
devtools::install_github("mwaldstein/edgarWebR")
  • lubridate – Manipulating time data
  • zoo – Used to fill missing data
  • tidyr – Pivot operations in R
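
If you'd rather set everything up in one shot, here's a minimal sketch (the pkgs vector is just my shorthand for the CRAN packages above):

#Install any of the CRAN packages that are missing
pkgs <- c('rvest', 'httr', 'dplyr', 'stringr', 'readr', 'lubridate', 'zoo', 'tidyr')
install.packages(pkgs[!pkgs %in% rownames(installed.packages())])
#edgarWebR comes from GitHub
devtools::install_github("mwaldstein/edgarWebR")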

Getting the Data

Now I need to use these packages to download a random table from an SEC filing and turn it into a dataframe, which is R's version of a table. In case I haven't mentioned it, you can export any dataframe to a csv using:

write.csv(dataframe, 'dataframeName.csv', row.names = F)

Finding the Filing

First, I need to locate the actual HTML file for the filing. I'm skipping the XBRL for now; though the data usually exists there, it's not always easy to interpret. Let's start with Pioneer, whose ticker symbol is PXD.

library(dplyr)
library(rvest)
library(httr)
library(stringr)
library(tidyr)
library(readr)
library(lubridate)
library(edgarWebR)
library(zoo)
FilingsonEdgar <- edgarWebR::company_filings(x = 'PXD') %>%
  filter(grepl('10-', type)) %>%
  filter(!grepl('A', type))
DocumentsonEdgar <- edgarWebR::filing_documents(x = FilingsonEdgar$href[1])

DocumentsonEdgar <- DocumentsonEdgar %>% filter(type == '10-K' | type == '10-Q')

FilingsonEdgar will give us a long list of recent quarterly/annual filings (with amendments filtered out). For this workflow, we'll just work with the first one (DocumentsonEdgar), but know the list can be looped through if you're feeling a bit adventurous.
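
If you do feel adventurous, a minimal sketch of that loop might look like this (I've only tested the single-filing path, so treat this as a starting point):

#Pull the primary 10-K/10-Q document for every filing, not just the first
allDocs <- lapply(FilingsonEdgar$href, function(u) {
  Sys.sleep(0.5) #Be polite to EDGAR
  edgarWebR::filing_documents(x = u) %>%
    filter(type == '10-K' | type == '10-Q')
})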


Locate all Tables

Now, we need to actually use web scraping to find the tables within our document.

#Find tables within html file
pg <- read_html(DocumentsonEdgar$href[1],
                encoding = "utf8") %>%
      html_nodes('table')

html_nodes tells rvest to look for all nodes of the table type. If you dig deeper into the HTML, you'll find each table is built up from tags called tr and td, which define the layout (i.e., rows and cells). However, HTML was not designed with spreadsheets in mind, so the columns are usually not uniform; they're just arranged for best viewability. Guess they weren't thinking of us poor web scrapers.
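
To see that row/cell structure for yourself, here's a quick, purely illustrative check on the first table node:

#Count the rows (tr) in the first table, then the cells (td) in its first row
first_tbl <- pg[[1]]
first_tbl %>% html_nodes('tr') %>% length()
first_tbl %>% html_nodes('tr') %>% .[1] %>% html_nodes('td') %>% length()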


Locate Specific Tables

Now, I am going to build a loop that searches each table and checks whether my desired phrase is in one of the first two columns. In this scenario, I will be searching for any table with the phrase NOTES DUE in it, as that's most often the debt summary.

df1 <- data.frame(cols = seq(1, length(pg), 1), ticker = 'PXD')


data1 <- lapply(split(df1, df1[,'cols']), function (well) tryCatch({

  #Parse this table node into a dataframe
  df <- pg[[well$cols]] %>% html_table(fill = T)

  #Keep only tables with at least 4 columns and 5 rows
  if(length(df) >= 4 & nrow(df) >= 5){

    #Keep the table only if 'NOTES DUE' shows up in one of the first two columns
    if((TRUE %in% grepl(toupper('NOTES DUE'), toupper(df$X1)))|
       (TRUE %in% grepl(toupper('NOTES DUE'), toupper(df$X2)))){
      df
    } else {
      df <- NULL
      df
    }
  } else {
    df <- NULL
    df
  }

},
error = function(e) {
  NULL
}))


#Remove Empty Tables
data1 <- data1[lengths(data1) != 0]

#Locate the node of the first table
node1 <- as.numeric(names(data1)[1])
#Link to that node
tbl <- read_html(DocumentsonEdgar$href[1],
                 encoding = "utf8") %>%
  html_nodes('table') %>% .[node1]
tbl <- tbl[[1]]

Within the loop, I build an if-then statement that filters out tables with very few rows/columns, as those are likely not the ones I'm looking for (from experience).
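
It's also worth a quick peek at which table nodes actually matched before trusting the first one; the names of data1 are the surviving node indices:

#Which table nodes contained 'NOTES DUE', and how many survived?
names(data1)
length(data1)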


Data Cleanup

I have found that the basic HTML-to-table functionality in R doesn't always do what I want it to do, so I'm going to build a more customized version. This should let me more or less recreate the process for most companies (though I've only tested on PXD/PE so far, so it might need a bit more work).

Basically, I need to loop through each row (tr), and then extract the columns (td), remove empty values, and then reconstruct the table to match what it looks like in the report.

rows1 <- length(tbl %>% html_nodes('tr'))

i <- 1
dfx <- data.frame(X1=NA, X2=NA, X3=NA, X4=NA, X5=NA, X6 = NA, X7 = NA, X8 = NA, X9 = NA, X10 = NA)

while(i <= rows1){

    #Extract the text of every cell (td) in row i
    txt1 <- tbl %>% html_nodes('tr') %>% .[i] %>% html_nodes('td') %>%
      html_text()

    if(all(txt1 == '')){
      #Row is entirely empty; skip it
    } else {
      txt1 <- data.frame(text = txt1)
      #Condense extra spaces to a single space
      txt1$text <- gsub("\\s+", " ", txt1$text)

      #Get rid of various text that finds its way in
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("\u200b", "", x)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(a)", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(b)", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(c)", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(1)", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(2)", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(3)", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("$", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub(",", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub(")", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub("(", "", x, fixed = TRUE)))
      txt1 <- data.frame(apply(txt1, 2, function(x) gsub(":", "", x, fixed = TRUE)))

      txt1[txt1 == '('] <- NA
      txt1[txt1 == ')'] <- NA

      #Remove empty data
      txt1 <- txt1 %>% filter(text != '') %>% filter(text != ' ') %>% filter(!is.na(text))
      #Other quick data cleanup
      txt1 <- txt1 %>% mutate(rows = seq(1, n(), 1))
      rownames(txt1) <- paste0('X', txt1$rows)
      txt1 <- txt1 %>% subset(select = -c(rows))

      #Transpose the cells into a single dataframe row
      txt1 <- data.frame(t(txt1))
      rownames(txt1) <- i
      #Append to our initially empty dataframe
      dfx <- dfx %>% dplyr::bind_rows(txt1)
    }

    i <- i + 1
}
#Remove columns with no data
dfx <- dfx[,colSums(is.na(dfx))<(nrow(dfx)-1)]
#Remove empty rows
dfx <- dfx[rowSums(is.na(dfx)) < ncol(dfx), ]

The rest is general data cleanup to make the data usable for analysis. This can be a bit complicated, but basically I am renaming, repositioning, and filling in missing data. I am also adding a sequencing column so I can replicate the table structure after I’m done. I’d encourage you to go line by line and see what happens at each step to your table.

#Remove likely end-of-month day numbers from all columns, as I eventually want to extract just the year
dfx <- data.frame(apply(dfx, 2, function(x) gsub(" 31 ", " ", x)))
dfx <- data.frame(apply(dfx, 2, function(x) gsub(" 30 ", " ", x)))

#See how many rows are filled
dfx$rows1 <- rowSums(!is.na(dfx))+1

i <- 1

#Loop through and add a new column if the row didn't contain full data
while(i <= nrow(dfx)){
  if(dfx$rows1[i] == ncol(dfx)-1){
    tst1 <- dfx[i,]
    tst1 <- tst1 %>% subset(select = -c(rows1))
    tst1 <- data.frame(col1 = 'category') %>% cbind(tst1)
    names(tst1) <- names(dfx)
    dfx[i,] <- tst1
  }
  i <- i+1
}

#Customized for the debt scraper; the empty column is usually the total of unsecured notes + credit facility
dfx <- dfx %>% group_by(X1) %>% mutate(rowCheck = seq(1, n(), 1)) %>% ungroup()
dfx$X1[dfx$X1 == 'category' & dfx$rowCheck > 1] <- 'Total Notes'
dfx <- dfx %>% subset(select = -c(rowCheck))
dfx$rows1 <- rowSums(!is.na(dfx))+1

dfx$category <- NA
dfx$category[dfx$rows1 == (length(dfx)-2)] <- dfx$X1[dfx$rows1 == (length(dfx)-2)]
dfx$category[1][is.na(dfx$category[1])] <- 'period'
dfx$category <- zoo::na.locf(dfx$category)
dfx <- dfx %>% filter(category != X1)
dfx <- dfx %>% relocate(category)

dfx <- dfx %>% subset(select = -c(rows1))
names(dfx) <- dfx[1,]
dfx <- dfx[2:nrow(dfx),]
names(dfx)[1] <- 'category'

dfx[,3:length(dfx)] <- (apply(dfx[,3:length(dfx)], 2, function(x) parse_number(x)))
dfx[,1:2] <- (apply(dfx[,1:2], 2, function(x) trimws(x)))

dfx[is.na(dfx)] <- 0
names(dfx)[1:2] <- c('category', 'product')
dfx <- dfx %>% group_by(product) %>% mutate(count = seq(1,n(),1)) %>% ungroup()
dfx$seq <- seq(1, nrow(dfx), 1)
dfx <- dfx %>% gather(dataId, value, -c(category, product, count, seq))
dfx$dataId <- parse_number(dfx$dataId)

#In my bigger multi-company loop the quarter comes from the filing itself;
#hard-coding it (Q2 for this PXD filing) keeps the example runnable
qtr <- 2
dfx$dataId <- paste0('Q', qtr, dfx$dataId)

dfx$product[toupper(dfx$product) == 'TOTAL LONG-TERM DEBT'] <- 'LONG-TERM DEBT'
dfx$product[grepl('CURRENT PORTION', toupper(dfx$product))] <- 'CURRENT DEBT'

dfx$product[grepl('LONG-TERM DEBT', toupper(dfx$product))] <- 'LONG-TERM DEBT'
#dfx$product[grepl('TOTAL DEBT', dfx$product)] <- 'LONG-TERM DEBT'

dfx$product[grepl('DISCOUNTS', toupper(dfx$product))] <- 'ISSUANCE COSTS DISCOUNT'
dfx$product[grepl('CREDIT FACILITY', toupper(dfx$product))] <- 'CREDIT FACILITY'
dfx <- dfx %>% distinct()

dfx <- dfx %>% filter(!duplicated(paste0(product, dataId, value)))
dfx <- dfx %>% filter(value > 0)
dfx <- dfx %>% arrange(product, dataId)
dfx$value <- round(dfx$value, 0)
dfx <- dfx %>% distinct()

#Tidy up the interest-rate strings in the product names
dfx$product <- gsub(' %', '%', dfx$product)
dfx$product <- gsub('00%', '0%', dfx$product)
dfx$product <- gsub('0%', '%', dfx$product)
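
Those last three substitutions are easy to misread, so here's what they do to a hypothetical parse-mangled rate string:

gsub('0%', '%', gsub('00%', '0%', gsub(' %', '%', '7.200 % senior notes due 2028')))
#[1] "7.2% senior notes due 2028"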

Final Table

To view our final table, we can use tidyr.

dfx %>% select(product, seq, dataId, value) %>%
 spread(dataId, value) %>%
 arrange(seq) %>% subset(select = -c(seq))
product                                   Q22019  Q22020
7.5% senior notes due 2020                   450       0
3.45% senior notes due 2021                  500     139
3.95% senior notes due 2022                  600     244
0.25% convertible senior notes due 2025        0    1323
4.45% senior notes due 2026                  500     500
7.2% senior notes due 2028                   250     241
Total Notes                                 2300    2447
ISSUANCE COSTS DISCOUNT                       11     254
Total debt                                  2289    2193
CURRENT DEBT                                 450     139
LONG-TERM DEBT                              1839    2054
Table 1: PXD Debt Summary Table – Q2 2020

How in the heck do you get away with lending to an O&G company at 0.25%? I mean, sure, it's a convertible, but come on!


Analysis

Now that the process is built out, I can run a loop over time and for multiple companies. I’ll save that bit for myself, but I’ll do some quick high level looks at Parsley and Pioneer and see how the mash-up looks.


Pioneer

Pioneer's debt load hasn't really moved much over the years, and they don't pull down on the RBL very often either. Of course, this isn't due to being amazingly profitable; PXD is part of a rich tradition of Midland players that issue A TON of equity (PE/FANG).

Figure 1: Pioneer Debt-Load over Time, US$ Millions

Their debt maturities aren’t too daunting either after doing the 0.25% refinance earlier this year.

Figure 2: Pioneer Maturity Schedule, US$ Millions


Parsley

Parsley, on the other hand, has grown its debt load quite a bit despite the equity issuances. In fact, it's roughly in line with Pioneer's currently.

Figure 3: Parsley Debt-Load over Time, US$ Millions

The maturity wall isn't too bad for them either, with only the RBL needing a refinance in the near term. That said, I noticed pretty quickly that 2025, similar to PXD, looks a bit aggressive. Of course, none of this matters should the acquisition go through and the current debt get refinanced.

Figure 4: Parsley Maturity Schedule, US$ Millions


Combination

How does our father-son combo look as far as maturities go?

Figure 5: Combined PE/PXD Maturity Schedule, US$ Millions

It looks harsh in 2025, as I mentioned previously, but with the change of control they will refinance, so it will be interesting to see where things fall out. Still, I'm interested to see where this goes in the future. There is a lot less inventory out in the Permian than people like to discuss, and parent-child interference is a real risk to the downside. And who knows where prices go (though I'm semi-bullish, as there is nowhere near as much shale inventory in the US as we like to quote).


Summary

And that’s it for this round. Expect a few more posts over the next week as I dig deeper and deeper into just what you can get out of the filings.

See our other posts on SEC scraping while you’re at it.

And give us a follow.

