Shale Executive Compensation – Scraping SEC Filings with R

Inspired by the constant outrage from Energy FinTwit (#EFT) on Twitter and the recent article from The Wall Street Journal discussing shale compensation, I ventured out to see if it was possible to build a scraper for company compensation from SEC filings.

Short answer is yes. Long answer is yes but it was a real pain in the ass. Regardless, I will show you how to get going. Come along as I share my pain.


Companies file an annual proxy statement with the SEC that discloses executive compensation. For the most part, this table looks pretty similar from company to company, though it is far from standardized. Some companies (f’ing Chesapeake) post it as a PDF table, so you can’t pull the data. However, that is not too common. Usually it sits in an HTML table that we can grab.

Figure 1: Diamondback 2019 Compensation Summary

The real pain comes from trying to combine multiple years and companies together. However, finding the table isn’t too difficult.

Locating the File

First, I will use some R packages we have used in other tutorials to help locate the file we want.

  • edgarWebR – Searching SEC filings with R
  • dplyr – Data manipulation in R

In this example, we will use WPX as the target operator. The annual proxy statement is filed as a DEF 14A. I am also going to look only at filings covering fiscal year 2017 and later.


library(dplyr)

comp.ticker <- 'WPX'

salaryDocs <- edgarWebR::company_filings(comp.ticker, count = 5, type = "DEF 14A")
#Proxy statements cover the prior fiscal year, so shift the filing year back one
salaryDocs$year <- as.numeric(substr(salaryDocs$filing_date, 1, 4)) - 1
salaryDocs <- salaryDocs %>% filter(year >= 2017)
salaryDocs$year <- paste0('YE', salaryDocs$year)
salaryDocs <- salaryDocs %>% filter(!duplicated(year))

I will look at the most recent year.

url1 <- salaryDocs$href[1]

For WPX, url1 points to the index page for the filing. I want to go to this page and find the HTML document itself.

DocumentsonEdgar <- edgarWebR::filing_documents(x = url1) %>%
        filter(type == 'DEF 14A') %>%
        filter(!grepl('pdf', document))

So now I have the file located.

Locate the Table

This step is a bit intensive (though quite easy compared to what I have to do to throw it into a dataframe). I also need some more packages, which let me perform basic scraping operations in R.

  • rvest
  • httr

library(rvest)
library(httr)

pg <- read_html(DocumentsonEdgar$href,
                encoding = "utf8") %>%
        html_nodes('table')

What I do here is just go to my file link, read the node structure, and then filter for table nodes. pg is a list of all the table nodes. I use length(pg) to see how many tables there are. In this case there are 42, but there can be upwards of 200+ based on what I have seen so far. It would be super annoying to go through every one of these tables to find the one I am looking for. Instead, I will just build a loop to do it for me.

To locate the right table, I need to find some unique aspects of it. What do I know just by visual inspection? The table will probably have at least 8 columns (see Figure 1), and the word Principal (or some variation) will appear in either the first or second column.

It’s a bit tough to see that from Figure 1, but trust me, I have run this thing a lot. Like 2 days of work looking through various iterations to find the right table. Regardless, let’s use lapply in R to loop through each table and look for those conditions. If the conditions are not satisfied, I will return NULL.

df1 <- data.frame(cols = seq(1, length(pg), 1), ticker = comp.ticker)
data1 <- lapply(split(df1, df1[,'cols']), function (well) tryCatch({
        #Read table from pg and put into a dataframe
        df <- pg[[well$cols]] %>% html_table(fill = F)
        #Look if there are 8 or more columns
        if(length(df) >= 8){
          #Look to see if the word Principal is in the first or second column
          if((TRUE %in% grepl('PRINCIPAL', toupper(df$X1)))|
             (TRUE %in% grepl('PRINCIPAL', toupper(df$X2)))){
            #Conditions satisfied - keep this table
          } else {
            df <- NULL
          }
        } else {
          df <- NULL
        }
        df
      }, error = function(e) {
        NULL
      }))

#Find all tables from list that are not NULL
data1 <- data1[lengths(data1) != 0]

In this case, there is only one table. Sometimes there can be a few, but it is usually the first one.

tableNum <- as.numeric(names(data1[1]))

For WPX, tableNum comes back as 26.
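If you want the located table as a dataframe rather than just viewing it, the winning node converts the same way the loop did (compTable is my own name here, not anything from the packages above):

```r
#Pull the matched node into a dataframe for downstream cleanup
compTable <- pg[[tableNum]] %>% html_table(fill = FALSE)
```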

To view our data in the console, we can use another package, htmlTable.


htmlTable::htmlTable(pg[tableNum] %>% paste(collapse='\n'))

And this is what will spit out in the RStudio Viewer Pane.

Figure 2: WPX 2019 Compensation Summary


Because it is such a pain, and not consistent enough to be used as a package, I will keep the rest to myself for now. But I will show some analysis based on the results. I searched through 39 different shale-focused US operators and grabbed all data from 2015-2019.

Centennial, Diamondback, Noble, SM Energy
ConocoPhillips, Laredo, Penn Virginia, Extraction
Table 1: Operator List (excerpt)
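While I am keeping the full cleanup code to myself, the multi-company pass amounts to wrapping the steps above in a loop over tickers and years. A minimal sketch, assuming the same packages as before (getCompTable is a hypothetical wrapper I am naming here, not a function from edgarWebR or rvest):

```r
library(dplyr)
library(rvest)

#Hypothetical wrapper around the locate-file / locate-table steps for one
#ticker-year; returns the compensation table as a dataframe, or NULL on failure
getCompTable <- function(ticker, yr) {
  tryCatch({
    filings <- edgarWebR::company_filings(ticker, count = 10, type = "DEF 14A")
    filings$year <- as.numeric(substr(filings$filing_date, 1, 4)) - 1
    url1 <- filings$href[filings$year == yr][1]
    docs <- edgarWebR::filing_documents(x = url1) %>%
      filter(type == 'DEF 14A') %>%
      filter(!grepl('pdf', document))
    tables <- read_html(docs$href[1], encoding = "utf8") %>% html_nodes('table')
    #Same detection rule as before: 8+ columns, 'Principal' in column 1 or 2
    hits <- Filter(function(tb) {
      df <- tryCatch(html_table(tb, fill = FALSE), error = function(e) NULL)
      !is.null(df) && length(df) >= 8 &&
        any(grepl('PRINCIPAL', toupper(c(df[[1]], df[[2]]))))
    }, tables)
    if (length(hits) > 0) html_table(hits[[1]], fill = FALSE) else NULL
  }, error = function(e) NULL)
}

tickers <- c('WPX', 'FANG', 'COP')  #...and so on for all 39 operators
allComp <- do.call(rbind, lapply(tickers, function(t) {
  do.call(rbind, lapply(2015:2019, function(y) {
    df <- getCompTable(t, y)
    if (!is.null(df)) cbind(ticker = t, year = y, df) else NULL
  }))
}))
```

In practice the rbind at the end is where the real pain lives: every filing names and orders its columns differently, so the tables need to be standardized before they will stack.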

Let’s look at some trends.

Total Compensation

Let’s look at the total over time. It appears that total compensation has roughly doubled since 2015. Can’t really say the same for myself. Any of you other guys out there? No? Didn’t think so. Must be all them good profits we’ve been generating.

Figure 3: Total Executive Compensation Over Time, US Shale

Let’s look a bit deeper at 2019 only. If you click on any company, it will zoom into the individual earners.

Figure 4: Total Executive Compensation in 2019, US Shale

A little surprising to see Diamondback this high. However, if you look up at Figure 1, the Diamondback 2019 summary, you will see Stice and CFO Van’t Hof had very large vesting amounts this year. QEP is surprising given its size and significant financial distress.

Which individuals were dominating total compensation?

Figure 5: Total Executive Earners in 2019, US Shale

As mentioned previously, the FANG boys had a good year. Ryan Lance (ConocoPhillips), Charles Stanley (QEP), and Vicki Hollub (OXY) rounded out the top 5. Though they’ll be quick to remind you that their “realizable” income was much less. Let me go out back and shed a tear.


Speaking of the various components, I thought it would be interesting to break down 2019 compensation by component for each company/role. It is a bit tough to standardize all of the columns, so total awards may be a bit off. Salary, Bonus, and Total will be accurate.
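As a rough sketch of that breakdown, assuming a combined dataframe allComp with one row per executive-year and hypothetical standardized column names (position, Salary, Bonus, StockAwards, OptionAwards, Total - the real filings will vary):

```r
library(dplyr)

#Aggregate 2019 compensation components by executive position
#(allComp and its column names are assumptions about the cleaned data)
compByRole <- allComp %>%
  filter(year == 2019) %>%
  group_by(position) %>%
  summarise(
    Salary = sum(Salary, na.rm = TRUE),
    Bonus  = sum(Bonus, na.rm = TRUE),
    Awards = sum(StockAwards + OptionAwards, na.rm = TRUE),
    Total  = sum(Total, na.rm = TRUE)
  ) %>%
  arrange(desc(Total))
```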

Figure 6: Total Executive Earners in 2019 by Position, US Shale

Good to be a Chairman + CEO, but heck, sign me up for Chairman, as that seems to be where the real bucks are with the least amount of stress.


That is it for this round. Reading the HTML tables into dataframes is pretty nerve-wracking, but as I show in the analysis section, it can be done. Check out our other finance-related tutorials as well.

And while you’re at it, give our blog a follow.
