Kodak Insider Transactions – Scraping SEC data with R

In this insight, I show how to use R to build a quick scraper for insider transactions from the SEC, using the Eastman Kodak Company (Kodak) as an example. Why focus on Kodak? Well, they famously attempted to transition from photography to pharmaceuticals, seemingly to take advantage of investor herd behavior. We’ll use the SEC filings to see if there was any funny business going on.

Figure 1: Eastman Kodak Closing Share Price

I will also show some quick tidyquant functionality for pulling share prices for a set of companies.


Tidyquant

First, we will dig deeper into the tidyquant package in R, which provides a range of functions for financial analysis. It is also designed to work with the broader tidyverse, which we use extensively in these tutorials. I am going to pull share prices for the broader set of companies we used in the last financial insight, Impairments in Shale, and then compare their returns so far in 2020.

I am also going to use a few more packages to help with my plots.

#If not installed
install.packages('tidyquant')

library(tidyquant)
library(tidyverse)
library(highcharter)

opList1 <- data.frame(operators = c('APA',  'AR',  'CDEV', 'CHAP',
                                    'CHK', 'CLR',
                                    'CNX', 'COG', 'COP', 'CPE',  
                                    'CRK', 'CXO',  'DVN', 'EOG',
                                    'EQT',  'FANG', 'GPOR', 'GDP',
                                    'HES', 'HPR',  'LPI', 'MGY', 'MR',
                                    'MRO', 'MTDR', 'MUR', 'NBL', 
                                    'OAS', 'OXY', 'OVV', 'PDCE',
                                    'PE', 'PVAC', 'PXD', 'QEP', 
                                    'RRC',  'SBOW', 'SM', 'SWN', 
                                    'WLL', 'WPX', 'XEC', 'XOG'))

stonks2020 <- tq_get(opList1$operators, get="stock.prices", from = "2020-01-01")

I am also going to adjust CHK's share price to reflect the 1-for-200 reverse stock split completed earlier this year. Then I will create a new column called change to track the change in stock price over the year, and find the year-to-date change in a new dataframe called totalChange.

stonks2020$close[stonks2020$close < 1 & stonks2020$symbol == 'CHK'] <- stonks2020$close[stonks2020$close < 1 & stonks2020$symbol == 'CHK']*200

stonks2020 <- stonks2020 %>% group_by(symbol) %>%
  mutate(change = (close-close[1])/close[1]) %>% ungroup()

totalChange <- stonks2020 %>% group_by(symbol) %>%
  filter(date == max(date)) %>% ungroup() %>%
  filter(symbol != 'WLL') %>% arrange(desc(change))

Advanced Plotting

I am taking a cue from the creator of the highcharter package and building a column chart with a mini-graph that pops up when you hover over an individual column. The main columns plot the 2020 change in share price, while the popup shows the share price over time. This is a bit complicated, so it is worth going through line by line. I’m not going to explain everything here, but if you’re interested in more clarification, feel free to contact me.

#Create custom color scale
library(RColorBrewer)
n <- nrow(totalChange)
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
cols = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))

#Create nested data (stock price over time) and add to totalChange
donutdata2 <- stonks2020 %>% filter(symbol != 'WLL') %>%
  group_by(symbol) %>%
  mutate(day = seq(1, n(), 1)) %>%
  ungroup() %>%
  select(symbol, day, close) %>% 
  nest(-symbol) %>% 
  mutate(
    data = map(data, mutate_mapping, hcaes(x = day, y = close), drop = TRUE),
    data = map(data, list_parse)
  ) %>%
  rename(ttdata = data) %>% 
  left_join(totalChange) %>%
  arrange(desc(change))

#Add color scale
donutdata2$cols1 <- cols[1:nrow(donutdata2)] #one color per company

donutdata2 <- donutdata2 %>% filter(!duplicated(symbol))
donutdata2$change <- round(donutdata2$change*100,1)

#Plot
hchart(
  donutdata2,
  "columnrange",
  hcaes(x = symbol, low = 0, high = change, color = cols1), showInLegend = F) %>% 
  hc_colors(cols) %>% 
  hc_title(text = '2020 Stock Price Change By Company', align = 'left', 
           style = list(fontFamily = 'Arial', color = 'black')) %>%
  hc_subtitle(text = 'Source: Yahoo Finance/TidyQuant', align = 'left', 
              style = list(fontFamily = 'Arial')) %>%
  hc_xAxis(title = list(text = ''), categories = donutdata2$symbol,
           labels = list(rotation = 270, style = list(fontFamily = 'Arial', color = 'black', fontSize = '14px'))) %>%
  hc_tooltip(
    useHTML = TRUE,
    headerFormat = "<center><b>{point.x}</b></center>",
    pointFormatter = tooltip_chart(
      accesor = "ttdata",
      hc_opts = list(
        chart = list(type = "spline", name = 'Price'),
        credits = list(enabled = FALSE)
      ),
      height = 150,
      width = 250
    ),
    positioner = JS(
      "function () {
      
        /* one of the most important parts! */
        xp =  this.chart.chartWidth - this.label.width*3.25
        yp =  this.chart.chartHeight - this.label.height*1.4
      
        return { x: xp, y: yp };
      
      }"),
    shadow = T,
    borderWidth = 1,
    backgroundColor = 'white',
    hideDelay = 1000
  )%>%
  hc_credits(enabled = TRUE, text = 'Powered by Highcharts', position = list(align = 'right'),
             href = "https://www.highcharts.com/") %>%
  hc_size(width = 1000, height = 700) %>%
  hc_yAxis(gridLineColor = 'transparent',
           title = list(text = 'Percent Change', 
                        style = list(fontFamily = 'Arial', color = 'black', fontSize = '18px')),
           labels = list(format='{value}%', 
                         style = list(fontFamily = 'Arial', color = 'black', fontSize = '14px')))

Figure 2: Shale Operator 2020 Price Performance


Insider Transactions

Now let’s get to the good part. I’m not aware of an existing R package that pulls insider transactions, so we will build a scraper ourselves. I will also turn it into a function that can be run any time you have a stock ticker to investigate.

In particular, we are looking for SEC Form 4, which is where these transactions are reported. If you perform a company search for, say, KODK, you will get to a page with various financial filings. To find Form 4, set the Filing Type to 4, set Ownership? to 'only', and show 100 Results per Page.

SEC Search for Form 4 for Eastman Kodak

To replicate this in R, we can copy the URL from that search results page and use our packages to make it operator-specific. Based on visual inspection of the URL, all we really need is the CIK value, which we can get from the edgarWebR package.


Replicate the URL

For this we will need two packages, edgarWebR and glue.

library(edgarWebR)
library(glue)

comp.ticker <- 'KODK'

#Find CIK Value
compInfo <- edgarWebR::company_details(comp.ticker)$info
 
#Replicate our URL
insiders <- glue::glue('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={compInfo$cik}&type=&dateb=&owner=only&count=100&search_text=')

Now every time we change the stock ticker, it will update our search string automatically.
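
For example, swapping in a different ticker rebuilds the search string with that company's CIK. XOM below is purely an illustration; any ticker that EDGAR recognizes works the same way.

#XOM is just an example ticker
comp.ticker <- 'XOM'
compInfo <- edgarWebR::company_details(comp.ticker)$info
insiders <- glue::glue('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={compInfo$cik}&type=&dateb=&owner=only&count=100&search_text=')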


Find Links

The next step is to extract all of the individual links for each filing. We will need to use the httr and rvest packages for this step.

library(httr)
library(rvest)

pg <- (read_html(insiders) %>% html_nodes('table') %>% .[3] %>% html_table())[[1]]
  
links <- read_html(insiders)
  
links <- data.frame(links = (html_attr(html_nodes(links, "a"), "href")), ticker = comp.ticker) %>%
    filter(grepl('Archives', links))
  
pg$href <- links$links
pg$href <- paste0('https://www.sec.gov/', pg$href)
pg$`Filing Date` <- as.Date(pg$`Filing Date`)
pg <- pg %>% filter(Filings == '4')
  
names(pg) <- c('type', 'format', 'description', 'filing_date', 'file_number', 'href')

Extract Data

Now we need to go to each filing page, find the XML file, and use the XML and methods packages to extract the data into a dataframe. I will use lapply to loop through each URL we found in the previous step, and the data.table package to combine the results.

library(XML) #xmlParse, getNodeSet, and xmlToDataFrame come from the XML package

data1 <- lapply(split(pg, pg[,'href']), function (well) tryCatch({
    
    DocumentsonEdgar <-  edgarWebR::filing_documents(x =  well$href[1]) %>%
      filter(grepl('xml', document))
    
  
    
    # Convert the input xml file to a data frame. 
    downloader::download(DocumentsonEdgar$href[1], 'file1.xml')
    
    doc <- xmlParse('file1.xml')
    
    issuer <- xmlToDataFrame(getNodeSet(doc, "//issuer"))
    owner <- xmlToDataFrame(getNodeSet(doc, "//reportingOwnerId"))[1,]
    
    
    trans <- xmlToDataFrame(getNodeSet(doc, "//securityTitle"))
    names(trans) <- 'shareType'
    transDate <- xmlToDataFrame(getNodeSet(doc, "//transactionDate"))
    names(transDate) <- 'date'
    shares <- xmlToDataFrame(getNodeSet(doc, "//transactionAmounts"))
    
    ownership <- xmlToDataFrame(getNodeSet(doc, "//postTransactionAmounts"))
    names(ownership) <- 'totalSharesPost'
    
    
    footnotes <- paste((t(xmlToDataFrame(getNodeSet(doc, "//footnotes")))), collapse = '<br>')
    
    
    transaction <- cbind(owner, issuer, trans, transDate, shares, ownership, footnotes)
    transaction$date <- as.Date(transaction$date)
    transaction$transactionShares <- as.numeric(transaction$transactionShares)
    transaction$transactionPricePerShare <- as.numeric(transaction$transactionPricePerShare)
    transaction$totalSharesPost <- as.numeric(transaction$totalSharesPost)
    unlink('file1.xml')
    transaction
  
 },
 error = function(e) {
   e
   NULL
}))

transactions <- data.table::rbindlist(data1, fill = TRUE) %>% data.frame()

What I do here is pull the individual fields out of the XML file and put them into a dataframe so we can use them later.
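
If you want a quick sanity check at this point, a glance at the structure and the most recent rows is enough; this is an optional inspection step, and the columns selected below are the ones created in the extraction code above.

#Optional: inspect the combined dataframe
dplyr::glimpse(transactions)

transactions %>%
  arrange(desc(date)) %>%
  select(date, shareType, transactionShares, transactionPricePerShare, totalSharesPost) %>%
  head(10)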


Putting it all together

Now I can combine it all into a function to call for any operator.

insiderPull <- function(ticker) {

  
  library(rvest)
  library(httr)
  library(edgarWebR)
  library(XML) 
  library(methods) 
  library(dplyr)
  library(glue)
  comp.ticker <- ticker
  
  compInfo <- edgarWebR::company_details(comp.ticker)$info
  insiders <- glue::glue('https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK={compInfo$cik}&type=&dateb=&owner=only&count=100&search_text=')
  
  
  
  pg <- (read_html(insiders) %>% html_nodes('table') %>% .[3] %>% html_table())[[1]]
  
  links <- read_html(insiders)
  
  links <- data.frame(links = (html_attr(html_nodes(links, "a"), "href")), ticker = comp.ticker) %>%
    filter(grepl('Archives', links))
  
  pg$href <- links$links
  pg$href <- paste0('https://www.sec.gov/', pg$href)
  pg$`Filing Date` <- as.Date(pg$`Filing Date`)
  pg <- pg %>% filter(Filings == '4')
  
  names(pg) <- c('type', 'format', 'description', 'filing_date', 'file_number', 'href')

  data1 <- lapply(split(pg, pg[,'href']), function (well) tryCatch({
    
    DocumentsonEdgar <-  edgarWebR::filing_documents(x =  well$href[1]) %>%
      filter(grepl('xml', document))
    
  
    
    # Convert the input xml file to a data frame. 
    downloader::download(DocumentsonEdgar$href[1], 'file1.xml')
    
    doc <- xmlParse('file1.xml')
    
    issuer <- xmlToDataFrame(getNodeSet(doc, "//issuer"))
    owner <- xmlToDataFrame(getNodeSet(doc, "//reportingOwnerId"))[1,]
    
    
    trans <- xmlToDataFrame(getNodeSet(doc, "//securityTitle"))
    names(trans) <- 'shareType'
    transDate <- xmlToDataFrame(getNodeSet(doc, "//transactionDate"))
    names(transDate) <- 'date'
    shares <- xmlToDataFrame(getNodeSet(doc, "//transactionAmounts"))
    
    ownership <- xmlToDataFrame(getNodeSet(doc, "//postTransactionAmounts"))
    names(ownership) <- 'totalSharesPost'
    
    
    footnotes <- paste((t(xmlToDataFrame(getNodeSet(doc, "//footnotes")))), collapse = '<br>')
    
    
    transaction <- cbind(owner, issuer, trans, transDate, shares, ownership, footnotes)
    transaction$date <- as.Date(transaction$date)
    transaction$transactionShares <- as.numeric(transaction$transactionShares)
    transaction$transactionPricePerShare <- as.numeric(transaction$transactionPricePerShare)
    transaction$totalSharesPost <- as.numeric(transaction$totalSharesPost)
    unlink('file1.xml')
    transaction
  
  },
  error = function(e) {
    e
    NULL
  }))

  transactions <- data.table::rbindlist(data1, fill = TRUE) %>% data.frame()

  return(transactions)

}

I will call my function for KODK now.

transactions <- insiderPull('KODK')
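
Because everything is keyed off the ticker, the same call works for any company that files Form 4. CLR below is just an example pulled from the operator list earlier, and writing the result out to a CSV is optional.

#CLR is used purely as an example; any ticker with Form 4 filings works
clrTransactions <- insiderPull('CLR')
write.csv(clrTransactions, 'clr_insider_transactions.csv', row.names = FALSE)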

Investigate the Data

Now I will combine both sections: pull the share price over time and plot insider purchases and sales/grants alongside it.

trans1 <- transactions %>% mutate(transactionPricePerShare = 
                                    replace(transactionPricePerShare,transactionPricePerShare ==0, NA)) %>%
  group_by(date, type = transactionAcquiredDisposedCode) %>%
  summarise(shares = sum(transactionShares, na.rm=TRUE), price = mean(transactionPricePerShare, na.rm=TRUE)) %>% ungroup() %>%
  tidyr::spread(type, shares)
trans1[is.na(trans1)] <- 0

kodak <- tq_get('KODK', get="stock.prices", from = "2020-01-01")

library(lubridate) #year() in the filters below comes from lubridate

highchart() %>% 
  hc_colors(c('blue', 'green', 'red')) %>%
  hc_title(text = "Kodak Insider Transactions", align = 'left', style = list(fontFamily = 'Arial')) %>% 
  hc_subtitle(text = "Source: Yahoo Finance, SEC, TidyQuant",align = 'left', style = list(fontFamily = 'Arial')) %>% 
  hc_add_series(kodak, type = 'spline', id = "kodak", hcaes(x = date, y = close), name = 'Stock Price',
                marker = list(enabled = F)) %>%
  hc_xAxis(type = 'datetime') %>%
  hc_add_series(trans1 %>% filter(A > 0) %>% filter(year(date) >= 2020),
                type = 'column', hcaes(x= date, y = A), 
                name = 'Share Purchase/Awards', yAxis = 1) %>%
  hc_add_series(trans1 %>% filter(D > 0) %>% filter(year(date) >= 2020),
                type = 'column', hcaes(x= date, y = D*-1), 
                name = 'Share Sales', yAxis = 1) %>%
  hc_yAxis_multiples(list(title = list(text = 'Stock Price')),
                     list(title = list(text = 'Shares Purchased (Sold)'), opposite = TRUE)) %>%
  hc_rangeSelector(enabled = TRUE, selected = 1) %>%
  hc_scrollbar(enabled = TRUE)

Figure 3: Kodak Share Price and Insider Transactions

Definitely looks a little fishy. If I zoom in to July, I can see that in the days before the announcement, there were plenty of stock acquisitions by the CEO and a board member. But don’t you worry, an independent review cleared them.
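
If you would rather see the underlying filings than eyeball the chart, a simple date filter on the scraped dataframe pulls the July transactions. The window below is just a rough bracket around the announcement, and the owner name column (rptOwnerName) comes out of the reportingOwnerId node, so adjust the name if your parsed column differs.

#Rough window around the late-July 2020 announcement
transactions %>%
  filter(date >= as.Date('2020-07-01'), date <= as.Date('2020-07-31')) %>%
  arrange(date) %>%
  select(date, rptOwnerName, shareType, transactionShares, transactionPricePerShare)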


Summary

That is it for this round. Feel free to dig into the data after you download it and see what insights you can find. And please check out my other insights on scraping financial data.

And while you’re at it, give me a follow.
