sec-web-scraper

A Python based web scraper for the SEC EDGAR database

Github Issues

Overview

This library will for scraping certain financial documents from the EDGAR database such as the 10-K (and it's versions such as 10-K405,10-KSB), 20-F and 40-F.

The two main features of the library will be: - A document downloader portion that will fetch documents from the EDGAR database based on parameters such as a text query, time period, company ticker, and file type. - A scraper that will parse sections and information from the retrieved files.

Installation

Please make sure you have Python 3.7 or higher.

You can check your python version with

python --version

Then run the command below!

pip install sec-web-scraper

Usage

# Downloader
from sec_web_scraper.Downloader import Downloader

# Create new downloader object
d = Downloader()

# input the year range for filing data
d.build_index_sec(2000, 2002)


# After you've built the index, see all forms type filed in that period as a list
d.get_forms()

# If you want to find the cik of company, provide the name (fuzzy match). Returns a list
d.get_company_info('apple')

# If you want all 8-K's filled in the range above.This is a DataFrame
res = d.find_files_by_type('8-K') 

#More features to be added!

#Scraper
from sec_web_scraper.Scraper import *

#With a particular filing
sample_10k = "https://www.sec.gov/Archives/edgar/data/20/0000893220-96-000500.txt"

#Get the raw text
raw_txt = get_document_given_link(sample_10k)

#Get the sections in the document
doc_tags = get_document_tags(raw_txt)

#More features to be added!

References

Python project template from https://github.com/ColumbiaOSS/example-project-python