Web Scraping for Stock Market Data

Stock Market Data

The COVID-19 pandemic has proven beyond doubt that the stock market is just as volatile as any other industry. It can crash in a moment, and it can skyrocket just as quickly. Stocks are cheaper at this point due to the crisis brought about by the pandemic, and many people are turning to stock market data to help them make informed choices.

Unlike general web scraping, scraping for stock market data is more specific and mainly useful to those interested in stock market investment.


Web Scraping Explained

Web scraping involves extracting as much data as possible from a preset index of target websites or other sources. Companies use the scraped data for decision making and strategic planning, as it gives them accurate and viable information on a topic.

Web scraping is usually associated with commercial and marketing companies, but they are not the only ones who benefit from the process; anyone stands to gain from scraping stock market data. Investors stand to benefit the most, as the data gives them:

  • Real-time data
  • Price prediction
  • Stock market trends
  • Possibilities for investment
  • Price changes

Just as with web scraping for other data, stock market data scraping isn't the easiest task to perform, but it yields valuable results if done right. It provides investors with insights into the parameters that matter most when making the best and smartest choices.

Scrape Yahoo Finance and Stock Market Data Using Python

You’ll first need to install Python 3, which is available for Windows, Mac, and Linux. Then install the following packages to download and parse the HTML data: pip for package installation, the Python requests package for sending requests and downloading the HTML content of the target page, and Python lxml for parsing with XPaths.
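As a rough guide, and assuming Python 3 and pip are already on your PATH, the installation step looks something like this (specific package versions aren't prescribed here):

pip install requests lxml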

Python 3 Code For Data Extraction From Yahoo Finance

from lxml import html
import requests
import json
import argparse
from collections import OrderedDict


def get_headers():
    # Browser-like headers so the request is less likely to be rejected
    return {"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
            "accept-encoding": "gzip, deflate, br",
            "accept-language": "en-GB,en;q=0.9,en-US;q=0.8,ml;q=0.7",
            "cache-control": "max-age=0",
            "dnt": "1",
            "sec-fetch-dest": "document",
            "sec-fetch-mode": "navigate",
            "sec-fetch-site": "none",
            "sec-fetch-user": "?1",
            "upgrade-insecure-requests": "1",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36"}


def parse(ticker):
    url = "http://finance.yahoo.com/quote/%s?p=%s" % (ticker, ticker)
    response = requests.get(url, verify=False, headers=get_headers(), timeout=30)
    print("Parsing %s" % (url))
    parser = html.fromstring(response.text)
    summary_table = parser.xpath('//div[contains(@data-test,"summary-table")]//tr')
    summary_data = OrderedDict()
    other_details_json_link = ("https://query2.finance.yahoo.com/v10/finance/quoteSummary/{0}"
                               "?formatted=true&lang=en-US&region=US"
                               "&modules=summaryProfile%2CfinancialData%2CrecommendationTrend"
                               "%2CupgradeDowngradeHistory%2Cearnings%2CdefaultKeyStatistics"
                               "%2CcalendarEvents&corsDomain=finance.yahoo.com").format(ticker)
    summary_json_response = requests.get(other_details_json_link)
    try:
        json_loaded_summary = json.loads(summary_json_response.text)
        summary = json_loaded_summary["quoteSummary"]["result"][0]
        y_Target_Est = summary["financialData"]["targetMeanPrice"]['raw']
        earnings_list = summary["calendarEvents"]['earnings']
        eps = summary["defaultKeyStatistics"]["trailingEps"]['raw']
        datelist = []
        for i in earnings_list['earningsDate']:
            datelist.append(i['fmt'])
        earnings_date = ' to '.join(datelist)
        # Walk the summary table rows and collect key/value pairs
        for table_data in summary_table:
            raw_table_key = table_data.xpath('.//td[1]//text()')
            raw_table_value = table_data.xpath('.//td[2]//text()')
            table_key = ''.join(raw_table_key).strip()
            table_value = ''.join(raw_table_value).strip()
            summary_data.update({table_key: table_value})
        summary_data.update({'1y Target Est': y_Target_Est, 'EPS (TTM)': eps,
                             'Earnings Date': earnings_date, 'ticker': ticker,
                             'url': url})
        return summary_data
    except ValueError:
        print("Failed to parse json response")
        return {"error": "Failed to parse json response"}
    except:
        return {"error": "Unhandled Error"}


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('ticker', help='Ticker symbol, e.g. AAPL')
    args = argparser.parse_args()
    ticker = args.ticker
    print("Fetching data for %s" % (ticker))
    scraped_data = parse(ticker)
    print("Writing data to output file")
    with open('%s-summary.json' % (ticker), 'w') as fp:
        json.dump(scraped_data, fp, indent=4)
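Assuming the script above is saved as, say, yahoo_finance.py (the filename is arbitrary), it can be run from the command line with a ticker symbol, and the scraped summary is written to a JSON file named after that ticker:

python yahoo_finance.py AAPL
# creates AAPL-summary.json in the current directory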

Data Scraping in Real-Time

Since the stock market is in constant motion, it's best to use a scraper that extracts data in real time. With a real-time scraper, every step of the web scraping process is carried out in real time, so the data you end up with is still viable when you act on it, allowing the best and most accurate decisions to be made.


Real-time web scrapers are more expensive than slower ones, but they are the best choice for investment firms and businesses that depend on accurate data in a market as volatile as stocks.

The Benefits of Stock Market Data Scraping to Businesses

All businesses can benefit from web scraping in one form or another, especially for data such as economic trends, user data, and the stock market. Before investment firms invest in a particular stock, they use web scraping tools and analyze the extracted data to guide their decisions.

Stock market investment isn't usually considered safe, to say the least, as it's very volatile and prone to change. Each of the volatile variables involved in stock investment plays a huge role in the value of stocks, and stock investment is only considered safe to an extent when all of these variables have been analyzed and studied over time.

To accumulate as much data as necessary, you need to practice stock market data scraping. This means gathering large amounts of data from the stock market using a stock market scraping bot.

The software will first collect all the information that is valuable to your cause, and then parse it so it can be studied and analyzed for smart decision making.

Sources of Stock Market Data

Professionals have different APIs they use to their advantage when collecting stock market data from the web. Google Finance was the go-to option back in the day, but its use has been in decline since 2012.

One of the most popular options is Yahoo Finance. Its API has been on and off over the years, as it has been deprecated and revived from time to time. There are other companies whose APIs you can use if Yahoo Finance doesn’t suit your project.

Limitations of Stock Market Scraping

Web scraping isn’t as straightforward as it may sound; it involves different steps and processes that need accuracy and timely execution to extract accurate and viable data. Most of the time, these processes run into preventive measures that websites put in place to stop web scraping.

That is why most big companies choose to build their own tools to overcome the obstacles to a seamless web scraping workflow. One of the most common issues with web scraping is a blocked IP. Once an IP address is blocked, the web scraper loses access to the site and no information can be extracted.

Most of the limitations on web scraping can be avoided by programming the stock market data scraper uniquely and by using proxies. Even though it is impossible to avoid every restriction on web scraping, a purpose-built tool helps.

Requirements for Stock Market Data Scraping

Businesses and investment firms that are interested in stock market investment will need the right tools to obtain the data necessary for informed decision making.

Data scraping isn’t as straightforward a process as you might think; it needs different tools for collecting the data, for removing variables and redundancies, and for turning the result into useful, viable data.

The first tool companies need to consider is a web crawler, which enables the scraping of stock data from the stock market for analysis. You can get specialized tools to scrape the stock market, but they require additional investment, which can be quite expensive depending on the size of the project.

Another requirement for data harvesting is the data source itself. This is an index of stock market websites that the web scraper will work through to collect all the necessary data. Once the data is collected from the index, it is analyzed and processed to remove redundancies.


Most high-end data scraping tools include this step, but it’s not difficult to build a data parser to serve the function. By analyzing the data and filtering out the redundancies, what remains is the useful data, which can then be analyzed further with industry-specific software for precise results.

The precise results are then used to make decisions about the particular investment they relate to. All of these processes can be carried out with a single high-end web scraper, stock market-specific software, and a few data analysts.

Analyzing the Stock Market Using Python

A Jupyter notebook is used throughout this tutorial, and you can get it on GitHub.

The Setup Process

  • Begin by installing Jupyter Notebook, which comes bundled with Anaconda
  • In addition to Anaconda, install other Python packages such as beautifulsoup4, dill, and fastnumbers
  • Add the following imports to a Python 3 Jupyter notebook

import numpy as np  # linear algebra
import pandas as pd  # dataframe-based data processing and CSV file I/O
import requests  # HTTP requests
from bs4 import BeautifulSoup  # HTML parsing and scraping
import bs4
from fastnumbers import isfloat
from fastnumbers import fast_float
from multiprocessing.dummy import Pool as ThreadPool
import matplotlib.pyplot as plt
import seaborn as sns
import json
from tidylib import tidy_document  # for tidying incorrect HTML

sns.set_style('whitegrid')
%matplotlib inline

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

What You Will Need to Scrape the Required Data

Remove excess spaces between strings

Some strings from web pages come with multiple spaces between the words. You can remove these with the following function:

def remove_multiple_spaces(string):
    if type(string) == str:
        return ' '.join(string.split())
    return string
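For example, applying it to a typical scraped label (the input string is just illustrative):

print(remove_multiple_spaces("Market Cap      (intraday)"))
# Output: Market Cap (intraday)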

Conversion of Strings to Float

On web pages you will often find symbols mixed in with numbers. You can either remove the symbols before converting, or use a function like the following, which maps a converter over a list of strings (a sketch of the missing ffloat helper follows the snippet):

def ffloat_list(string_list):
    return list(map(ffloat, string_list))
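Note that ffloat_list relies on an ffloat helper that isn't shown above. A minimal sketch of such a helper, assuming fastnumbers' fast_float is used to strip commas and percent signs and to fall back to NaN when conversion fails, could look like this:

import numpy as np
from fastnumbers import fast_float

def ffloat(string):
    # Pass numeric values through unchanged
    if isinstance(string, (int, float)):
        return string
    if not isinstance(string, str):
        return np.nan
    # Keep the first token, drop thousands separators and percent signs,
    # and return NaN if the remainder still isn't a number
    cleaned = string.split(" ")[0].replace(',', '').replace('%', '')
    return fast_float(cleaned, default=np.nan)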

Send HTTP Requests in Python

Before you make an HTTP request, you need the URL of the target website. Make the request using requests.get, check the HTTP status with response.status_code, and read the page content from response.content.
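A minimal sketch under those assumptions, using a Yahoo Finance quote page as a placeholder URL:

import requests

response = requests.get("https://finance.yahoo.com/quote/AAPL", timeout=30)
print(response.status_code)      # 200 means the request succeeded
page_content = response.content  # raw HTML bytes of the page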

Extract And Parse JSON Content From a Page

Extract JSON content from a page using response.json(), and double-check that the request succeeded with response.status_code.
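For example, the quoteSummary endpoint used in the script earlier returns JSON directly; a sketch of fetching and decoding it (the ticker and module list are placeholders) might look like this:

import requests

url = ("https://query2.finance.yahoo.com/v10/finance/quoteSummary/AAPL"
       "?modules=financialData")
response = requests.get(url, timeout=30)
if response.status_code == 200:
    data = response.json()  # parsed JSON as a Python dict
    print(list(data.keys()))
else:
    print("Request failed with status", response.status_code)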

Scrape and Parse HTML Data

For this we will use the beautifulsoup4 parsing library.
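A short sketch of fetching a page and handing it to beautifulsoup4 (the URL is a placeholder):

import requests
from bs4 import BeautifulSoup

response = requests.get("https://finance.yahoo.com/quote/AAPL", timeout=30)
soup = BeautifulSoup(response.content, "html.parser")
print(soup.title.text)  # the page title as plain text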

Use Jupyter Notebook to Render HTML Strings

Use the following function:

from IPython.core.display import HTML

HTML("Rendered HTML")

Get the Content Position Using Chrome Inspector

You’ll first need to know where in the HTML the content you want to extract sits before you proceed. Inspect the page using the Chrome inspector: the shortcut is Cmd+Option+I on Mac and Ctrl+Shift+I on Linux.

Parse the Content and Display it With BeautifulSoup4

Parse the content using the BeautifulSoup function, then get the content from the header 1 tag and render it.
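Reusing the soup object from the earlier beautifulsoup4 sketch, pulling out the first header 1 tag and rendering it in the notebook could look like this:

from IPython.core.display import HTML

header = soup.find("h1")  # first <h1> element on the page
HTML(str(header))         # render the element inside the notebook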

FAQs

A scraper tool is essential for investment firms and businesses that want to buy stocks and make investments in the stock market. Accurate and viable data is needed to make the best decisions, and it can only be obtained by extracting and analyzing stock market data.

There are a lot of limitations to scraping for this data, but you will have a much better chance of success if you use a tool specially designed for your company. You will also stand a better chance if you use the best IPs, such as the dedicated IP addresses from Limeproxies.

About the author

Rachael Chapman

A Complete Gamer and a Tech Geek. Brings out all her thoughts and Love in Writing Techie Blogs.
