An Interactive GPT Index and DeepLake Interface – 1. Amazon Financial Statements

  • One of the key benefits of NLP for finance and banking professionals is the ability to process large amounts of data in a short amount of time.
  • For example, business analysts can use ChatGPT to quickly answer queries, summarize complex financial reports, and analyze trends based on the data provided.
  • In this post, we will implement an interactive ChatGPT query interface for Amazon financial statements. Specifically, we will explore the benefits of using LlamaIndex and DeepLake to query financial data [1-6].

Let’s set the working directory YOURPATH

import os
os.chdir('YOURPATH')
os.getcwd()

and install the key libraries

!pip install llama-index

!pip install deeplake

Let’s import the libraries

from llama_index import (
    SimpleDirectoryReader,
    GPTDeepLakeIndex,
    GPTSimpleKeywordTableIndex,
    Document,
    LLMPredictor,
    ServiceContext,
    download_loader,
)
from langchain.chat_models import ChatOpenAI
from typing import List, Optional, Tuple
import requests
import tqdm
import os
from pathlib import Path

Let’s define the PDF file reader

PDFReader = download_loader("PDFReader")

loader = PDFReader()

and prepare the list of Amazon financial statements

urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/Q1_2018_-_8-K_Press_Release_FILED.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/Q2_2018_Earnings_Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Q318-Amazon-Earnings-Press-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/AMAZON.COM-ANNOUNCES-FOURTH-QUARTER-SALES-UP-20-TO-$72.4-BILLION.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/Q119_Amazon_Earnings_Press_Release_FINAL.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Amazon-Q2-2019-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Q3-2019-Amazon-Financial-Results.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Amazon-Q4-2019-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/Q1/AMZN-Q1-2020-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/q2/Q2-2020-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/q4/Amazon-Q4-2020-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q1/Amazon-Q1-2021-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q2/AMZN-Q2-2021-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q3/Q3-2021-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q4/business_and_financial_update.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q1/Q1-2022-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q2/Q2-2022-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q3/Q3-2022-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q4/Q4-2022-Amazon-Earnings-Release.pdf',
]

years = [
    2018, 2018, 2018, 2018,
    2019, 2019, 2019, 2019,
    2020, 2020, 2020,
    2021, 2021, 2021, 2021,
    2022, 2022, 2022, 2022,
]
months = [
    1, 4, 7, 10,
    1, 4, 7, 10,
    1, 4, 10,
    1, 4, 7, 10,
    1, 4, 7, 10,
]
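
Because urls, years, and months are three parallel lists, a length mismatch would be silently truncated by zip in the next step. A minimal sketch of a sanity check follows; the helper name check_aligned and the example.com URLs are placeholders for illustration and are not part of the original post.

```python
from typing import List, Tuple


def check_aligned(urls: List[str], months: List[int], years: List[int]) -> List[Tuple[str, int, int]]:
    """Zip the three lists after verifying equal length and quarter-start months."""
    if not (len(urls) == len(months) == len(years)):
        raise ValueError(
            f"length mismatch: {len(urls)} urls, {len(months)} months, {len(years)} years"
        )
    bad = [m for m in months if m not in (1, 4, 7, 10)]
    if bad:
        raise ValueError(f"unexpected quarter-start months: {bad}")
    return list(zip(urls, months, years))


# Placeholder data for illustration only
sample = check_aligned(
    ["https://example.com/q1.pdf", "https://example.com/q2.pdf"],
    [1, 4],
    [2018, 2018],
)
print(sample)
```

Applied to the real lists, the same call would raise immediately if a URL were added without its month and year.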

Let’s download the reports

zipped_data = list(zip(urls, months, years))

def download_reports(data: List[Tuple[str, int, int]], out_dir: Optional[str] = None) -> List[Document]:
    """Download pages from a list of urls."""
    docs = []
    out_dir = Path(out_dir or ".")
    if not out_dir.exists():
        print(out_dir)
        os.makedirs(out_dir)

    for url, month, year in tqdm.tqdm(data):
        # Use the last URL segment as the local file name
        path_base = url.split('/')[-1]
        out_path = out_dir / path_base
        # Download only if the file is not already cached locally
        if not out_path.exists():
            r = requests.get(url)
            r.raise_for_status()  # fail early instead of saving an error page as a PDF
            with open(out_path, 'wb') as f:
                f.write(r.content)
        # Keep the first Document returned by the PDF loader
        doc = loader.load_data(file=Path(out_path))[0]

        # Tag the document with the first day of the quarter's start month
        date_str = f"{month:02d}" + "-01-" + str(year)
        doc.extra_info = {"Date": date_str}

        docs.append(doc)

    return docs

def _get_quarter_from_month(month: int) -> str:
    """Map a quarter-start month to its quarter name."""
    mapping = {
        1: "Q1",
        4: "Q2",
        7: "Q3",
        10: "Q4"
    }
    return mapping[month]

docs = download_reports(zipped_data, 'data')
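
The Date metadata attached to each document is the first day of the quarter's start month in MM-01-YYYY form, and the month also determines the quarter name. The short sketch below shows both for a single report; describe_period is a hypothetical helper introduced only for this illustration.

```python
from typing import Tuple

QUARTER_BY_MONTH = {1: "Q1", 4: "Q2", 7: "Q3", 10: "Q4"}


def describe_period(month: int, year: int) -> Tuple[str, str]:
    """Return (date_str, label), mirroring the date formatting used in download_reports."""
    date_str = f"{month:02d}" + "-01-" + str(year)
    label = f"{QUARTER_BY_MONTH[month]} {year}"
    return date_str, label


print(describe_period(1, 2018))   # ('01-01-2018', 'Q1 2018')
print(describe_period(10, 2022))  # ('10-01-2022', 'Q4 2022')
```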

Let’s define the OpenAI key

os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
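
Hard-coding the key in a notebook makes it easy to leak. One hedged alternative is to read it from the environment and prompt only when it is missing. The helper get_api_key below is a hypothetical name and is not part of LlamaIndex or the OpenAI client.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def get_api_key(env: Mapping[str, str] = os.environ,
                prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, falling back to an interactive prompt."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed value so it does not appear in notebook output
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise RuntimeError("no OpenAI API key provided")
    return key
```

In a notebook this could be used as os.environ["OPENAI_API_KEY"] = get_api_key(), which leaves an already exported key untouched.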

Let’s call LLMPredictor and ServiceContext

from llama_index import GPTTreeIndex, MockLLMPredictor, SimpleDirectoryReader, ServiceContext

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

Let’s build a vector index for each quarterly statement and store the results in a dictionary

# Dataset paths are formed by plain concatenation, e.g. '../data01_2018'
dataset_root = '../data'
vector_indices = {}
for idx, (_, month, year) in enumerate(zipped_data):
    doc = docs[idx]

    dataset_path = dataset_root + f"{month:02d}_{year}"
    # service_context (defined above) is not passed here, so it does not affect these indices
    vector_index = GPTDeepLakeIndex.from_documents([doc], dataset_path=dataset_path, overwrite=True)
    vector_indices[(month, year)] = vector_index
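
Since dataset_path is built by string concatenation rather than a path join, each dataset lands next to the root name instead of inside it. The sketch below reproduces the naming scheme for a few periods; dataset_path_for is a hypothetical helper used only to make the resulting layout explicit.

```python
def dataset_path_for(root: str, month: int, year: int) -> str:
    """Reproduce the concatenation used above: root followed by 'MM_YYYY'."""
    return root + f"{month:02d}_{year}"


periods = [(1, 2018), (4, 2018), (10, 2022)]
paths = {period: dataset_path_for("../data", *period) for period in periods}
for period, path in paths.items():
    print(period, "->", path)
```

The same (month, year) tuple serves as the dictionary key, so the index for Q1 2018 is looked up with (1, 2018).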

Let’s test querying a vector index

response = vector_indices[(1, 2018)].query("What is the operating cash flow?")

print(response)

The operating cash flow for Q1 2018 was $18,194 million, representing a 13% year-over-year growth (excluding foreign exchange).

response = vector_indices[(1, 2018)].query("What are the earning?")
print(response)

Net income was $1.6 billion in the first quarter, or $3.27 per diluted share, compared with net income of $724 million, or $1.48 per diluted share, in first quarter 2017. This strong performance was driven by the continued expansion of AWS infrastructure, with two Availability Zones and one Local Region in Osaka, Japan launched during the first quarter of 2018, and plans for 12 more Availability Zones and four more regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud Region in the U.S. coming online between now and early 2019.

response = vector_indices[(1, 2018)].query("What are the principal repayments of financial lease?")
print(response)

Principal repayments of financial lease are payments made to reduce the amount of money owed on a loan or lease. These payments are typically made on a regular basis, such as monthly or quarterly, and are usually a fixed amount. They are an important part of managing finances, as they help to ensure that the loan or lease is paid off in a timely manner.

response = vector_indices[(1, 2018)].query("What is the free cash flow?")
print(response)

The free cash flow for Amazon.com is $7.3 billion for the trailing twelve months, compared with $10.1 billion for the trailing twelve months ended March 31, 2017. This is despite the company's expansion of its infrastructure, launching two Availability Zones and one Local Region in Osaka, Japan during the first quarter of 2018, and plans for 12 more Availability Zones and four more regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud Region in the U.S. coming online between now and early 2019.

response = vector_indices[(1, 2018)].query("What are the operating expenses?")
print(response)

Operating expenses are the costs associated with running a business, such as salaries, rent, utilities, and other overhead costs. These expenses do not include the cost of goods sold, which is the cost of the products or services that a business sells. Examples of operating expenses include stock-based compensation expense, fulfillment costs, marketing costs, technology and content costs, general and administrative costs, and shipping costs.

response = vector_indices[(1, 2018)].query("What is the interest rate risk?")
print(response)

The interest rate risk is the risk that changes in interest rates will adversely affect the value of investments. This risk is particularly relevant for Amazon.com, Inc. as they have a large amount of debt and borrowings, as evidenced by their balance sheet which shows $60.2 billion in current assets and $48.9 billion in property and equipment, net, as of December 31, 2017.

response = vector_indices[(1, 2018)].query("What is the equity investment risk?")
print(response)

The equity investment risk is the risk that the value of the equity investment will decrease due to changes in the market or other factors. This risk is present for any equity investment, and investors should be aware of the potential for losses when investing in stocks. This risk is especially pertinent when considering investments in companies such as Amazon.com, Inc., which has seen significant year-over-year growth in its net sales, but also has the potential to experience losses due to changes in the market or other factors.

response = vector_indices[(1, 2018)].query("What is the foreign exchange risk?")
print(response)

The foreign exchange risk is the risk that fluctuations in foreign exchange rates will have an adverse effect on Amazon.com's financial results. This risk is mentioned in the forward-looking statements, which state that the guidance assumes "no additional business acquisitions, investments, restructurings, or legal settlements are concluded" and that the guidance "anticipates a favorable impact of approximately $1.2 billion or 320 basis points from foreign exchange rates." This risk is particularly relevant for Amazon.com, given that international sales accounted for 31% of total net sales in 2017 and 29% of total net sales in 2018.

response = vector_indices[(1, 2018)].query("What are the net product sales?")
print(response)

Net product sales are the total of Online stores, Physical stores, Third-party seller services, Subscription services, AWS, and Other. Therefore, the net product sales are $50,945.

response = vector_indices[(1, 2018)].query("What are the net service sales?")
print(response)

The net service sales are the sum of the sales from Online Stores, Third-Party Seller Services, Subscription Services, AWS, and Other. This would be $49,715.

response = vector_indices[(1, 2018)].query("What is the marketing?")
print(response)

Marketing refers to the activities associated with promoting and selling products or services. It includes advertising, selling, and delivering products to customers or clients. In the context of the information provided, marketing refers to the expenses associated with promoting and selling products or services, such as the $102 spent on marketing in the given table.

response = vector_indices[(1, 2018)].query("What is the operating income?")
print(response)

The operating income for Q1 2018 was $1,927 million, representing a year-over-year (Y/Y) growth of 13%.

response = vector_indices[(1, 2018)].query("What is the technology and content?")
print(response)

Technology and content refers to the expenses associated with developing and maintaining technology and content for Amazon's products and services, such as software development, website development, content acquisition, and other related costs. These expenses are included in the company's current assets, such as cash and cash equivalents, marketable securities, inventories, and accounts receivable, as well as in other assets, such as property and equipment, goodwill, and other assets.

response = vector_indices[(1, 2018)].query("What is the net income?")
print(response)

Net income: $1,629
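
The queries above repeat the same two lines for every question. Assuming the index object exposes .query(str) as it is used throughout this post, a sketch like the following could batch them. Both ask_all and the stand-in EchoIndex are hypothetical; EchoIndex exists only so the example runs without an API key.

```python
from typing import Dict, Iterable, Protocol


class Queryable(Protocol):
    """Anything with a query(str) method, such as the indices built above."""
    def query(self, question: str) -> object: ...


def ask_all(index: Queryable, questions: Iterable[str]) -> Dict[str, str]:
    """Run each question against the index and collect the answers as strings."""
    return {q: str(index.query(q)) for q in questions}


class EchoIndex:
    """Stand-in for a real index; returns a canned string instead of calling an LLM."""
    def query(self, question: str) -> str:
        return f"(answer to: {question})"


answers = ask_all(EchoIndex(), ["What is the net income?", "What is the free cash flow?"])
for question, answer in answers.items():
    print(question, "->", answer)
```

With the real indices, ask_all(vector_indices[(1, 2018)], questions) would issue one LLM call per question, and looping over vector_indices would extend the same questions to every quarter.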

Summary

  • We have implemented and tested an interactive query interface that answers natural-language questions about Amazon’s quarterly financial statements. Each statement is stored in its own DeepLake-backed vector index, and LlamaIndex passes the retrieved text to an OpenAI language model to generate the answer.
  • Our analysis of Amazon’s quarterly financial statements is consistent with previous work [1] on large language models integrated with a multi-modal vector database [2-6].
  • The results suggest that LlamaIndex and DeepLake can speed up the analysis of financial data. Specific questions (operating cash flow, net income, free cash flow) returned figures from the Q1 2018 release, while broader questions (operating expenses, lease repayments) returned generic definitions, so answers should be checked against the source document.

References

[1] LlamaIndex and Deep Lake for Financial Statement Analysis

[2] LlamaIndex

[3] DeepLake

[4] How to Build a Chatbot

[5] Activeloop

[6] OpenAI

