- One of the key benefits of NLP for finance and banking professionals is the ability to process large amounts of data in a short amount of time.
- For example, business analysts can use ChatGPT to quickly answer queries, summarize complex financial reports, and analyze trends based on the data provided.
- In this post, we will implement an interactive ChatGPT query interface for Amazon's quarterly earnings releases from 2018 to 2022. Specifically, we will explore the benefits of using LlamaIndex and DeepLake to query financial data [1-6].
Let’s set the working directory (replace YOURPATH with your own path)
import os
os.chdir('YOURPATH')  # replace YOURPATH with your working directory
os.getcwd()
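and install the key libraries (this post uses the early llama-index API, including GPTDeepLakeIndex, LLMPredictor and index.query(), which later llama-index releases removed, so an older llama-index version may be required)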
!pip install llama-index
!pip install deeplake
Let’s import the libraries
from llama_index import (
    SimpleDirectoryReader,
    GPTDeepLakeIndex,
    GPTSimpleKeywordTableIndex,
    Document,
    LLMPredictor,
    ServiceContext,
    download_loader,
)
from langchain.chat_models import ChatOpenAI
from typing import List, Optional, Tuple
import requests
import tqdm
import os
from pathlib import Path
Let’s define the PDF file reader
PDFReader = download_loader("PDFReader")
loader = PDFReader()
and prepare the list of Amazon quarterly earnings releases (2018 to 2022), along with the year and starting month of each report’s quarter
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/Q1_2018_-_8-K_Press_Release_FILED.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/Q2_2018_Earnings_Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Q318-Amazon-Earnings-Press-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/AMAZON.COM-ANNOUNCES-FOURTH-QUARTER-SALES-UP-20-TO-$72.4-BILLION.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/Q119_Amazon_Earnings_Press_Release_FINAL.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Amazon-Q2-2019-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Q3-2019-Amazon-Financial-Results.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_news/archive/Amazon-Q4-2019-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/Q1/AMZN-Q1-2020-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/q2/Q2-2020-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/q4/Amazon-Q4-2020-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q1/Amazon-Q1-2021-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q2/AMZN-Q2-2021-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q3/Q3-2021-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/q4/business_and_financial_update.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q1/Q1-2022-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q2/Q2-2022-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q3/Q3-2022-Amazon-Earnings-Release.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/q4/Q4-2022-Amazon-Earnings-Release.pdf'
]
years = [
2018, 2018, 2018, 2018,
2019, 2019, 2019, 2019,
2020, 2020, 2020,
2021, 2021, 2021, 2021,
2022, 2022, 2022, 2022
]
# first month of each report's quarter (1 = Q1, 4 = Q2, 7 = Q3, 10 = Q4); Q3 2020 is not included
months = [
1, 4, 7, 10,
1, 4, 7, 10,
1, 4, 10,
1, 4, 7, 10,
1, 4, 7, 10
]
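Because urls, years and months are three parallel lists, one missing or extra entry silently shifts the date of every later report. A minimal sanity check, assuming nothing beyond the three lists above (the helper name label_reports is our own, not part of LlamaIndex), pairs each URL with a readable quarter label using the same month-to-quarter mapping as _get_quarter_from_month:

```python
from typing import List, Tuple

# Quarter-start month -> quarter label (same mapping as _get_quarter_from_month)
QUARTER_BY_MONTH = {1: "Q1", 4: "Q2", 7: "Q3", 10: "Q4"}


def label_reports(urls: List[str], months: List[int], years: List[int]) -> List[Tuple[str, str]]:
    """Return (quarter label, url) pairs, failing loudly if the lists are misaligned."""
    if not (len(urls) == len(months) == len(years)):
        raise ValueError(
            f"length mismatch: {len(urls)} urls, {len(months)} months, {len(years)} years"
        )
    labelled = []
    for url, month, year in zip(urls, months, years):
        if month not in QUARTER_BY_MONTH:
            raise ValueError(f"unexpected month {month} for {url}")
        labelled.append((f"{QUARTER_BY_MONTH[month]} {year}", url))
    return labelled
```

Calling label_reports(urls, months, years) and printing the labels makes gaps easy to spot; for example, there is no Q3 2020 entry in the lists above.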
Let’s download the reports
zipped_data = list(zip(urls, months, years))

def download_reports(data: List[Tuple[str, int, int]], out_dir: Optional[str] = None) -> List[Document]:
    """Download each report PDF (if not already cached), load it, and tag it with its date."""
    docs = []
    out_dir = Path(out_dir or '.')
    out_dir.mkdir(parents=True, exist_ok=True)
    for url, month, year in tqdm.tqdm(data):
        out_path = out_dir / url.split('/')[-1]
        if not out_path.exists():
            r = requests.get(url, timeout=60)
            r.raise_for_status()  # avoid saving an HTML error page as a .pdf
            with open(out_path, 'wb') as f:
                f.write(r.content)
        # keep the first Document returned by the PDF loader
        doc = loader.load_data(file=out_path)[0]
        # store the quarter's start date as metadata, e.g. "01-01-2018"
        doc.extra_info = {"Date": f"{month:02d}-01-{year}"}
        docs.append(doc)
    return docs

def _get_quarter_from_month(month: int) -> str:
    mapping = {
        1: 'Q1',
        4: 'Q2',
        7: 'Q3',
        10: 'Q4'
    }
    return mapping[month]

docs = download_reports(zipped_data, 'data')
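Each Document now carries its quarter as a "Date" string in MM-01-YYYY form. If you want to turn that metadata back into a label (for example, to title answers), here is a small sketch with hypothetical helper names, assuming the date format used in download_reports:

```python
from datetime import datetime

QUARTER_BY_MONTH = {1: "Q1", 4: "Q2", 7: "Q3", 10: "Q4"}


def make_date_str(month: int, year: int) -> str:
    """Build the same "MM-01-YYYY" string that download_reports stores under "Date"."""
    return f"{month:02d}-01-{year}"


def quarter_from_date_str(date_str: str) -> str:
    """Turn a "MM-01-YYYY" Date value back into a label such as "Q2 2019"."""
    parsed = datetime.strptime(date_str, "%m-%d-%Y")
    return f"{QUARTER_BY_MONTH[parsed.month]} {parsed.year}"


print(make_date_str(4, 2019))               # 04-01-2019
print(quarter_from_date_str("04-01-2019"))  # Q2 2019
```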
Let’s define the OpenAI key
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
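Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export OPENAI_API_KEY in your shell before starting the notebook (require_openai_key is a hypothetical helper), is to check the environment and fail early if the key is missing or still the placeholder:

```python
import os


def require_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail before any index is built."""
    key = os.environ.get(var_name, "").strip()
    # Reject both an unset variable and the placeholder used in this post
    if not key or key == "your_openai_api_key":
        raise RuntimeError(
            f"Set {var_name} in your shell (e.g. export {var_name}=...) "
            "instead of hard-coding it in the notebook."
        )
    return key
```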
Let’s wrap ChatGPT (gpt-3.5-turbo) in an LLMPredictor and create a ServiceContext from it
# gpt-3.5-turbo with temperature=0 to keep answers as deterministic as possible
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
Let’s build a DeepLake vector index for each quarterly report and store the indices in a dictionary keyed by (month, year)
dataset_root = '../data'
vector_indices = {}
for doc, (_, month, year) in zip(docs, zipped_data):
    # one local DeepLake dataset per quarter, e.g. ../data/01_2018
    dataset_path = f"{dataset_root}/{month:02d}_{year}"
    vector_index = GPTDeepLakeIndex.from_documents(
        [doc],
        dataset_path=dataset_path,
        overwrite=True,
        service_context=service_context,  # use the gpt-3.5-turbo predictor defined above
    )
    vector_indices[(month, year)] = vector_index
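Under the hood, a vector index splits each report into chunks, stores an embedding vector for every chunk (here in DeepLake), and at query time hands the chunks most similar to the question to the LLM as context. The toy sketch below illustrates only the ranking step: a bag-of-words Counter stands in for a learned embedding model, and cosine similarity picks the top-k chunks. It illustrates the idea and is not how GPTDeepLakeIndex is implemented.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Rank chunks by similarity to the query and keep the k best (score, chunk) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In a real index, the retrieved chunks are inserted into the prompt, which is why answers are only as good as the passages that rank highest.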
Let’s test querying the Q1 2018 vector index
response = vector_indices[(1, 2018)].query("What is the operating cash flow?")
print(response)
The operating cash flow for Q1 2018 was $18,194 million, representing a 13% year-over-year growth (excluding foreign exchange).
response = vector_indices[(1, 2018)].query("What are the earning?")
print(response)
Net income was $1.6 billion in the first quarter, or $3.27 per diluted share, compared with net income of $724 million, or $1.48 per diluted share, in first quarter 2017. This strong performance was driven by the continued expansion of AWS infrastructure, with two Availability Zones and one Local Region in Osaka, Japan launched during the first quarter of 2018, and plans for 12 more Availability Zones and four more regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud Region in the U.S. coming online between now and early 2019.
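Answers like this one are easy to sanity-check with a little arithmetic. A minimal sketch (yoy_growth is a hypothetical helper) that recomputes the year-over-year change in diluted EPS from the two figures quoted in the answer, $3.27 and $1.48:

```python
def yoy_growth(current: float, prior: float) -> float:
    """Year-over-year growth as a fraction, e.g. 0.25 means +25%."""
    if prior == 0:
        raise ValueError("prior-period value must be non-zero")
    return current / prior - 1


# Diluted EPS figures quoted in the Q1 2018 answer above
eps_growth = yoy_growth(3.27, 1.48)
print(f"Diluted EPS growth: {eps_growth:.0%}")  # Diluted EPS growth: 121%
```

The ratio is only meaningful when the prior-period value is positive; for a swing from loss to profit, compare the absolute change instead.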
response = vector_indices[(1, 2018)].query("What are the principal repayments of financial lease?")
print(response)
Principal repayments of financial lease are payments made to reduce the amount of money owed on a loan or lease. These payments are typically made on a regular basis, such as monthly or quarterly, and are usually a fixed amount. They are an important part of managing finances, as they help to ensure that the loan or lease is paid off in a timely manner.
response = vector_indices[(1, 2018)].query("What is the free cash flow?")
print(response)
The free cash flow for Amazon.com is $7.3 billion for the trailing twelve months, compared with $10.1 billion for the trailing twelve months ended March 31, 2017. This is despite the company's expansion of its infrastructure, launching two Availability Zones and one Local Region in Osaka, Japan during the first quarter of 2018, and plans for 12 more Availability Zones and four more regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud Region in the U.S. coming online between now and early 2019.
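Amazon's earnings releases define free cash flow as operating cash flow less purchases of property and equipment, net of proceeds from sales and incentives, reported for the trailing twelve months. A minimal sketch of that arithmetic, useful for checking the figure above against the cash flow statement; the helper name is hypothetical and the example inputs are placeholders, not values from the report:

```python
def free_cash_flow(operating_cash_flow: float, capex: float,
                   proceeds_and_incentives: float = 0.0) -> float:
    """Operating cash flow less purchases of property and equipment,
    net of proceeds from sales and incentives (all inputs in the same units)."""
    net_capex = capex - proceeds_and_incentives
    return operating_cash_flow - net_capex


# Placeholder inputs (not from the report), e.g. in millions
print(free_cash_flow(100.0, 40.0, 5.0))  # 65.0
```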
response = vector_indices[(1, 2018)].query("What are the operating expenses?")
print(response)
Operating expenses are the costs associated with running a business, such as salaries, rent, utilities, and other overhead costs. These expenses do not include the cost of goods sold, which is the cost of the products or services that a business sells. Examples of operating expenses include stock-based compensation expense, fulfillment costs, marketing costs, technology and content costs, general and administrative costs, and shipping costs.
response = vector_indices[(1, 2018)].query("What is the interest rate risk?")
print(response)
The interest rate risk is the risk that changes in interest rates will adversely affect the value of investments. This risk is particularly relevant for Amazon.com, Inc. as they have a large amount of debt and borrowings, as evidenced by their balance sheet which shows $60.2 billion in current assets and $48.9 billion in property and equipment, net, as of December 31, 2017.
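The mechanism behind interest rate risk can be shown with a toy example: the value of fixed future cash flows falls when the discount rate rises. The sketch below assumes a hypothetical three-year bond paying a coupon of 4 per year on a face value of 100, and is not taken from Amazon's filings:

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount cash flows received at the end of years 1, 2, ... at a flat annual rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


# Hypothetical bond: coupon of 4 in years 1-3, face value 100 repaid in year 3
bond = [4.0, 4.0, 104.0]
price_at_4pct = present_value(bond, 0.04)  # coupon rate equals discount rate: priced at par (100)
price_at_5pct = present_value(bond, 0.05)  # rates rose by one point, so the value falls
print(round(price_at_4pct, 2), round(price_at_5pct, 2))
```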
response = vector_indices[(1, 2018)].query("What is the equity investment risk?")
print(response)
The equity investment risk is the risk that the value of the equity investment will decrease due to changes in the market or other factors. This risk is present for any equity investment, and investors should be aware of the potential for losses when investing in stocks. This risk is especially pertinent when considering investments in companies such as Amazon.com, Inc., which has seen significant year-over-year growth in its net sales, but also has the potential to experience losses due to changes in the market or other factors.
response = vector_indices[(1, 2018)].query("What is the foreign exchange risk?")
print(response)
The foreign exchange risk is the risk that fluctuations in foreign exchange rates will have an adverse effect on Amazon.com's financial results. This risk is mentioned in the forward-looking statements, which state that the guidance assumes "no additional business acquisitions, investments, restructurings, or legal settlements are concluded" and that the guidance "anticipates a favorable impact of approximately $1.2 billion or 320 basis points from foreign exchange rates." This risk is particularly relevant for Amazon.com, given that international sales accounted for 31% of total net sales in 2017 and 29% of total net sales in 2018.
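The guidance quoted above states the currency effect both in dollars and in basis points (1 basis point = 0.01 percentage point, so 320 basis points = 3.2 percentage points). A minimal sketch of both conversions; the helper names and example figures are hypothetical, and removing the FX impact from the current period before computing growth is an assumption that mirrors how the releases phrase growth "excluding" foreign exchange:

```python
def growth_excluding_fx(current: float, prior: float, fx_impact: float) -> float:
    """Y/Y growth after removing the currency effect from the current period.
    fx_impact > 0 means exchange-rate moves increased reported results."""
    if prior == 0:
        raise ValueError("prior-period value must be non-zero")
    return (current - fx_impact) / prior - 1


def bps_to_fraction(bps: float) -> float:
    """Convert basis points to a fraction: 1 bp = 0.01% = 0.0001."""
    return bps / 10_000


# Placeholder figures: reported 112 vs 100 a year earlier, with +2 from currency moves
print(f"{growth_excluding_fx(112.0, 100.0, 2.0):.1%}")  # 10.0%
print(f"{bps_to_fraction(320):.1%}")                    # 3.2%
```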
response = vector_indices[(1, 2018)].query("What are the net product sales?")
print(response)
Net product sales are the total of Online stores, Physical stores, Third-party seller services, Subscription services, AWS, and Other. Therefore, the net product sales are $50,945.
response = vector_indices[(1, 2018)].query("What are the net service sales?")
print(response)
The net service sales are the sum of the sales from Online Stores, Third-Party Seller Services, Subscription Services, AWS, and Other. This would be $49,715.
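In Amazon's statements of operations, total net sales are reported as net product sales plus net service sales, so two answers like the ones above can be cross-checked against the reported total. A minimal consistency check (check_sales_split is a hypothetical helper; the tolerance absorbs rounding in figures reported in millions):

```python
def check_sales_split(net_product: float, net_service: float, total: float,
                      tolerance: float = 0.5) -> bool:
    """True if product + service sales reconcile to total net sales within the tolerance."""
    return abs((net_product + net_service) - total) <= tolerance
```

If the two answers sum to far more than the reported total, at least one of them has been computed from the wrong lines of the table.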
response = vector_indices[(1, 2018)].query("What is the marketing?")
print(response)
Marketing refers to the activities associated with promoting and selling products or services. It includes advertising, selling, and delivering products to customers or clients. In the context of the information provided, marketing refers to the expenses associated with promoting and selling products or services, such as the $102 spent on marketing in the given table.
response = vector_indices[(1, 2018)].query("What is the operating income?")
print(response)
The operating income for Q1 2018 was $1,927 million, representing a year-over-year (Y/Y) growth of 13%.
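In Amazon's statements of operations, operating income is total net sales minus total operating expenses, and the operating expense lines include cost of sales, fulfillment, marketing, technology and content, general and administrative, and other operating expense (income), net. This differs from the generic definition in the operating expenses answer above, which excludes cost of goods sold. A minimal sketch of the calculation; the helper name and figures are placeholders, not values from the report:

```python
from typing import Dict


def operating_income(net_sales: float, operating_expenses: Dict[str, float]) -> float:
    """Total net sales minus the sum of all operating expense lines."""
    return net_sales - sum(operating_expenses.values())


# Placeholder figures (not from the report), e.g. in millions
expenses = {
    "cost_of_sales": 60.0,
    "fulfillment": 15.0,
    "marketing": 8.0,
    "technology_and_content": 10.0,
    "general_and_administrative": 3.0,
    "other_operating_expense_net": 1.0,
}
print(operating_income(100.0, expenses))  # 3.0
```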
response = vector_indices[(1, 2018)].query("What is the technology and content?")
print(response)
Technology and content refers to the expenses associated with developing and maintaining technology and content for Amazon's products and services, such as software development, website development, content acquisition, and other related costs. These expenses are included in the company's current assets, such as cash and cash equivalents, marketable securities, inventories, and accounts receivable, as well as in other assets, such as property and equipment, goodwill, and other assets.
response = vector_indices[(1, 2018)].query("What is the net income?")
print(response)
Net income: $1,629
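The queries above only touch the Q1 2018 index. Because vector_indices is keyed by (month, year), a small hypothetical helper can put the same question to every quarter in chronological order, relying only on the .query() method used throughout this post (each call sends a separate request to OpenAI):

```python
from typing import Dict, Protocol, Tuple


class Queryable(Protocol):
    """Anything with a .query(question) method, such as the indices built above."""

    def query(self, question: str): ...


def query_all_quarters(indices: Dict[Tuple[int, int], Queryable], question: str) -> Dict[str, str]:
    """Ask one question of every (month, year) index, in chronological order."""
    quarter_by_month = {1: "Q1", 4: "Q2", 7: "Q3", 10: "Q4"}
    answers = {}
    # Sort by (year, month) so answers come back oldest first
    for month, year in sorted(indices, key=lambda key: (key[1], key[0])):
        label = f"{quarter_by_month[month]} {year}"
        answers[label] = str(indices[(month, year)].query(question))
    return answers
```

For example, query_all_quarters(vector_indices, "What is the net income?") returns a dictionary mapping labels such as "Q1 2018" to the corresponding answers.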
Summary
- We have implemented and tested a natural-language query interface over Amazon’s quarterly earnings releases: each report is loaded from PDF, stored in its own DeepLake vector index via LlamaIndex, and questions are answered by an OpenAI language model using the most relevant passages retrieved from that report.
- Our analysis of Amazon’s earnings releases is consistent with earlier work [1] on integrating large language models with the multi-modal DeepLake vector database [2-6].
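- The test queries show that LlamaIndex and DeepLake make it quick to ask questions of financial reports: figures such as operating income, net income and free cash flow are pulled directly from the Q1 2018 release, while some answers (e.g. principal repayments of finance leases, operating expenses, marketing) fall back on generic definitions, so responses should be checked against the source document.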
References
[1] LlamaIndex and Deep Lake for Financial Statement Analysis
[2] LlamaIndex
[3] DeepLake
[5] Activeloop
[6] OpenAI