Automating Microsoft Excel Tasks & Reports with Python

  • Microsoft (MS) Excel is frequently used for data analysis across a range of industries, including banking, healthcare, and marketing, thanks to its versatility and usability.
  • Although Excel is proprietary software from MS, there are free alternatives that run on Windows, OS X, and Linux. 
  • This article aims to demonstrate A-Z automation of repetitive tasks and reports in Excel.
  • In practice, Excel offers automation through the use of third-party programs like Python or the built-in programming language VBA.
  • When it comes to automating, Python is faster and can handle significantly bigger datasets than VBA. Besides, VBA is restricted to Microsoft Office automation, but Python may run on any operating system and be used for a wide range of tasks.
  • Referring to the Python Excel tutorial in the sequel, we will be using the Kaggle’s Video Game Sales dataset.
  • Our objective is to test the openpyxl library that allows our Python programs to read and modify Excel spreadsheet files.
  • This library can help you write functions, format spreadsheets, create reports, and build charts directly in Python without even having to open an Excel application.

Clickable Table of Contents

  1. Excel-to-Python Data Import
  2. Getting Data from Cells
  3. Writing to Excel Files
  4. Creating New Columns
  5. Appending New Rows
  6. Implementing Excel Formulas
  7. Working with Sheets
  8. Adding Basic Charts
  9. Formatting Cells
  10. Conclusions
  11. References
  12. Explore More

Excel-to-Python Data Import

Let’s set the working directory YOURPATH

import os
os.chdir('YOURPATH')  # Set working directory
os. getcwd()

and install openpyxl

!pip install openpyxl 

If the module was correctly installed, this should produce no error messages.

Once we’ve imported the openpyxl module, we’ll be able to use the openpyxl.load_workbook() function

import openpyxl 
wb = openpyxl.load_workbook('videogamesales.xlsx')

This function takes in the filename and returns a value of the workbook data type. Remember that *.xlsx needs to be in the current working directory YOURPATH in order to work with it. 

Let’s tell the library which worksheet to access:

call the active worksheet (the 1st sheet in the workbook)

ws = wb.active

or access the “vgsales” worksheet

ws = wb['vgsales']

Let’s count the number of rows and columns in this worksheet

print('Total number of rows: '+str(ws.max_row)+'. And total number of columns: '+str(ws.max_column))

Output:

Total number of rows: 16328. And total number of columns: 10

Let’s learn how to access cells from the workbook.

Getting Data from Cells

Let’s look at the active sheet vgsales

The active sheet vgsales

Enter the following into the interactive shell

print('The value in cell A1 is: '+ws['A1'].value)

Output:

The value in cell A1 is: Rank

The Cell object has a value attribute that contains the value stored in that cell. Cell objects also have rowcolumn, and coordinate attributes that provide location information for the cell.

Let’s print out all the values in the first row

values = [ws.cell(row=1,column=i).value for i in range(1,ws.max_column+1)]
print(values)

Output:

['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

As an example, let’s print out first 10 rows in the 2nd column (B)

data=[ws.cell(row=i,column=2).value for i in range(2,12)]
print(data)

Output:

['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii', 'Wii Sports Resort', 'Pokemon Red/Pokemon Blue', 'Tetris', 'New Super Mario Bros.', 'Wii Play', 'New Super Mario Bros. Wii', 'Duck Hunt']

Let’s read a range of cells (e.g., columns 1-6)

my_list = list()

for value in ws.iter_rows(
    min_row=1, max_row=11, min_col=1, max_col=6, 
    values_only=True):
    my_list.append(value)
    
for ele1,ele2,ele3,ele4,ele5,ele6 in my_list:
    (print ("{:<8}{:<35}{:<10} {:<10}{:<15}{:<15}".format(ele1,ele2,ele3,ele4,ele5,ele6)))

Output:

Rank    Name                               Platform   Year      Genre          Publisher      
1       Wii Sports                         Wii        2006      Sports         Nintendo       
2       Super Mario Bros.                  NES        1985      Platform       Nintendo       
3       Mario Kart Wii                     Wii        2008      Racing         Nintendo       
4       Wii Sports Resort                  Wii        2009      Sports         Nintendo       
5       Pokemon Red/Pokemon Blue           GB         1996      Role-Playing   Nintendo       
6       Tetris                             GB         1989      Puzzle         Nintendo       
7       New Super Mario Bros.              DS         2006      Platform       Nintendo       
8       Wii Play                           Wii        2006      Misc           Nintendo       
9       New Super Mario Bros. Wii          Wii        2009      Platform       Nintendo       
10      Duck Hunt                          NES        1984      Shooter        Nintendo       

Generally, to get all the rows of the worksheet we use the method worksheet.rows and to get all the columns of the worksheet we use the method worksheet.columns. Similarly, to iterate only through the values we use the method worksheet.values.

Writing to Excel Files

You can always access the cell directly by using its key

ws['K1'] = 'Sum of Sales'

or by defining the row-column position of the cell

ws.cell(row=1, column=11, value = 'Sum of Sales')

Output:

<Cell 'vgsales'.K1>

You can update an Excel file with openpyxl

wb.save('videogamesales.xlsx')

Always make sure to close the Excel file before saving your changes!

Creating New Columns

Let’s add the sum of sales in every region and write it to column K

row_position = 2
col_position = 7

total_sales = ((ws.cell(row=row_position, column=col_position).value)+
               (ws.cell(row=row_position, column=col_position+1).value)+
               (ws.cell(row=row_position, column=col_position+2).value)+
               (ws.cell(row=row_position, column=col_position+3).value))

ws.cell(row=2,column=11).value=total_sales
wb.save('videogamesales.xlsx')

One can see that the total sales of vgsales have been calculated in cell K2 below

Column K Sum of Sales

Similarly, we can create a sum of sales in every row

row_position = 1

for i in range(1, ws.max_row):

    row_position += 1
    NA_Sales = ws.cell(row=row_position, column=7).value
    EU_Sales = ws.cell(row=row_position, column=8).value
    JP_Sales = ws.cell(row=row_position, column=9).value
    Other_Sales = ws.cell(row=row_position, column=10).value

    total_sales = (NA_Sales + EU_Sales + JP_Sales + Other_Sales)
    ws.cell(row=row_position, column=11).value = total_sales

wb.save("videogamesales.xlsx")

The corresponding Excel file should now have a new column K reflecting the total sales of video games in all regions

Column K  - the total sales of video games in all regions

Appending New Rows

To append a new row to the workbook, we simply create a tuple with the values we’d like to include and write it to the sheet

new_row = (1,'The Legend of Zelda',1986,'Action','Nintendo',3.74,0.93,1.69,0.14,6.51,6.5)

ws.append(new_row)
    
wb.save('videogamesales.xlsx')

We can confirm that this data has been appended by printing the last row in the workbook

values = [ws.cell(row=ws.max_row,column=i).value for i in range(1,ws.max_column+1)]
print(values)

The corresponding Excel file is as follows

The extra row 16329nadded to the sheet.

To delete the new row we just created, we can run the following line of code

ws.delete_rows(ws.max_row, 1) # row number, number of rows to delete

wb.save('videogamesales.xlsx')

The first argument in the delete_rows() function is the row number you want to delete. The second argument indicates the number of rows that should be deleted.

Implementing Excel Formulas

Let’s create a new column called “Average Sales” to calculate the average total video game sales in all markets

ws['P1'] = 'Average Sales'
ws['P2'] = '= AVERAGE(K2:K16220)'

wb.save('videogamesales.xlsx')

The average sales across all markets is approximately 0.54. This will be printed in cell P2 of our worksheet

The “COUNTA” function in Excel counts cells that are populated within a specific range. Let’s use it to find the number of records between E2 and E16220

ws['Q1'] = "Number of Populated Cells" 
ws['Q2'] = '=COUNTA(E2:E16220)'

wb.save('videogamesales.xlsx')

The updated Excel worksheet is as follows

Excel updated file with Number of Populated Cells

Let’s look at COUNTIF that counts the number of cells that meet a specific condition. We can use it to count the number of games in this dataset with the “Sports” genre

ws['R1'] = 'Number of Rows with Sports Genre'
ws['R2'] = '=COUNTIF(E2:E16220, "Sports")'

wb.save('videogamesales.xlsx')

The updated Excel worksheet contains the new cells R1 and R2

The updated Excel worksheet that contains the new cells R1 and R2 representing Number of Rows with Sports Genre

There are 2296 sports games in the dataset.

Let’s find the total “Sum of Sales” generated by sports games using the SUMIF function

ws['S1'] = 'Total Sports Sales'
ws['S2'] = '=SUMIF(E2:E16220, "Sports",K2:K16220)'
 
wb.save('videogamesales.xlsx')

The updated Excel worksheet contains the new cells S1 and S2

The updated Excel worksheet that contains the new cells S1 and S2 representing Total Sports Sales

The Total Sports Sales is 1308,76.

Let’s calculate Rounded Sum of Sports Sales (cell S2) using CEILING

ws['T1'] = 'Rounded Sum of Sports Sales'
ws['T2'] = '=CEILING(S2,25)'

wb.save('videogamesales.xlsx')

The updated Excel file containing cells T1 and T2 is

The updated Excel file containing cell T2 Rounded Sum of Sports Sales

The Rounded Sum of Sports Sales is 1325.

Working with Sheets

Let’s print the name of the active sheet

print(ws.title)

Output:

vgsales

Let’s rename this worksheet

ws.title ='Video Game Sales Data'

wb.save('videogamesales.xlsx')

The name of the active sheet should now be changed to “Video Game Sales Data.”

Let’s print out all the worksheets in the workbook

print(wb.sheetnames)

Output:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year']

Let’s create a new empty worksheet and print out all the worksheets in the workbook

wb.create_sheet('Empty Sheet') # create an empty sheet
print(wb.sheetnames) # print sheet names again

wb.save('videogamesales.xlsx')

Output:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year', 'Empty Sheet']

Let’s delete this worksheet and print out all the worksheets in the workbook

wb.remove(wb['Empty Sheet'])
print(wb.sheetnames)

wb.save('videogamesales.xlsx')

Output:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year']

Let’s create a copy of the worksheet “Video Game Sales Data”

wb.copy_worksheet(wb['Video Game Sales Data'])
wb.save('vgsales_2.xlsx')

We can get the required worksheet

ws = wb['Total Sales by Genre'] # access the required worksheet

Adding Basic Charts

Let’s build charts in Excel directly from Python.

  • Bar Chart
# Values for plotting

from openpyxl.chart import Reference

values = Reference(ws,         # worksheet object   
                   min_col=2,  # minimum column where your values begin
                   max_col=2,  # maximum column where your values end
                   min_row=1,  # minimum row you’d like to plot from
                   max_row=13) # maximum row you’d like to plot from
cats = Reference(ws, 
                 min_col=1, 
                 max_col=1, 
                 min_row=2, 
                 max_row=13)
from openpyxl.chart import BarChart

chart = BarChart()
chart.add_data(values, titles_from_data=True)
chart.set_categories(cats)
# set the title of the chart
chart.title = "Total Sales"

# set the title of the x-axis
chart.x_axis.title = "Genre"

# set the title of the y-axis
chart.y_axis.title = "Total Sales by Genre"

# the top-left corner of the chart
# is anchored to cell F2 .
ws.add_chart(chart,"D2")

# save the file 
wb.save("videogamesales.xlsx")

Output:

Excel Bar Plot Total Sales by Genre
  • Grouped Bar Chart
ws = wb['Breakdown of Sales by Genre'] # access worksheet

# Data for plotting

values = Reference(ws,
                   min_col=2,
                   max_col=5,
                   min_row=1,
                   max_row=13)

cats = Reference(ws, min_col=1, 
                     max_col=1, 
                     min_row=2, 
                     max_row=13)
# Create object of BarChart class

chart = BarChart()
chart.add_data(values, titles_from_data=True)
chart.set_categories(cats)

# set the title of the chart
chart.title = "Sales Breakdown"

# set the title of the x-axis
chart.x_axis.title = "Genre"

# set the title of the y-axis
chart.y_axis.title = "Breakdown of Sales by Genre"

# the top-left corner of the chart is anchored to cell H2.
ws.add_chart(chart,"H2")

# save the file 
wb.save("videogamesales.xlsx")

Output:

Excel Group Bar  Plot Sales Breakdown by Genre
  • Grouped Line Chart

Let’s create a stacked line chart using data in the tab “Breakdown of Sales by Year.” This worksheet contains video game sales data broken down by year and region:

# Data for plotting

values = Reference(ws,
                   min_col=2,
                   max_col=6,
                   min_row=1,
                   max_row=40)

cats = Reference(ws, min_col=1, max_col=1, min_row=2, max_row=40)
# Create object of LineChart class

from openpyxl.chart import LineChart

chart = LineChart()
chart.add_data(values, titles_from_data=True)
chart.set_categories(cats)

# set the title of the chart
chart.title = "Total Sales"

# set the title of the x-axis
chart.x_axis.title = "Year"

# set the title of the y-axis
chart.y_axis.title = "Total Sales by Year"

# the top-left corner of the chart is anchored to cell H2 
ws.add_chart(chart,"H2")

# save the file 
wb.save("videogamesales.xlsx")

Output:

Stacked Excel line chart total sales by year and region

Formatting Cells

  • Openpyxl allows users to format cells in Excel workbooks by changing font sizes, background colors, and cell borders directly in Python.
  • Read more various ways to customize your Python Excel spreadsheets using Openpyxl.

Conclusions

  • This study shows how to automate your day-day MS Excel tasks using Pandas and Openpyxl Python.
  • Openpyxl is a Python library that is used to read from an Excel file or write to an Excel file.
  • Data scientists use Openpyxl for data analysis, data copying, data mining, drawing charts, styling sheets, adding formulas, and more.
  • We have considered Excel spreadsheet I/O, working with Excel formulas, cell manipulations, and a few basic Excel charts.
  • However, not all features of Excel automation are supported via Openpyxl, such as filtering and Pivot tables.
  • To access advanced features such as running macros and converting Excel to PDFs, one can combine Openpyxl with other libraries such as win32com python client or xlwings for an extended version of Excel automation using Python.

References

Explore More


Go back

Your message has been sent

Warning

One-Time
Monthly
Yearly

Make a one-time donation

Make a monthly donation

Make a yearly donation

Choose an amount

€5.00
€15.00
€100.00
€5.00
€15.00
€100.00
€5.00
€15.00
€100.00

Or enter a custom amount


Your contribution is appreciated.

Your contribution is appreciated.

Your contribution is appreciated.

DonateDonate monthlyDonate yearly

Discover more from Our Blogs

Subscribe to get the latest posts sent to your email.

Leave a comment

Discover more from Our Blogs

Subscribe now to keep reading and get access to the full archive.

Continue reading