How to deal with a lazy Excel work using Python!
- 07-01-2018
- python
Share:
Copy
There are times when you are given a excel file with hundreds of rows and the boring part they think you will go through each row as a robot, analysing each row and present them with the required information.
Which is not going to happen with me. I was give a excel file with the details of all the bills of a shop for September month and was asked to find the no. of bill with specific details. The excel file is all messed up and contains 262 rows, let's analyse the file.
This is a GST(Goods And Service Tax) bill book as you can see the GST No. of the customer's or small shop's which bought goods from the Shop whose details are given to me. The problem with the given file is that the bills have goods with different tax rates. So the task which i was suppose to do is to find the total no. of bills which belongs to Five percent tax rate slab and also to organise each bills with the tax rates slabs belonging to them. Here the tax rates are divided by 2 because half goes to CGST and half goes to SGST. So by 2.5 here I mean the item falls in 5% tax slab and so on.
So a normal guy will go through each row and and make note of that and repeat the process for hours and hours.
I first look at it and knew what i was going to do. I knew there was a python package for handling Excel files and one which i used is openpyxl which is really awesome.
Let's break the problem. First Each bill contains different items which are represented by the HSN code on F column and their respective price on G column and rate on H column. So we can't treat each row on the file as a specific bill entry because one bill is sometimes contains more than one row.
Let's summerize I was suppose to find How many bills belongs to Five percent tax slab and organise each bill with their tax rates?
Here I am using Python3 and openpyxl package. you can install the openpyxl package by using pip install openpyxl command from a command line in windows or terminal in linux make sure you run the command line as a administrator in windows.
So to do the task in seconds rather than staring at the screen for hours here is the code I used.
Line_1: Here we just imported the package openpyxl
Line_2:Here we used the method load_workbook to load the Excel file. and stored that in the object work.
Line_4:As the file has three Sheets so we used the get_sheet_by_name method to grab the sheet2 where the details
Now that we are done loading the file it's time to do the real job. Here is the function calculate_rates_bill() to count bills which falls on Five percent tax slab and organise the bills with tax rates that they contain.
Line_7 to Line_10: Here I made variables to count the bill in Five percent tax slab.
The variable v is for the main loop and list l is for the temporary storing of tax rates. bill_count is for counting bill No.
The main loop is on Line_12 which will start from v which we set above to 1 and stop at sheet.max_row + 1 which is the end of the file. Make sure to set v to the row where the bill begins.
bill variable will hold the value which we will get from row=i and column=5 by that I am trying to do is that if the column is not None that means it is the biginning of a bill.
So the main process I am trying to do is to loop each row and test if the column 5 is blank or not and if it is not blank then hold the rate value on column 8 of that row in the list l and at the same time using while loop test the next rows for blank or None entry so if on next row column 5 is blank that means the bill above is continued and not terminited and the rate on column 8 of the second row belongs to the same bill so we again add that to the list l. Hope you got the process and we will do the same again and again and when we encounter a row wihich is not None or is not blank, then the while loop will break and will end the above bill and start an new bill by clearing the list l.
So the code is quite messed up but does the job. Well here is the output.
Well one problem is solved but i got another one and this is more difficult one. I will write a detailed blog on the new more interesting problem.
Full code download link: code