英文:
Excel sheet is corrupted after using sheet.insert_rows() in python
问题 {#heading}
def generateExcel(data_list):
file_path = 'Finaltemplatetesting.xlsx'
sheet_name = 'Main'
workbook = openpyxl.load_workbook(file_path)
# 选择表格
sheet = workbook[sheet_name]
# 查找表格起始单元格(左上角单元格)
i=0
for row in sheet.iter_rows(values_only=True):
i=i+1
if 'Project' in row:
header_row = row
break
header_col = header_row.index('Project')
# 初始化起始行
start_row=i+2
# 填充表格新数据
for data in data_list:
sheet.insert_rows(start_row)
for row_data in data:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=row_data.get(col_name))
start_row += 1
workbook.save("Finaltempalterrrrrr.xlsx")
英文:
So I'm trying to write a python script that fills an Excel sheet table with data (list of dictionaries).
The problem is the table I'm trying to fill in the Excel sheet is in the middle of the sheet (it doesn't start from A1).
So I wrote this method that locates from where the table starts and then fill it:
def generateExcel(data_list):
file_path = 'Finaltemplatetesting.xlsx'
sheet_name = 'Main'
workbook = openpyxl.load_workbook(file_path)
# Select the sheet
sheet = workbook[sheet_name]
# Find the starting cell (top-left cell) of the table
i=0
for row in sheet.iter_rows(values_only=True):
i=i+1
if 'Project' in row:
header_row = row
break
header_col = header_row.index('Project')
#intializing the starting row
start_row=i+2
# Fill the table with the new data
for data in data_list:
sheet.insert_rows(start_row)
for row_data in data:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=row_data.get(col_name))
start_row += 1
workbook.save("Finaltempalterrrrrr.xlsx")
There are some cells under the table that I'm using for calculating... etc., so my idea was to add a row just under the columns row using sheet.insert_rows()
and then fill it.
When I run this function it corrupts my Excel sheet like if I open the sheet I get:
we found a problem with some content in 'finaltemplaterrrrr.xlsx' , do you want us to try to recover as much as we can ? if you trust the source of this workbook, click yes.
And then after I click yes, it closes the sheet.
What is interesting is if I try to add only one row this way it works fine, and it adds the row exactly like I want:
sheet.insert_rows(start_row)
for row_data in data:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=row_data.get(col_name))
workbook.save("Finaltempalterrrrrr.xlsx")
I don't know why it doesn't work when I try to add multiple rows using a for loop, just as shown in the code above
This is how part of the sheet template looks like:
Dummy data_list
[
[{'Value1': '15', 'Project': 'AVTR', 'Score': 'Normal', 'Name': 'Raul', 'Date': '2023-03-16T14:58:33+01:00'}],
[{'Value1': '10', 'Project': 'TRG', 'Score': 'High', 'Name': 'Bob', 'Date': 'N/A'}],
[{'Value1': '12', 'Project': 'AVTR', 'Score': 'High', 'Name': 'Alice', 'Date': 'N/A'}]
]
答案1 {#1}
得分: 1
在你的代码中,对于`data_list`中的每个数据项,你都会插入一个新行,然后填充它。
``` python
# Starting point for row insertion.
start_row = i + 2
# Iterate through the data_list.
for data in data_list:
# Insert a row at start_row.
sheet.insert_rows(start_row)
# [Populate the inserted row]
# Increment the start_row by 1 for the next iteration.
start_row += 1
因此,随着循环的进行,插入点的索引(start_row
)会递增。
让我们以以下表格为例,目标是在Alice
和Bob
之间插入数据:
| Name | Value | |-------|-------| | Alice | 5 | | Bob | 10 | | ... | ... |
这意味着对于每个后续的数据项,您都会将前面的数据项和Bob
一起向下推移。
例如,对于在Alice
和Bob
之间插入的三个数据项:
| Name | Value | |-------|-------| | Alice | 5 | | Data1 | val1 | | Data2 | val2 | | Data3 | val3 | | Bob | 10 |
在考虑到Excel中的复杂格式和对象时,这可能会导致格式、公式、合并单元格等方面的问题,多次插入行可能会导致格式、公式、合并单元格等方面的问题,这可能导致报告的损坏。
为了测试,我建议尝试在循环之前一次性插入整个data_list
所需的所有行:而不是在循环中重复插入行,您可以进行单个插入操作,然后填充行,使用def insert_rows(self, idx, amount=1)
方法:
sheet.insert_rows(start_row, len(data_list))
这将是:
def generateExcel(data_list):
file_path = 'Finaltemplatetesting.xlsx'
sheet_name = 'Main'
workbook = openpyxl.load_workbook(file_path)
# Select the sheet
sheet = workbook[sheet_name]
# Find the starting cell (top-left cell) of the table
i = 0
for row in sheet.iter_rows(values_only=True):
i = i + 1
if 'Project' in row:
header_row = row
break
header_col = header_row.index('Project')
# Initializing the starting row
start_row = i + 2
# Insert all required rows at once
sheet.insert_rows(start_row, len(data_list))
# Now fill the data
for data in data_list:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=data.get(col_name))
start_row += 1
workbook.save("Finaltempalterrrrrr.xlsx")
如果您的data_list
结构如下:
[
[{'Value1': '15', 'Project': 'AVTR', 'Score': 'Normal', 'Name': 'Raul', 'Date': '2023-03-16T14:58:33+01:00'}],
[{'Value1': '10', 'Project': 'TRG', 'Score': 'High', 'Name': 'Bob', 'Date': 'N/A'}],
[{'Value1': '12', 'Project': 'AVTR', 'Score': 'High', 'Name': 'Alice', 'Date': 'N/A'}]
]
这意味着它具有额外的列表层次,这与初始代码的预期不符。具体来说,每个字典都包装在一个列表中,使data_list
成为字典列表的列表。初始代码期望data_list
只是一个字典列表,如下所示:
data_list = [
{'Value1': '15', 'Project': 'AVTR', 'Score': 'Normal', 'Name': 'Raul', 'Date': '2023-03-16T14:58:33+01:00'},
{'Value1': '10', 'Project': 'TRG', 'Score': 'High', 'Name': 'Bob', 'Date': 'N/A'},
{'Value1': '12', 'Project': 'AVTR', 'Score': 'High', 'Name': 'Alice', 'Date': 'N/A'}
]
如果要使用原始的data_list
结构,您需要调整在循环中访问数据的方式。具体来说,您需要遍历data_list
中的内部列表:
# Now fill the data
for inner_list in data_list:
for data in inner_list:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=data.get(col_name))
start_row += 1
在插入了三个数据项的情况下,您将获得三个空行:
| Name | Value | |-------|-------| | Alice | 5 | | | | | | | | | | | Bob | 10 |
然后,后续的循环将填充这些行:
| Name | Value | |-------|-------| | Alice | 5 | | Data1 | val1 | | Data2 | val2 | | Data3 | val3 | | Bob | 10 |
与多次编辑工作表不同,您只需要调整一次工作表的结构。这降低了无意中干扰Excel内部或其各种对象和引用的风险。
然后检查问题是否仍然存在。
如果即使简化代码只插入行,问题仍然存在,可能是Excel文件中的其他复杂性导致了损坏。这可能与公式、格式、合并单元格、图表或其他对象有关。特别是,合并单元格在插入行/列时可能会引起问题。如果表 英文:
In your code, for each data item in the data_list
, you are inserting a new row and then populating it.
# Starting point for row insertion.
start_row = i + 2
# Iterate through the data_list.
for data in data_list:
# Insert a row at start_row.
sheet.insert_rows(start_row)
# [Populate the inserted row]
# Increment the start_row by 1 for the next iteration.
start_row += 1
So, as the loop progresses, the index of the insertion point (start_row
) gets incremented.
Let's consider for instance the following table, with the goal to insert data between Alice
and Bob
:
| Name | Value | |-------|-------| | Alice | 5 | | Bob | 10 | | ... | ... |
That means for each subsequent data item, you are pushing the previous data items and Bob
further down.
For example, for three data items inserted between Alice
and Bob
:
| Name | Value | |-------|-------| | Alice | 5 | | Data1 | val1 | | Data2 | val2 | | Data3 | val3 | | Bob | 10 |
That could be an issue when you consider complex formatting and objects in Excel: inserting rows multiple times could cause strain on formatting, formulas, merged cells, etc., which might be causing the reported corruption.
For testing, I would try to insert all the required rows for the entire data_list
at the beginning: instead of repeatedly inserting rows in the loop, you would make a single insert operation, and then populate the rows, using the def insert_rows(self, idx, amount=1)
method:
sheet.insert_rows(start_row, len(data_list))
That would be:
def generateExcel(data_list):
file_path = 'Finaltemplatetesting.xlsx'
sheet_name = 'Main'
workbook = openpyxl.load_workbook(file_path)
# Select the sheet
sheet = workbook[sheet_name]
# Find the starting cell (top-left cell) of the table
i = 0
for row in sheet.iter_rows(values_only=True):
i = i+1
if 'Project' in row:
header_row = row
break
header_col = header_row.index('Project')
# Initializing the starting row
start_row = i+2
# Insert all required rows at once
sheet.insert_rows(start_row, len(data_list))
# Now fill the data
for data in data_list:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=data.get(col_name))
start_row += 1
workbook.save("Finaltempalterrrrrr.xlsx")
If your data_list
structure is:
[
[{'Value1': '15', 'Project': 'AVTR', 'Score': 'Normal', 'Name': 'Raul', 'Date': '2023-03-16T14:58:33+01:00'}],
[{'Value1': '10', 'Project': 'TRG', 'Score': 'High', 'Name': 'Bob', 'Date': 'N/A'}],
[{'Value1': '12', 'Project': 'AVTR', 'Score': 'High', 'Name': 'Alice', 'Date': 'N/A'}]
]
It would mean it has an additional layer of lists which is not expected based on the initial code. Specifically, each dictionary is wrapped inside a list, making data_list
a list of lists of dictionaries. The initial code expects data_list
to be just a list of dictionaries, like:
data_list = [
{'Value1': '15', 'Project': 'AVTR', 'Score': 'Normal', 'Name': 'Raul', 'Date': '2023-03-16T14:58:33+01:00'},
{'Value1': '10', 'Project': 'TRG', 'Score': 'High', 'Name': 'Bob', 'Date': 'N/A'},
{'Value1': '12', 'Project': 'AVTR', 'Score': 'High', 'Name': 'Alice', 'Date': 'N/A'}
]
If you want to use your original data_list
structure, you would need to adjust the way the data is accessed in the loop. Specifically, you would need to iterate through the inner lists in the data_list
:
# Now fill the data
for inner_list in data_list:
for data in inner_list:
for col_idx, col_name in enumerate(header_row):
cell = sheet.cell(row=start_row, column=header_col + col_idx, value=data.get(col_name))
start_row += 1
After inserting rows for three data items, you would get three empty lines:
| Name | Value | |-------|-------| | Alice | 5 | | | | | | | | | | | Bob | 10 |
Then, the subsequent loop populates these rows:
| Name | Value | |-------|-------| | Alice | 5 | | Data1 | val1 | | Data2 | val2 | | Data3 | val3 | | Bob | 10 |
Instead of manipulating the worksheet's structure multiple times, you only adjust it once. That reduces the risk of inadvertently messing up Excel's internals or its various objects and references.
Check then if the issue persists.
If the issue persists even after simplifying the code to just insert the rows, there might be other complexities within the Excel file causing the corruption. These could be related to formulas, formatting, merged cells, charts, or other objects. Merged cells, especially, can be troublesome when inserting rows/columns. If the table area has merged cells, consider unmerging them and trying the row insertion again.
As a troubleshooting step, create a very simple Excel file with a table in the middle (just as in your actual scenario, but without any other features or data). Run your script on this file to see if the issue persists. If it works, then gradually add the other features from your actual Excel file to this test file, testing at each stage. That will help identify which feature in your original Excel file is causing the issue.
Sometimes, the issue may be with the Excel template itself. Try opening the Excel template (Finaltemplatetesting.xlsx
), then saving it as a new file. Now, use the new file in your script to see if the issue persists.
You can also try alternative approaches, for testing:
Still with openpyxl
:
- Read the entire sheet into a Python data structure.
- Clear the sheet.
- Write back the data, inserting your new rows where needed.
That way, you are not constantly editing the live worksheet and can reduce chances of corruption.
Or, again for testing, you could try pandas
, with openpyxl
backend**:
You would read the Excel file into a DataFrame, manipulate the DataFrame, and then write the DataFrame back to Excel.
For instance:
import pandas as pd
# Load the Excel file into a DataFrame
df = pd.read_excel('Finaltemplatetesting.xlsx', sheet_name='Main', engine='openpyxl')
# Assuming your data_list looks like [{'Project': 'proj1', 'Value': 'value1'}, {'Project': 'proj2', 'Value': 'value2'}]
data_df = pd.DataFrame(data_list)
# Locate the position of 'Project' in the original DataFrame
idx = df[df['Name'] == 'Project'].index[0]
# Split the DataFrame into two parts: above and below the insertion point
df1 = df.iloc[:idx+1]
df2 = df.iloc[idx+1:]
# Concatenate the three DataFrames: df1, data_df, and df2
result = pd.concat([df1, data_df, df2], ignore_index=True)
# Save the result back to Excel
result.to_excel('Finaltempalterrrrrr.xlsx', index=False, engine='openpyxl')