Python: extracting the text between two strings into Excel

Date: 2022-10-09 20:26:00

I have a text file like this:

blablablabla
blablablabla
Start
Hello
World
End
blablabla

I want to extract the strings between Start and End and write them into an Excel cell. My code thus far looks like this:

import xlsxwriter
workbook = xlsxwriter.Workbook("Test1.xlsx")
worksheet = workbook.add_worksheet()

flist = open("TextTest.txt").readlines()

parsing = False
for line in flist:

    if line.startswith("End"):
       parsing = False
    if parsing:
       worksheet.write(1,1,line)
    if line.startswith("Start"):
       parsing = True

workbook.close()

However, it only produces an empty workbook. What am I doing wrong?

3 answers

#1


1  

I don't have much experience with Excel in Python, but you can try openpyxl; I found it much easier to understand.

Solution to your problem:

import openpyxl

wb = openpyxl.Workbook()
destination_filename = "my.xlsx"
ws = wb.active
ws.title = "sheet1"

# Read the whole file; the with-block closes it afterwards.
with open("text.txt") as f:
    flist = f.readlines()

row = 1        # openpyxl rows are 1-based
column = 'A'
parsing = False

for i in flist:
    if i.startswith("End"):
        parsing = False
    if parsing:
        # Write each line between the markers to its own row in column A.
        coord = column + str(row)
        ws[coord] = i.rstrip("\n")
        row += 1
    if i.startswith("Start"):
        parsing = True

wb.save(filename=destination_filename)
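
The flag-based loop above can also be separated from the Excel writing, which makes it easier to check on its own. The following is only a sketch: the name lines_between and its start/end parameters are hypothetical, and it assumes each marker sits alone on its line (an exact match rather than startswith).

```python
def lines_between(lines, start="Start", end="End"):
    """Yield the lines found between a start marker and an end marker."""
    parsing = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line == end:
            # Stop collecting; the marker itself is not yielded.
            parsing = False
        elif line == start:
            # Start collecting from the next line on.
            parsing = True
        elif parsing:
            yield line


# Sample data mirroring the text file from the question.
sample = ["blablablabla\n", "Start\n", "Hello\n", "World\n", "End\n", "blablabla\n"]
extracted = list(lines_between(sample))
```

Each yielded line could then be written to its own row, for example with ws.cell(row=n, column=1, value=line) in openpyxl.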

Edit (writing all lines into one cell):

You have to create a new variable to accumulate the lines in, and at the end assign that string to a cell in the worksheet.

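parsing = False
String = ""
for i in flist:
    if i.startswith("End"):
        parsing = False
    if parsing:
        # Drop the trailing newline and separate entries with commas.
        i = i.strip("\n")
        String += str(i) + ","
    if i.startswith("Start"):
        parsing = True

# Remove the trailing comma before writing everything to a single cell.
ws['A1'] = String.rstrip(",")
wb.save(filename=destination_filename)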

#2


1  

First of all, you always write to the same cell (row 1, column 1), so each line overwrites the previous one.

Second, row and column numbering starts at 0.

With these two small changes, this should do what you want:

parsing = False
linewrite = 0  # next row to write to (0-based)

for line in flist:
    if line.startswith("End"):
        parsing = False
    if parsing:
        worksheet.write(linewrite, 0, line)
        print(line, end="")
        linewrite += 1
    if line.startswith("Start"):
        parsing = True

workbook.close()
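
To make the zero-based numbering concrete: in xlsxwriter, worksheet.write(1, 1, ...) targets cell B2, while worksheet.write(0, 0, ...) targets A1. The helper below is a hypothetical, standard-library-only illustration of that mapping (xlsxwriter ships its own utility for this, xl_rowcol_to_cell in xlsxwriter.utility); it is a sketch, not part of the answer's code.

```python
def rowcol_to_a1(row, col):
    """Convert zero-based (row, col) indices to an A1-style cell reference."""
    letters = ""
    n = col + 1  # switch to 1-based for the base-26 letter conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters + str(row + 1)


question_cell = rowcol_to_a1(1, 1)  # the cell used by write(1, 1, line)
first_cell = rowcol_to_a1(0, 0)     # the cell used by write(0, 0, line)
```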

#3


1  

The data is being written to the cell, but one problem is that worksheet.write() will overwrite the contents of the cell, so only the last item written will be present.

You can solve this by accumulating the lines between Start and End and then writing them with one worksheet.write():

import xlsxwriter

workbook = xlsxwriter.Workbook("Test1.xlsx")
worksheet = workbook.add_worksheet()

with open("TextTest.txt") as data:
    lines = []
    for line in data:
        line = line.strip()
        if line == "Start":
            # Begin a fresh block.
            lines = []
        elif line == "End":
            # Write the accumulated block into cell A1.
            worksheet.write(0, 0, '\n'.join(lines))
        else:
            lines.append(line)

workbook.close()

Here the lines are accumulated into a list. When an End line is seen, the contents of that list are joined with newlines (you could substitute another separator, e.g. a space) and written to the cell.
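
If the file may contain several Start/End pairs, the same accumulation idea extends to one string per block. This is a sketch under assumptions: the function name extract_blocks and its sep parameter are hypothetical, markers are matched exactly after stripping whitespace, and lines outside a block are ignored.

```python
def extract_blocks(lines, start="Start", end="End", sep="\n"):
    """Return one joined string per Start/End block found in lines."""
    blocks = []
    current = None  # None means we are outside a block
    for raw in lines:
        line = raw.strip()
        if line == start:
            # Open a new block, discarding any unfinished one.
            current = []
        elif line == end:
            if current is not None:
                blocks.append(sep.join(current))
            current = None
        elif current is not None:
            current.append(line)
    return blocks


# Two blocks separated by unrelated text.
text = "bla\nStart\nHello\nWorld\nEnd\nbla\nStart\nFoo\nEnd\n"
blocks = extract_blocks(text.splitlines())
```

Each entry could then go to its own row, e.g. worksheet.write(row, 0, block) while iterating over enumerate(blocks). For the embedded newlines to show up as line breaks in Excel, the cell generally needs text wrapping enabled; in xlsxwriter that means passing a format created with workbook.add_format({'text_wrap': True}) as the fourth argument to write().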
