UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'

Date: 2023-01-06 14:33:35

I'm trying to read an HTML file, but when I extract the titles and URLs to compare them against my keyword list alist, I get this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019'. Screenshot of the error: http://tinypic.com/r/307w8bl/8

Code

for q in soup.find_all('a'):
    title = q.get('title')
    url = q.get('href')
    for keyword in alist:
        if keyword in str(title):  # check each keyword from the HTML form against the link title
            r.write(title)
            r.write("\n")
            r.write(url)
            r.write("\n")
doc.close()
r.close()

A little background: alist contains a list of keywords that I compare against each title to pick out the links I want. The strange thing is that if alist contains two or more words, it runs perfectly, but if it contains only one word, the error above appears. Thanks in advance.

3 answers

#1

If your list must be a list of byte strings, try encoding the title variable:

>>> alist = ['á']  # byte string (UTF-8 encoded)
>>> title = u'á'  # unicode string
>>> alist[0] in title
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> title and alist[0] in title.encode('utf-8')
True
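On Python 3, where text (str) and bytes are separate types, the same idea looks roughly like this. It is a sketch, not the answerer's code: the function name title_matches and the sample title are hypothetical, and the keywords are assumed to be stored as UTF-8 bytes.

```python
# Python 3 sketch of the "encode the title" approach: keep the keywords as
# UTF-8 bytes and encode the (text) title before the membership test.
keywords = ['á'.encode('utf-8')]   # hypothetical byte-string keywords
title = 'Ol\u00e1, it\u2019s here'  # hypothetical title, text as returned by the parser


def title_matches(title, keywords):
    """Return True if any byte keyword occurs in the UTF-8 encoded title."""
    if title is None:  # an <a> tag without a title attribute
        return False
    encoded = title.encode('utf-8')  # unlike ASCII, UTF-8 can encode U+2019 and 'á'
    return any(k in encoded for k in keywords)


print(title_matches(title, keywords))  # True
print(title_matches(None, keywords))   # False
```

The None check plays the same role as "title and ..." in the session above: q.get('title') returns None for links that have no title attribute.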

#2

Presumably, title is a Unicode string that can contain any kind of character; str(title) tries to turn it into a bytestring using the ASCII codec, but that fails because your title contains a non-ASCII character.

What are you trying to do? Why do you need to turn the title into a bytestring?

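The failure described here can be reproduced explicitly. In Python 2, str() on a unicode object encodes it with the default encoding, which is ASCII. The Python 3 sketch below performs that encoding step by hand on a hypothetical title containing U+2019 (RIGHT SINGLE QUOTATION MARK), then shows that UTF-8 handles the same character.

```python
# Python 3 sketch: do by hand what Python 2's str(title) did implicitly,
# i.e. encode the Unicode title with the ASCII codec.
title = 'It\u2019s a title'  # hypothetical title containing U+2019

try:
    title.encode('ascii')
except UnicodeEncodeError as exc:
    # prints: 'ascii' codec can't encode character '\u2019' in position 2: ordinal not in range(128)
    print(exc)

# UTF-8 can represent U+2019 (as the three bytes E2 80 99)
print(title.encode('utf-8'))  # b'It\xe2\x80\x99s a title'
```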

#3

The problem is str(title): you are trying to convert Unicode data to a byte string.

Why are you converting title to a string? You can use it directly.

soup.find_all returns a list of Tag objects, and q.get('title') already gives you a Unicode string (or None if the link has no title attribute).

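A minimal Python 3 sketch of this suggestion: compare the keywords against the title directly, skip links without a title, and write the output file with an explicit UTF-8 encoding. To keep it runnable without BeautifulSoup, html.parser.HTMLParser from the standard library stands in for soup.find_all('a'); the class AnchorCollector, the function matching_links, the sample page and the results.txt file name are all hypothetical.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (title, href) pairs from <a> tags.

    Hypothetical stdlib stand-in for soup.find_all('a') so the sketch runs
    without BeautifulSoup."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            attributes = dict(attrs)
            self.anchors.append((attributes.get('title'), attributes.get('href')))


def matching_links(html, keywords):
    """Return (title, href) pairs whose title contains any of the keywords.

    Titles stay text: there is no str()/ASCII conversion, so characters
    such as U+2019 do not cause errors. Links without a title are skipped."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(title, href) for title, href in parser.anchors
            if title is not None and any(k in title for k in keywords)]


page = ('<a title="Rock\u2019n\u2019roll news" href="/rock">x</a>'
        '<a href="/no-title">y</a>'
        '<a title="Jazz" href="/jazz">z</a>')

if __name__ == '__main__':
    # Explicit UTF-8 so writing non-ASCII titles cannot hit the ASCII codec.
    with open('results.txt', 'w', encoding='utf-8') as out:  # hypothetical file name
        for title, href in matching_links(page, ['news']):
            out.write(title + '\n')
            out.write((href or '') + '\n')  # href may be None for <a> without href
```

The keyword test is case-sensitive; lowercasing both sides (title.lower() and keyword.lower()) is one option if that matters.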
