Decoding problem in Django and lxml

Time: 2022-09-19 07:28:55

I have a strange problem with lxml when using the deployed version of my Django application. I use lxml to parse another HTML page which I fetch from my server. This works perfectly well on my development server on my own computer, but for some reason it gives me UnicodeDecodeError on the server.

('utf8', "\x85why hello there!", 0, 1, 'unexpected code byte')
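
The tuple above has the shape of the arguments of a `UnicodeDecodeError`: codec name, offending bytes, start offset, end offset, and reason. As a minimal sketch (written for Python 3, which is an assumption, since the question appears to use Python 2), the error can be reproduced and inspected like this; the variable name `hello` follows the question:

```python
# Python 3 sketch: the fetched page is assumed to arrive as raw bytes.
hello = b"\x85why hello there!"

error = None
try:
    hello.decode("utf-8")
except UnicodeDecodeError as exc:
    # Keep a reference; Python 3 clears `exc` when the except block ends.
    error = exc
    # These attributes mirror the tuple in the traceback:
    # (encoding, object, start, end, reason)
    print(exc.encoding, exc.start, exc.end, exc.reason)
```

The start and end offsets point at the first byte, `\x85`, which cannot begin a valid UTF-8 sequence.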

I have made sure that Apache (with mod_python) runs with LANG='en_US.UTF-8'.
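
To compare the development machine with the deployed process, it may help to print the encoding settings each one actually sees. The sketch below is only a diagnostic aid; the function name `encoding_report` is made up for illustration. Note that the error is raised by the `utf8` codec itself, so `LANG` does not change which bytes count as valid UTF-8.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings visible to this process."""
    return {
        "LANG": os.environ.get("LANG"),
        "default_encoding": sys.getdefaultencoding(),
        "preferred_encoding": locale.getpreferredencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


report = encoding_report()
for key in sorted(report):
    print(key, report[key])
```

Running this once under the development server and once under Apache would show whether the two environments really differ.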

I've tried googling for this problem and tried different approaches to decoding the string correctly, but I can't figure it out.

In your answer, you may assume that my string is called hello or something.

3 solutions

#1


"\x85why hello there!" is not a utf-8 encoded string. You should try decoding the webpage before passing it to lxml. Check what encoding it uses by looking at the http headers when you fetch the page maybe you find the problem there.

#2


Would a unicode literal such as u"\x85why hello there!" help?

You may find the following resources from the official Python documentation helpful:

#3


Since modifying site.py is not an ideal solution, try this at the start of your program:

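import sys
# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# so reload(sys) is needed to bring it back.
reload(sys)
sys.setdefaultencoding("utf-8")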
