Django with MySQL and UTF-8 [duplicate]

Time: 2023-01-06 11:58:09

Possible Duplicate:
How to filter (or replace) unicode characters that would take more than 3 bytes in UTF-8?

Background:

I am using Django with MySQL 5.1 and I am having trouble with 4-byte UTF-8 characters causing fatal errors throughout my web application.

I've used a script to convert all tables and columns in my database to UTF-8 which has fixed most unicode issues, but there is still an issue with 4-byte unicode characters. As noted elsewhere, MySQL 5.1 does not support UTF-8 characters over 3 bytes in length.

Whenever I enter a 4-byte Unicode character (e.g. 🀐, U+1F010) into a ModelForm on my Django website, the form validates and then an exception similar to the following is raised:

Incorrect string value: '\xF0\x9F\x80\x90' for column 'first_name' at row 1
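
For reference, the bytes in that message are the UTF-8 encoding of a single code point outside the Basic Multilingual Plane. A minimal Python 3 sketch of how to check this (the helper name `utf8_len` is made up for illustration and is not part of Django or MySQLdb):

```python
def utf8_len(ch):
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))

# The bytes quoted in the error message decode to one code point, U+1F010.
tile = b"\xF0\x9F\x80\x90".decode("utf-8")

print(hex(ord(tile)))      # code point of the offending character
print(utf8_len(tile))      # 4 bytes: too long for MySQL 5.1's utf8 charset
print(utf8_len("\u20ac"))  # EURO SIGN needs 3 bytes, so it still fits
```

Anything above U+FFFF needs four bytes in UTF-8, which is exactly the range that MySQL 5.1 rejects.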

My question:

What is a reasonable way to avoid fatal errors caused by 4-byte UTF-8 characters in a Django web application with a MySQL 5.1 database?

I have considered:

  1. Selectively disabling MySQL warnings to avoid that specific error message (not sure yet whether that is possible)
  2. Creating middleware that looks through the request.POST QueryDict and substitutes/removes all invalid UTF-8 characters
  3. Somehow hooking/altering/monkey-patching the mechanism that outputs SQL queries for Django or for MySQLdb, to substitute/remove all invalid UTF-8 characters before the query is executed

Example middleware for replacing invalid characters (inspired by this SO question):

import re

class MySQLUnicodeFixingMiddleware(object):

    INVALID_UTF8_RE = re.compile(u'[^\u0000-\uD7FF\uE000-\uFFFF]', re.UNICODE)

    def process_request(self, request):
        """Replace 4-byte unicode characters by REPLACEMENT CHARACTER"""
        request.POST = request.POST.copy()
        for key, values in request.POST.iterlists():
            request.POST.setlist(key,
                [self.INVALID_UTF8_RE.sub(u'\uFFFD', v) for v in values])

1 Answer

#1


Do you have the option to upgrade MySQL? If you do, you can upgrade and set the encoding to utf8mb4.
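
If upgrading is possible (utf8mb4 was introduced in MySQL 5.5.3), the Django side of the change might look roughly like the settings fragment below. This is a sketch rather than a tested migration: the database name and credentials are placeholders, and existing tables and columns would still need to be converted to utf8mb4 on the MySQL side.

```python
# settings.py (fragment). NAME, USER and PASSWORD are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "change-me",
        # Handed to the MySQL driver when connecting, so the connection
        # itself uses utf8mb4 instead of the 3-byte utf8 charset.
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```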

Assuming that you don't have the option, I see these options for you:

1) Add JavaScript / frontend validation to prevent entry of anything other than 1-, 2-, or 3-byte Unicode characters,

2) Supplement that with a cleanup function in your models to strip any 4-byte Unicode characters from the data (which would be your option 2 or 3)
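
A cleanup function of the kind mentioned in 2) could be sketched as follows in Python 3. The names `NON_BMP_RE` and `strip_4byte_chars` are hypothetical, and where to call the function (a form's clean method, a model's save(), or middleware) is left open:

```python
import re

# Matches every code point above U+FFFF, i.e. every character that
# needs four bytes when encoded as UTF-8.
NON_BMP_RE = re.compile("[\U00010000-\U0010FFFF]")

def strip_4byte_chars(value, replacement="\ufffd"):
    """Replace each 4-byte character in `value` with `replacement`.

    The default is U+FFFD REPLACEMENT CHARACTER, as in the question's
    middleware; pass replacement="" to drop the characters instead.
    """
    return NON_BMP_RE.sub(replacement, value)

cleaned = strip_4byte_chars("Jo\U0001F010e")
print(cleaned.encode("utf-8"))  # contains no byte sequence longer than 3 bytes
```

Replacing rather than deleting keeps a visible marker that something was removed, at the cost of storing a character the user never typed.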

At the same time, it does look like your users are in fact using 4-byte characters. If there is a business case for using them in your application, you could go to the powers that be and request an upgrade.
