Managing Unicode in SQL Server 2008

I have SQL Server with its collation set to Latin1_General_CI_AS. The problem is that, while importing users into the system, duplicate entries make their way into the database for records that have a trailing Hyphen-Minus. That character appears only in the SQL Server window; it vanishes in browsers and Notepad, even when I wrap the value in single quotes.

You will notice that the first script returns two records with the same EmailId, but the second script, which puts the EmailId returned by the first script into its WHERE clause, returns only one record.

When you copy the emails returned by the first script into Notepad, a browser or an email, they look identical; when you paste them inside SQL Server itself, you can see the trailing Hyphen-Minus.

These users got into the system because SQL Server treats values containing these Unicode characters as distinct records, so the duplicates were allowed in.

How can I distinguish these records and prevent them from entering the system?

1 solution

#1

Unicode values should be stored in an NVARCHAR column, which takes 2 bytes per character, in contrast to VARCHAR, which takes only 1. If you don't want Unicode characters, you should convert the values to VARCHAR, but keep in mind that you might lose data, since characters that have no 1-byte representation will be lost. Also, the SSMS grid view silently replaces or hides some characters that are actually stored in the column, such as new lines and tabs.
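
To make the byte counts and the possible data loss concrete, here is a small Python sketch. It assumes NVARCHAR data is UTF-16 (UCS-2 in SQL Server 2008) and that the VARCHAR code page behind a Latin1_General collation is Windows-1252; Python's codecs only approximate SQL Server's own conversion, which may substitute a best-fit character instead of "?". The helper names are hypothetical.

```python
def nvarchar_bytes(value: str) -> int:
    # NVARCHAR stores UTF-16 code units: 2 bytes per (BMP) character
    return len(value.encode("utf-16-le"))


def varchar_bytes(value: str) -> int:
    # VARCHAR under a Latin1_General collation uses code page 1252: 1 byte per character
    return len(value.encode("cp1252", errors="replace"))


def to_varchar(value: str) -> str:
    # Characters with no cp1252 equivalent become '?', so the original data is lost
    return value.encode("cp1252", errors="replace").decode("cp1252")


email = "user@example.com"
print(nvarchar_bytes(email), varchar_bytes(email))  # 32 16
print(to_varchar("Ω-user@example.com"))  # ?-user@example.com
```

Accented Latin letters such as "é" exist in code page 1252 and survive the conversion; characters outside it, like "Ω", do not.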

The first step is to check which data type your Email column is. It is probably NVARCHAR. When you write a hard-coded NVARCHAR value, you need to place an "N" just before the string, like this:

SELECT * FROM UserInfo.[User] WHERE Email = N'myEmail@email.com'

If you want to check the exact contents of a string, look at its hexadecimal representation to see which stray leading or trailing characters it contains. Try this on the 2 records with the "same" email:

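-- give varbinary a length: without one, CONVERT truncates the result to 30 bytes
SELECT Email, CONVERT(varbinary(max), Email) AS EmailBytes FROM UserInfo.[User]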
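
The same comparison can be reproduced outside SQL Server. A Python sketch, assuming the column is NVARCHAR (so the varbinary value is UTF-16 little-endian); the two sample values are hypothetical, and U+00AD SOFT HYPHEN is only one plausible candidate for a character that SSMS shows while browsers and Notepad hide it:

```python
def nvarchar_hex(value: str) -> str:
    # Mirrors CONVERT(varbinary(max), <nvarchar value>): UTF-16 little-endian bytes
    return "0x" + value.encode("utf-16-le").hex().upper()


clean = "a@b.com"
suspect = "a@b.com\u00ad"  # hypothetical: trailing U+00AD SOFT HYPHEN

print(nvarchar_hex(clean))  # 0x6100400062002E0063006F006D00
print(nvarchar_hex(suspect))
# Whatever follows the clean value's bytes is the stray character (stored little-endian)
print(nvarchar_hex(suspect)[len(nvarchar_hex(clean)):])  # AD00
```

Reading the extra bytes back as a code point (AD00 little-endian is U+00AD) tells you exactly which character to clean.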

Unfortunately, the solution involves cleaning these characters out. Casting the values to VARCHAR (if they are NVARCHAR) might remove some of them, but not all, since you could still have, for example, a TAB character at the start of your value.
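
A sketch of that cleaning step on the application side, assuming users are imported through code you control; `clean_email` and `dedupe` are hypothetical helper names. Instead of relying on a VARCHAR cast, it removes every character in the Unicode control (Cc) and format (Cf) categories, which covers TAB and invisible characters such as U+00AD SOFT HYPHEN, and compares addresses case-insensitively to mirror the _CI_ collation:

```python
import unicodedata


def clean_email(raw: str) -> str:
    """Hypothetical import-side cleanup: drop control (Cc) and format (Cf)
    characters such as TAB, CR/LF, U+00AD SOFT HYPHEN or U+200B ZERO WIDTH SPACE,
    then trim surrounding whitespace."""
    kept = (ch for ch in raw if unicodedata.category(ch) not in ("Cc", "Cf"))
    return "".join(kept).strip()


def dedupe(emails):
    """Keep the first occurrence of each address, comparing cleaned values
    case-insensitively (like a _CI_ collation)."""
    seen = set()
    result = []
    for raw in emails:
        email = clean_email(raw)
        key = email.casefold()
        if email and key not in seen:
            seen.add(key)
            result.append(email)
    return result


incoming = ["john@example.com", "\tjohn@example.com\u00ad", "JOHN@example.com", "jane@example.com"]
print(dedupe(incoming))  # ['john@example.com', 'jane@example.com']
```

Storing the cleaned value means a copy with a stray trailing character compares equal to the existing address instead of being imported as a new user.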

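You can try to find them with a LIKE similar to this one, which returns every email containing at least one character that is not a letter from A to Z, a digit, a dot or an @. Note that it also flags legitimate address characters such as "-", "_" or "+", so review the matches rather than deleting them blindly: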

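-- binary collation: the ranges are matched by code point, so A-Z and a-z are exact
SELECT U.Email FROM UserInfo.[User] AS U WHERE U.Email COLLATE Latin1_General_BIN LIKE N'%[^A-Za-z0-9@.]%'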
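
If the users arrive through an import file, the same filter can be applied before the data reaches the table. A Python sketch, assuming the emails are available as a list of strings (the sample rows and the `suspicious_emails` helper are hypothetical); the character class is the one from the LIKE pattern, matched by code point, and each hit reports the offending characters as U+XXXX code points:

```python
import re

# Any character outside ASCII letters, digits, '.' and '@' (same class as the LIKE pattern)
SUSPICIOUS = re.compile(r"[^A-Za-z0-9@.]")


def suspicious_emails(emails):
    """Return (email, offending code points) for every value that would match
    LIKE N'%[^A-Za-z0-9@.]%' under a binary collation."""
    hits = []
    for email in emails:
        bad = SUSPICIOUS.findall(email)
        if bad:
            hits.append((email, [f"U+{ord(ch):04X}" for ch in bad]))
    return hits


rows = ["john@example.com", "john@example.com\u00ad", "mary-ann@example.com"]
for email, codes in suspicious_emails(rows):
    print(repr(email), codes)
```

The "mary-ann" row is reported too (U+002D), which illustrates the false positives mentioned above for legitimate hyphens.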
