SQL Server: using the N (nvarchar) prefix with varchar fields

Time: 2022-09-21 09:36:30

I am using a SQL Server 2005/2008 Express database. Are there any problems with using the N string prefix (used for nvarchar fields) for varchar fields?

e.g. if I have a database field:

CREATE TABLE [dbo].[posts](
    post_title varchar(30)
)

And then I insert only ASCII data, but with an N prefix:

INSERT INTO [dbo].[posts] ([post_title]) VALUES (N'My Title');

The problem arises because I want to save UTF-8 characters from a PHP application and I can't currently differentiate whether the field it is being saved to is varchar or nvarchar. So I just want to assume that all are nvarchar given that I will only ever try to save ASCII characters to varchar fields.

2 Answers

#1


2  

If you write strings with the N prefix into a varchar field, they will be implicitly converted. There is no other overhead, and you can safely assume "everything is nvarchar".

There may be a problem comparing nvarchar variables to varchar columns because of data type precedence: the varchar column will be converted and any indexes won't be used.

#2


2  

The accepted answer is misleading, but that is due, in part, to the question itself being ambiguous (though probably not intentionally).

Yes, any Unicode string (i.e. a literal prefixed with N, or XML and NCHAR / NVARCHAR variables) will implicitly convert to an 8-bit encoding when stored into a CHAR / VARCHAR / TEXT (don't use this one!) field. BUT, and this can be a rather important distinction in many cases, only Unicode code points in the range of U+0000 to U+007F (i.e. ASCII values 0 - 127) are guaranteed to convert correctly. Everything from U+0080 (i.e. value 128) on up may or may not convert, depending on the Code Page implied by the Collation of the field being inserted into. If the Code Page of that Collation does not have a mapping for that symbol, then you get a ? instead.
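
To get a feel for which characters survive, here is a rough Python model of that conversion. It is only a sketch: it assumes Python's codec named "cp" plus the code page number (e.g. cp1252) is close enough to the Code Page SQL Server uses, and the helper name to_code_page is made up for this example. SQL Server may also substitute a similar-looking "best fit" character in some cases instead of ?, which this sketch does not model.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent come back as '?'.
    This only approximates what happens on the server.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


# Code page 1252 is the one behind the Latin1_General collations.
for sample in ["My Title", "café", "日本語", "Привет"]:
    print(f"cp1252: {sample!r} -> {to_code_page(sample)!r}")

# The same strings against a Cyrillic code page (1251).
for sample in ["café", "Привет"]:
    print(f"cp1251: {sample!r} -> {to_code_page(sample, 'cp1251')!r}")
```

Pure 7-bit ASCII comes through unchanged under both code pages, while anything above U+007F depends on which code page is in play.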

To find out what the Code Page is exactly, first find the Collation of the field via either of the following two queries:

SELECT * FROM sys.columns WHERE [object_id] = OBJECT_ID(N'table_name');

-- OR:

EXEC sp_help N'table_name';

Then you can find the Code Page from the Collation, using:

SELECT COLLATIONPROPERTY('collation_name', 'CodePage');

And then you can find a chart on any one of several sites, based on that code page number, that will show you what is mapped.
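
If you would rather generate such a chart than look one up, something like the following Python sketch can do it. It assumes the number returned by COLLATIONPROPERTY corresponds to a Python codec named "cp" plus that number (true for common ones such as 1252 and 1251, but not guaranteed for every code page), and Python's tables may differ in small details from what SQL Server uses. The function name code_page_chart is hypothetical.

```python
def code_page_chart(code_page: str) -> dict:
    """Map each byte value 128-255 to the character the code page assigns it.

    Byte values the code page leaves undefined are simply omitted.
    """
    chart = {}
    for value in range(0x80, 0x100):
        try:
            chart[value] = bytes([value]).decode(code_page)
        except UnicodeDecodeError:
            pass  # no character assigned to this byte in this code page
    return chart


chart = code_page_chart("cp1252")
for value in (0x80, 0xE9, 0xFC):
    print(f"{value} (0x{value:02X}) -> {chart[value]!r}")

# Byte values with no mapping at all in this code page.
undefined = [v for v in range(0x80, 0x100) if v not in chart]
print("undefined:", undefined)
```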

And collations are not per-row, they are per-field. So whatever the Collation is for a field determines the character set for non-Unicode fields (i.e. CHAR / VARCHAR / TEXT).

So the question is: what is meant by the term "ASCII" in the Question? It technically refers to just the 7-bit values (the first 128; values 0 - 127), but people often use it to mean anything that can fit into a single byte, which also includes the Extended ASCII values (the second 128; values 128 - 255), which are dependent on the Code Page.
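
For an application that cannot tell whether the target field is varchar or nvarchar, one option is to check on the client side which of those categories a string falls into before sending it. The Python sketch below illustrates the idea; the function name classify and its three labels are invented for this example, and cp1252 is only an assumed default that should be replaced with the code page of the actual column.

```python
def classify(text: str, code_page: str = "cp1252") -> str:
    """Return which 'ASCII' category a string falls into.

    'ascii'        : only 7-bit values (0 - 127), safe in any code page
    'extended'     : fits in the given code page, but uses values 128 - 255
    'unicode-only' : contains characters the code page cannot represent
    """
    if text.isascii():
        return "ascii"
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return "unicode-only"
    return "extended"


for sample in ["My Title", "café", "日本語"]:
    print(f"{sample!r}: {classify(sample)}")

# The answer depends on the code page: é is not available in 1251.
print(f"'café' under cp1251: {classify('café', 'cp1251')}")
```

Only strings in the first category are safe to store in a varchar field without knowing its Collation.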

Regarding the potential issue(s) with comparing a VARCHAR column to NVARCHAR variables and literals: indexes will not be ignored, but there is some negative impact, and it varies based on the Collation of the VARCHAR column.

If the column Collation is a SQL Server Collation (i.e. one that starts with SQL_, such as SQL_Latin1_General_CP1_CI_AS), then you can get an Index Scan, but not a Seek.

But, if the column Collation is a Windows Collation (i.e. one that does not start with SQL_, such as Latin1_General_100_CI_AS), then you can get an Index Seek.

The following test shows this behavior:

-- DROP TABLE dbo.VarcharColumnIndex;
CREATE TABLE dbo.VarcharColumnIndex
(
  ID INT IDENTITY(1, 1) NOT NULL CONSTRAINT [PK_VarcharColumnIndex] PRIMARY KEY CLUSTERED,
  SqlServerCollation VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS,
  WindowsCollation VARCHAR(50) COLLATE Latin1_General_100_CI_AS
);

CREATE NONCLUSTERED INDEX [IX_VarcharColumnIndex_SqlServerCollation]
  ON dbo.VarcharColumnIndex ([SqlServerCollation]);
CREATE NONCLUSTERED INDEX [IX_VarcharColumnIndex_WindowsCollation]
  ON dbo.VarcharColumnIndex ([WindowsCollation]);

INSERT INTO dbo.VarcharColumnIndex ([SqlServerCollation], [WindowsCollation])
  VALUES ('a', 'b');

DECLARE @a NVARCHAR(50) = N'a';
SELECT [SqlServerCollation] FROM dbo.VarcharColumnIndex WHERE [SqlServerCollation] = @a;
-- Index Scan

DECLARE @b NVARCHAR(50) = N'b';
SELECT [WindowsCollation] FROM dbo.VarcharColumnIndex WHERE [WindowsCollation] = @b;
-- Index Seek
