TSQL md5哈希与C＃.NET md5不同

I've generated an md5 hash as below:

我已经生成了一个md5哈希，如下所示：

DECLARE @varchar varchar(400) 

SET @varchar = 'è'

SELECT CONVERT(VARCHAR(2000), HASHBYTES( 'MD5', @varchar ), 2)

Which outputs:

哪个输出：

785D512BE4316D578E6650613B45E934

However generating an MD5 hash using:

但是使用以下方法生成MD5哈希：

System.Text.Encoding.UTF8.GetBytes("è")

generates:

产生：

0a35e149dbbb2d10d744bf675c7744b1

The encoding in the C# .NET method is set to UTF8 and I had assumed that varchar was also UTF8, any ideas on what I'm doing wrong?

C＃.NET方法中的编码设置为UTF8，我假设varchar也是UTF8，任何关于我做错的想法？

4 个解决方案

#1

If you are dealing with NVARCHAR / NCHAR data (which is stored as UTF-16 Little Endian), then you would use the Unicode encoding, not BigEndianUnicode. In .NET, UTF-16 is called Unicode while other Unicode encodings are referred to by their actual names: UTF7, UTF8, and UTF32. Hence, Unicode by itself is Little Endian as opposed to BigEndianUnicode. UPDATE: Please see the section at the end regarding UCS-2 and Supplementary Characters.

如果您正在处理NVARCHAR / NCHAR数据（存储为UTF-16 Little Endian），那么您将使用Unicode编码，而不是BigEndianUnicode。在.NET中，UTF-16称为Unicode，而其他Unicode编码由其实际名称引用：UTF7，UTF8和UTF32。因此，Unicode本身就是Little Endian而不是BigEndianUnicode。更新：请参阅最后一节有关UCS-2和补充字符的部分。

On the database side:

在数据库方面：

SELECT HASHBYTES('MD5', N'è') AS [HashBytesNVARCHAR]
-- FAC02CD988801F0495D35611223782CF

On the .NET side:

在.NET方面：

System.Text.Encoding.ASCII.GetBytes("è")
// D1457B72C3FB323A2671125AEF3EAB5D

System.Text.Encoding.UTF7.GetBytes("è")
// F63A0999FE759C5054613DDE20346193

System.Text.Encoding.UTF8.GetBytes("è")
// 0A35E149DBBB2D10D744BF675C7744B1

System.Text.Encoding.UTF32.GetBytes("è")
// 86D29922AC56CF022B639187828137F8

System.Text.Encoding.BigEndianUnicode.GetBytes("è")
// 407256AC97E4C5AEBCA825DEB3D2E89C

System.Text.Encoding.Unicode.GetBytes("è")  // this one matches HASHBYTES('MD5', N'è')
// FAC02CD988801F0495D35611223782CF

However, this question pertains to VARCHAR / CHAR data, which is ASCII, and so things are a bit more complicated.

但是，这个问题与VARCHAR / CHAR数据有关，这是ASCII，因此事情有点复杂。

On the database side:

在数据库方面：

SELECT HASHBYTES('MD5', 'è') AS [HashBytesVARCHAR]
-- 785D512BE4316D578E6650613B45E934

We already see the .NET side above. From those hashed values there should be two questions:

我们已经在上面看到了.NET端。从这些散列值中，应该有两个问题：

Why don't any of them match the HASHBYTES value?
为什么它们中的任何一个都不匹配HASHBYTES值？
Why does the "sqlteam.com" article linked in @Eric J.'s answer show that three of them (ASCII, UTF7, and UTF8) all match the HASHBYTES value?
为什么在@Eric J.的答案中链接的“sqlteam.com”文章显示其中三个（ASCII，UTF7和UTF8）都匹配HASHBYTES值？

There is one answer that covers both questions: Code Pages. The test done in the "sqlteam" article used "safe" ASCII characters that are in the 0 - 127 range (in terms of the int / decimal value) that do not vary between Code Pages. But the 128 - 255 range -- where we find the "è" character -- is the Extended set that does vary by Code Page (which makes sense as this is the reason for having Code Pages).

有一个答案涵盖了两个问题：代码页。在“sqlteam”文章中进行的测试使用了“安全”的ASCII字符，这些字符在0到127范围内（就int /十进制值而言）在代码页之间没有变化。但128 - 255范围 - 我们发现“è”字符 - 是扩展集，它确实因代码页而异（这是有意义的，因为这是拥有代码页的原因）。

Now try:

现在尝试：

SELECT HASHBYTES('MD5', 'è' COLLATE SQL_Latin1_General_CP1255_CI_AS) AS [HashBytes]
-- D1457B72C3FB323A2671125AEF3EAB5D

That matches the ASCII hashed value (and again, because the "sqlteam" article / test used values in the 0 - 127 range, they did not see any changes when using COLLATE). Great, now we finally found a way to match VARCHAR / CHAR data. All good?

这与ASCII散列值匹配（同样，因为“sqlteam”文章/测试使用0 - 127范围内的值，他们在使用COLLATE时没有看到任何变化）。太好了，现在我们终于找到了匹配VARCHAR / CHAR数据的方法。都好？

Well, not really. Let's take a look-see at what we were actually hashing:

嗯，不是真的。我们来看看我们实际上是在散列什么：

SELECT 'è' AS [TheChar],
       ASCII('è') AS [TheASCIIvalue],
       'è' COLLATE SQL_Latin1_General_CP1255_CI_AS AS [CharCP1255],
       ASCII('è' COLLATE SQL_Latin1_General_CP1255_CI_AS) AS [TheASCIIvalueCP1255];

Returns:

TheChar TheASCIIvalue   CharCP1255  TheASCIIvalueCP1255
è       232             ?           63

A ? ? Just to verify, run:

一个？？只是为了验证，运行：

SELECT CHAR(63) AS [WhatIs63?];
-- ?

Ah, so Code Page 1255 doesn't have the è character, so it gets translated as everyone's favorite ?. But then why did that match the MD5 hashed value in .NET when using the ASCII encoding? Could it be that we weren't actually matching the hashed value of è, but instead were matching the hashed value of ?:

啊，所以Code Page 1255没有è字符，所以它被翻译为每个人的最爱？但是，为什么在使用ASCII编码时，它与.NET中的MD5哈希值相匹配？难道我们实际上并没有匹配è的散列值，而是匹配散列值？：

SELECT HASHBYTES('MD5', '?') AS [HashBytesVARCHAR]
-- 0xD1457B72C3FB323A2671125AEF3EAB5D

Yup. The true ASCII character set is just the first 128 characters (values 0 - 127). And as we just saw, the è is 232. So, using the ASCII encoding in .NET is not that helpful. Nor was using COLLATE on the T-SQL side.

对。真正的ASCII字符集只是前128个字符（值0到127）。正如我们刚才看到的那样，è是232.因此，在.NET中使用ASCII编码并没有那么有用。也没有在T-SQL端使用COLLATE。

Is it possible to get a better encoding on the .NET side? Yes, by using Encoding.GetEncoding(Int32), which allows for specifying the Code Page. The Code Page to use can be discovered using the following query (use sys.columns when working with a column instead of a literal or variable):

是否有可能在.NET端获得更好的编码？是的，使用Encoding.GetEncoding（Int32），它允许指定代码页。可以使用以下查询发现要使用的代码页（在使用列而不是文字或变量时使用sys.columns）：

SELECT sd.[collation_name],
       COLLATIONPROPERTY(sd.[collation_name], 'CodePage') AS [CodePage]
FROM   sys.databases sd
WHERE  sd.[name] = DB_NAME(); -- replace function with N'{db_name}' if not running in the DB

The query above returns (for me):

上面的查询返回（对我来说）：

Latin1_General_100_CI_AS_SC    1252

So, let's try Code Page 1252:

那么，让我们试试Code Page 1252：

System.Text.Encoding.GetEncoding(1252).GetBytes("è") // Matches HASHBYTES('MD5', 'è')
// 785D512BE4316D578E6650613B45E934

Woo hoo! We have a match for VARCHAR data that uses our default SQL Server collation :). Of course, if the data is coming from a database or field set to a different collation, then GetEncoding(1252) might not work and you will have to find the actual matching Code Page using the query shown above (a Code Page is used across many Collations, so a different Collation does not necessarily imply a different Code Page).

呜啊！我们匹配使用默认SQL Server排序规则:)的VARCHAR数据。当然，如果数据来自数据库或字段设置为不同的排序规则，则GetEncoding（1252）可能不起作用，您必须使用上面显示的查询找到实际匹配的代码页（代码页用于许多校对，所以不同的校对并不一定意味着不同的代码页。

To see what the possible Code Page values are, and what culture / locale they pertain to, please see the list of Code Pages here (list is in the "Remarks" section).

要查看可能的代码页值以及它们所属的文化/区域设置，请在此处查看代码页列表（列表位于“备注”部分）。

Additional info related to what is actually stored in NVARCHAR / NCHAR fields:

有关NVARCHAR / NCHAR字段中实际存储内容的其他信息：

Any UTF-16 character (2 or 4 bytes) can be stored, though the default behavior of the built-in functions assumes that all characters are UCS-2 (2 bytes each), which is a subset of UTF-16. Starting in SQL Server 2012, it is possible to access a set of Windows collations that support the 4 byte characters known as Supplementary Characters. Using one of these Windows collations ending in _SC, either specified for a column or directly in a query, will allow the built-in functions to properly handle the 4 byte characters.

可以存储任何UTF-16字符（2或4个字节），但内置函数的默认行为假定所有字符都是UCS-2（每个2字节），这是UTF-16的子集。从SQL Server 2012开始，可以访问一组支持4字节字符（称为补充字符）的Windows排序规则。使用以_SC结尾的这些Windows排序规则之一，无论是为列指定还是直接在查询中，都将允许内置函数正确处理4字节字符。

-- The database's collation is set to: SQL_Latin1_General_CP1_CI_AS
SELECT  N'????' AS [SupplementaryCharacter],
        LEN(N'????') AS [LEN],
        DATALENGTH(N'????') AS [DATALENGTH],
        UNICODE(N'????') AS [UNICODE],
        LEFT(N'????', 1) AS [LEFT],
        HASHBYTES('MD5', N'????') AS [HASHBYTES];

SELECT  N'????' AS [SupplementaryCharacter],
        LEN(N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [LEN],
        DATALENGTH(N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [DATALENGTH],
        UNICODE(N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [UNICODE],
        LEFT(N'????' COLLATE Latin1_General_100_CI_AS_SC, 1) AS [LEFT],
        HASHBYTES('MD5', N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [HASHBYTES];

Returns:

SupplementaryChar   LEN   DATALENGTH   UNICODE   LEFT   HASHBYTES
????                  2     4             55393    �     0x7A04F43DA81E3150F539C6B99F4B8FA9
????                  1     4            165739    ????     0x7A04F43DA81E3150F539C6B99F4B8FA9

As you can see, neither DATALENGTH nor HASHBYTES are affected. For more information, please see the MSDN page for Collation and Unicode Support (specifically the "Supplementary Characters" section).

如您所见，DATALENGTH和HASHBYTES都不受影响。有关详细信息，请参阅排序规则和Unicode支持的MSDN页面（特别是“补充字符”部分）。

#2

SQL Server uses UCS-2 rather than UTF-8 to encode character data.

SQL Server使用UCS-2而不是UTF-8来编码字符数据。

If you were using an NVarChar field, the following would work:

如果您使用的是NVarChar字段，则以下内容可行：

System.Text.Encoding.Unicode.GetBytes("è"); // Updated per @srutzky's comments

For more information on SQL and C# hashing, see

有关SQL和C＃散列的更多信息，请参阅

http://weblogs.sqlteam.com/mladenp/archive/2009/04/28/Comparing-SQL-Server-HASHBYTES-function-and-.Net-hashing.aspx

#3

I was having the same issue, and as @srutzky comments, what might be happening is that I didn't preceed the query with a capital-N, and I was getting an 8-bit Extended ASCII ( VARCHAR / string not prefixed with capital-N ) instead of a 16-bit UTF-16 Little Endian ( NVARCHAR / string prefixed with capital-N )

我遇到了同样的问题，并且正如@srutzky评论的那样，可能发生的事情是我没有在查询之前使用大写字母N，而我得到的是8位扩展ASCII（VARCHAR /字符串，不带资本前缀） -N）而不是16位UTF-16 Little Endian（NVARCHAR /前缀为大写字母N的字符串）

{Id, UserName, PasswordString, PasswordHashed}

If you do:

如果你这样做：

SELECT TOP 1 CONVERT(char(32),HashBytes('MD5', 'abc123'),2) FROM [Users]

It will output: E99A18C428CB38D5F260853678922E03

它将输出：E99A18C428CB38D5F260853678922E03

But if you do this, having the same password ('abc123'):

但如果你这样做，使用相同的密码（'abc123'）：

SELECT CONVERT(char(32),HashBytes('MD5', [PasswordString]),2) FROM [Users]

It will output: 6E9B3A7620AAF77F362775150977EEB8

它将输出：6E9B3A7620AAF77F362775150977EEB8

What I should have done is:

我应该做的是：

SELECT CONVERT(char(32),HashBytes('MD5', N'abc123'),2) FROM [Users]

That outputs the same result: 6E9B3A7620AAF77F362775150977EEB8

输出相同的结果：6E9B3A7620AAF77F362775150977EEB8

#4

sql server hashbytes allways works like System.Text.Encoding.Unicode on unicode characters like arabic persian ,... if you use Utf8.Unicode Or Ascii.Unicode You will see the diffrence and if you use Utf8.Unicode the return result of sql server and c# will be same

sql server hashbytes总是像System.Text.Encoding.Unicode上的unicode字符一样工作，如阿拉伯语波斯语，...如果你使用Utf8.Unicode或Ascii.Unicode你会看到差异，如果你使用Utf8.Unicode sql的返回结果服务器和c＃将是相同的

#1

On the database side:

在数据库方面：

SELECT HASHBYTES('MD5', N'è') AS [HashBytesNVARCHAR]
-- FAC02CD988801F0495D35611223782CF

On the .NET side:

在.NET方面：

System.Text.Encoding.ASCII.GetBytes("è")
// D1457B72C3FB323A2671125AEF3EAB5D

System.Text.Encoding.UTF7.GetBytes("è")
// F63A0999FE759C5054613DDE20346193

System.Text.Encoding.UTF8.GetBytes("è")
// 0A35E149DBBB2D10D744BF675C7744B1

System.Text.Encoding.UTF32.GetBytes("è")
// 86D29922AC56CF022B639187828137F8

System.Text.Encoding.BigEndianUnicode.GetBytes("è")
// 407256AC97E4C5AEBCA825DEB3D2E89C

System.Text.Encoding.Unicode.GetBytes("è")  // this one matches HASHBYTES('MD5', N'è')
// FAC02CD988801F0495D35611223782CF

However, this question pertains to VARCHAR / CHAR data, which is ASCII, and so things are a bit more complicated.

但是，这个问题与VARCHAR / CHAR数据有关，这是ASCII，因此事情有点复杂。

On the database side:

在数据库方面：

SELECT HASHBYTES('MD5', 'è') AS [HashBytesVARCHAR]
-- 785D512BE4316D578E6650613B45E934

We already see the .NET side above. From those hashed values there should be two questions:

我们已经在上面看到了.NET端。从这些散列值中，应该有两个问题：

Why don't any of them match the HASHBYTES value?
为什么它们中的任何一个都不匹配HASHBYTES值？
Why does the "sqlteam.com" article linked in @Eric J.'s answer show that three of them (ASCII, UTF7, and UTF8) all match the HASHBYTES value?
为什么在@Eric J.的答案中链接的“sqlteam.com”文章显示其中三个（ASCII，UTF7和UTF8）都匹配HASHBYTES值？

Now try:

现在尝试：

SELECT HASHBYTES('MD5', 'è' COLLATE SQL_Latin1_General_CP1255_CI_AS) AS [HashBytes]
-- D1457B72C3FB323A2671125AEF3EAB5D

Well, not really. Let's take a look-see at what we were actually hashing:

嗯，不是真的。我们来看看我们实际上是在散列什么：

SELECT 'è' AS [TheChar],
       ASCII('è') AS [TheASCIIvalue],
       'è' COLLATE SQL_Latin1_General_CP1255_CI_AS AS [CharCP1255],
       ASCII('è' COLLATE SQL_Latin1_General_CP1255_CI_AS) AS [TheASCIIvalueCP1255];

Returns:

TheChar TheASCIIvalue   CharCP1255  TheASCIIvalueCP1255
è       232             ?           63

A ? ? Just to verify, run:

一个？？只是为了验证，运行：

SELECT CHAR(63) AS [WhatIs63?];
-- ?

SELECT HASHBYTES('MD5', '?') AS [HashBytesVARCHAR]
-- 0xD1457B72C3FB323A2671125AEF3EAB5D

SELECT sd.[collation_name],
       COLLATIONPROPERTY(sd.[collation_name], 'CodePage') AS [CodePage]
FROM   sys.databases sd
WHERE  sd.[name] = DB_NAME(); -- replace function with N'{db_name}' if not running in the DB

The query above returns (for me):

上面的查询返回（对我来说）：

Latin1_General_100_CI_AS_SC    1252

So, let's try Code Page 1252:

那么，让我们试试Code Page 1252：

System.Text.Encoding.GetEncoding(1252).GetBytes("è") // Matches HASHBYTES('MD5', 'è')
// 785D512BE4316D578E6650613B45E934

To see what the possible Code Page values are, and what culture / locale they pertain to, please see the list of Code Pages here (list is in the "Remarks" section).

要查看可能的代码页值以及它们所属的文化/区域设置，请在此处查看代码页列表（列表位于“备注”部分）。

Additional info related to what is actually stored in NVARCHAR / NCHAR fields:

有关NVARCHAR / NCHAR字段中实际存储内容的其他信息：

-- The database's collation is set to: SQL_Latin1_General_CP1_CI_AS
SELECT  N'????' AS [SupplementaryCharacter],
        LEN(N'????') AS [LEN],
        DATALENGTH(N'????') AS [DATALENGTH],
        UNICODE(N'????') AS [UNICODE],
        LEFT(N'????', 1) AS [LEFT],
        HASHBYTES('MD5', N'????') AS [HASHBYTES];

SELECT  N'????' AS [SupplementaryCharacter],
        LEN(N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [LEN],
        DATALENGTH(N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [DATALENGTH],
        UNICODE(N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [UNICODE],
        LEFT(N'????' COLLATE Latin1_General_100_CI_AS_SC, 1) AS [LEFT],
        HASHBYTES('MD5', N'????' COLLATE Latin1_General_100_CI_AS_SC) AS [HASHBYTES];

Returns:

SupplementaryChar   LEN   DATALENGTH   UNICODE   LEFT   HASHBYTES
????                  2     4             55393    �     0x7A04F43DA81E3150F539C6B99F4B8FA9
????                  1     4            165739    ????     0x7A04F43DA81E3150F539C6B99F4B8FA9

如您所见，DATALENGTH和HASHBYTES都不受影响。有关详细信息，请参阅排序规则和Unicode支持的MSDN页面（特别是“补充字符”部分）。

#2

SQL Server uses UCS-2 rather than UTF-8 to encode character data.

SQL Server使用UCS-2而不是UTF-8来编码字符数据。

If you were using an NVarChar field, the following would work:

如果您使用的是NVarChar字段，则以下内容可行：

System.Text.Encoding.Unicode.GetBytes("è"); // Updated per @srutzky's comments

For more information on SQL and C# hashing, see

有关SQL和C＃散列的更多信息，请参阅

http://weblogs.sqlteam.com/mladenp/archive/2009/04/28/Comparing-SQL-Server-HASHBYTES-function-and-.Net-hashing.aspx

#3

{Id, UserName, PasswordString, PasswordHashed}

If you do:

如果你这样做：

SELECT TOP 1 CONVERT(char(32),HashBytes('MD5', 'abc123'),2) FROM [Users]

It will output: E99A18C428CB38D5F260853678922E03

它将输出：E99A18C428CB38D5F260853678922E03

But if you do this, having the same password ('abc123'):

但如果你这样做，使用相同的密码（'abc123'）：

SELECT CONVERT(char(32),HashBytes('MD5', [PasswordString]),2) FROM [Users]

It will output: 6E9B3A7620AAF77F362775150977EEB8

它将输出：6E9B3A7620AAF77F362775150977EEB8

What I should have done is:

我应该做的是：

SELECT CONVERT(char(32),HashBytes('MD5', N'abc123'),2) FROM [Users]

That outputs the same result: 6E9B3A7620AAF77F362775150977EEB8

输出相同的结果：6E9B3A7620AAF77F362775150977EEB8

秒客网

TSQL md5哈希与C＃.NET md5不同

4 个解决方案

#1

#2

#3

#4

#1

#2

#3

#4

相关文章