Compressing XML in .NET for storage in a SQL Server database

Posted: 2022-10-17 14:36:32

Currently our .NET application constructs XML data in memory that we persist to a SQL Server database. The XElement object is converted to a string using ToString() and then stored in a varchar(MAX) column in the DB. We didn't want to use the SQL XML data type, as we didn't need any validation and SQL Server doesn't need to query the XML at any stage.

Although this implementation works fine, we want to reduce the size of the database by compressing the XML before storing it, and decompressing it after retrieving it. Does anyone have any sample code for compressing an XElement object (and decompressing would be great too)? Also, what changes would I need to make to the data type of the database column so that we can fully take advantage of this compression?

I have looked again at the XML data type that SQL Server 2005 offers, and the validation overhead it incurs is too high for us to consider using it. Also, although it does compress the XML somewhat, it doesn't achieve as much compression as the .NET DeflateStream class.

I have tested the DeflateStream class by writing the XML we use to disk, and then saving the compressed version as a new file. The results are great: a 16 KB file goes down to 3 KB, so it's just a case of getting this to work in memory and saving the resulting data to the DB. Does anyone have any sample code to do the compression, and should I change the varchar(MAX) column to a different type, maybe varbinary?

Thanks in advance

4 answers

#1


3  

This article may help you get a start.

The following snippet can compress a string and return a base-64 coded result:

// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
public static string Compress(string text)
{
 byte[] buffer = Encoding.UTF8.GetBytes(text);
 using (MemoryStream ms = new MemoryStream())
 {
  // leaveOpen = true keeps ms usable after the GZipStream is disposed;
  // disposing the GZipStream flushes the remaining compressed data into ms.
  using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
  {
   zip.Write(buffer, 0, buffer.Length);
  }

  byte[] compressed = ms.ToArray();

  // Prefix the compressed data with the uncompressed length (4 bytes)
  // so that a decompressor can size its output buffer.
  byte[] gzBuffer = new byte[compressed.Length + 4];
  System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
  System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
  return Convert.ToBase64String(gzBuffer);
 }
}
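
The question also asks about decompression. A minimal sketch of a counterpart to the snippet above might look like the following. It assumes the same layout (a 4-byte uncompressed length followed by the gzip data, all base-64 encoded); the class name GZipText and the method name Decompress are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipText
{
    // Hypothetical counterpart to Compress: expects base-64 text whose first
    // 4 bytes hold the uncompressed length, followed by the gzip data.
    public static string Decompress(string compressedText)
    {
        byte[] gzBuffer = Convert.FromBase64String(compressedText);
        int length = BitConverter.ToInt32(gzBuffer, 0);
        byte[] buffer = new byte[length];

        using (MemoryStream ms = new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // Read may return fewer bytes than requested, so loop until full.
            int total = 0;
            while (total < length)
            {
                int read = zip.Read(buffer, total, length - total);
                if (read == 0)
                {
                    throw new InvalidDataException("Compressed data ended early.");
                }
                total += read;
            }
        }

        return Encoding.UTF8.GetString(buffer);
    }
}
```

The length prefix is written with BitConverter, so it uses the byte order of the machine that compressed the data; this only matters if the data is written and read on machines with different endianness.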

EDIT: As an aside, you may want to use CLOB formats even when storing XML as text because varchars have a very limited length - which XML can often quickly exceed.
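
On the column type the question asks about: base-64 text is roughly a third larger than the bytes it encodes, so if the goal is to save space it may be worth storing the compressed bytes directly in a varbinary(MAX) column instead. The following is only a sketch under that assumption, using the DeflateStream class mentioned in the question; the class name XmlCompression and its method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class XmlCompression
{
    // Serializes the element without indentation and deflates the UTF-8 bytes.
    public static byte[] Compress(XElement element)
    {
        byte[] raw = Encoding.UTF8.GetBytes(element.ToString(SaveOptions.DisableFormatting));
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so output can still be read after the
            // DeflateStream is disposed (disposal flushes the final block).
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Inflates the bytes and parses the resulting text back into an XElement.
    public static XElement Decompress(byte[] data)
    {
        using (MemoryStream input = new MemoryStream(data))
        using (DeflateStream deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return XElement.Parse(reader.ReadToEnd());
        }
    }
}
```

The byte[] returned by Compress could then be passed as a SqlDbType.VarBinary parameter when inserting, and read back from the column as a byte[] before calling Decompress.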

#2


2  

I think you should also re-test the XML column. I know it is stored in a binary format rather than as text. It could be smaller, and may not perform badly, even if you don't actually need the additional features.

#3


1  

Besides possibly compressing the string itself (perhaps using LBushkin's Base64 method above), you probably want to start with making sure you kill all the whitespace. The default XElement.ToString() method saves the element with "indenting". You need to use the ToString(SaveOptions options) method (using SaveOptions.DisableFormatting) if you want to make sure you've just got the tags and data.
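
To illustrate the difference, here is a small sketch; the helper name XmlText.ToCompactString is made up for this example, while ToString(SaveOptions) and SaveOptions.DisableFormatting are the calls mentioned above.

```csharp
using System.Xml.Linq;

public static class XmlText
{
    // Serializes the element on a single line, without added indentation.
    public static string ToCompactString(XElement element)
    {
        return element.ToString(SaveOptions.DisableFormatting);
    }
}

// Usage:
//   XElement doc = new XElement("root",
//       new XElement("child", new XElement("leaf", "value")));
//   doc.ToString()               -> indented, spread over several lines
//   XmlText.ToCompactString(doc) -> <root><child><leaf>value</leaf></child></root>
```

How much this saves depends on how deeply nested the document is; since runs of whitespace compress well, the gain after compression is likely smaller than the gain on the raw text.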

#4


-2  

I know you tagged the question SQL 2005, but you should consider upgrading to SQL 2008 and using the wonderful new compression capabilities that come with it. It is out-of-the-box, transparent to your application, and will save you a huge implementation/test/support cost.
