Full-featured CSV parser for Haskell?

Time: 2022-11-14 17:03:44

Can anybody recommend a way to parse CSV files with options to:

  • set cells/fields separator
  • set end of record/row terminator
  • set quote-character for fields
  • support of UTF-8 strings
  • ability to write in-memory CSV structure back to a file

I did try Text.CSV, but it's very simple and lacks most of the above features. Is there a more advanced CSV parsing module, or do I have to write one "from scratch", i.e. using Text.ParserCombinators? I'd rather not reinvent the wheel.

Take care.

5 solutions

#1


8  

I can't recommend a ready-to-go, packaged-up CSV parser for Haskell, but I remember that the book Real World Haskell by Bryan O'Sullivan et al. contains a chapter on Parsec, which the authors demonstrate by building a CSV parser.

The relevant chapter 16: Using Parsec is available online; check the section titled Extended Example: Full CSV Parser.
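
To give a rough idea of what such a Parsec-based parser can look like, here is a minimal sketch. It is not the book's code: the names csvFile and parseCsv are made up for illustration, and it assumes that every record ends with a line terminator and that a quote inside a quoted field is written as two quotes. The separator and the quote character are plain parameters, which covers two of the options asked for in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Parser for a whole CSV document (hypothetical helper, not a library API).
-- 'sep' separates fields and 'quote' wraps quoted fields. Every record,
-- including the last one, must end with a line terminator.
csvFile :: Char -> Char -> Parser [[String]]
csvFile sep quote = record `endBy` eol <* eof
  where
    record     = field `sepBy1` char sep
    -- A field is either quoted or a run of ordinary characters (maybe empty).
    field      = quoted <|> many (noneOf [sep, quote, '\r', '\n'])
    quoted     = between (char quote) (char quote) (many quotedChar)
    -- Inside quotes, a doubled quote character stands for one literal quote.
    quotedChar = noneOf [quote] <|> try (char quote *> char quote)
    -- Accept CRLF, LF, or a lone CR as the record terminator.
    eol        = try (string "\r\n") <|> string "\n" <|> string "\r"

-- | Run the parser on an in-memory String.
parseCsv :: Char -> Char -> String -> Either ParseError [[String]]
parseCsv sep quote = parse (csvFile sep quote) "(csv)"

main :: IO ()
main = do
  print (parseCsv ',' '"' "name,age\r\n\"Doe, John\",27\r\n")
  print (parseCsv ';' '\'' "a;'b;c'\n")
```

Because the parser works on String, UTF-8 input has to be decoded before it is handed over, for example by reading the file with a UTF-8 locale or by going through Data.Text.Encoding.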

#2


6  

This is an old thread, but both csv-conduit and cassava have most, if not all -- not sure about re-writing to the file -- of the features you're looking for.

#3


4  

A quick search on Hackage finds Data.Spreadsheet, which does have customizable quote and separator.

#4


4  

There is the Data.Csv module on Hackage, in the cassava package. If your distribution does not provide a package for it, you can install it via cabal, e.g.

$ cabal install cassava

It can read and write (i.e. decode/encode) records from/to CSV files.

You can set the field separator like this:

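import Data.Char (ord)
import Data.Csv
import qualified Data.ByteString.Lazy.Char8 as B

-- Encode options that use a tab as the field delimiter
encOpts :: EncodeOptions
encOpts = defaultEncodeOptions
  { encDelimiter = fromIntegral (ord '\t') }

-- Print the records to stdout as tab-separated values
writeCsv :: ToRecord a => [a] -> IO ()
writeCsv records = B.putStr $ encodeWith encOpts records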

When this answer was written, Data.Csv offered no other encode/decode options (later cassava releases add a few more EncodeOptions fields, such as encUseCrLf and encQuoting). There are function variants for working with a header row. By default, lines are terminated with CRLF, double quotes are used for quoting, and UTF-8 is assumed as the text encoding. A double quote inside a value is escaped by doubling it (RFC 4180 style), not with a backslash, and quoting is omitted where it is not necessary.
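
Decoding takes a delimiter option in the same way, through decDelimiter in DecodeOptions. As a small sketch of a read-then-write round trip, the following reads tab-separated input held in memory and re-encodes it with semicolons. The names tabIn, semiOut and convert are made up for this example, and the record type (Text, Int) is only an assumption about what the data looks like.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (ord)
import Data.Csv
import Data.Text (Text)
import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.Vector as V

-- Decode options: fields are separated by tabs.
tabIn :: DecodeOptions
tabIn = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }

-- Encode options: fields are separated by semicolons.
semiOut :: EncodeOptions
semiOut = defaultEncodeOptions { encDelimiter = fromIntegral (ord ';') }

-- Decode headerless tab-separated (name, age) records, then re-encode them.
-- Returns Left with an error message if the input does not parse.
convert :: BL.ByteString -> Either String BL.ByteString
convert input = do
  rows <- decodeWith tabIn NoHeader input :: Either String (V.Vector (Text, Int))
  pure (encodeWith semiOut (V.toList rows))

main :: IO ()
main = either putStrLn BL.putStr (convert "John\t27\r\nJane\t28\r\n")
```

To work with files instead of in-memory strings, the same convert function can sit between BL.readFile and BL.writeFile.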

#5


-1  

Cassava works in memory and is a very simple library, e.g.

encode [("John" :: Text, 27 :: Int), ("Jane", 28)]
"John,27\r\nJane,28\r\n"
