haskell中的原始但有效的grep克隆?

时间:2022-02-21 23:36:41

Whenever I consider learning a new language -- haskell in this case -- I try to hack together a primitive grep clone to see how good the language implementation and/or its libraries are at text processing, because that's a major use case for me.

每当我考虑学习一种新语言 - 在这种情况下是haskell - 我会尝试将一个原始的grep克隆组合在一起,看看语言实现和/或它的库在文本处理方面有多好,因为这对我来说是一个主要的用例。

Inspired by code on the haskell wiki, I came up with the following naive attempt:

受到haskell wiki代码的启发,我想出了以下天真的尝试:

{-# LANGUAGE FlexibleContexts, ExistentialQuantification #-}

import Text.Regex.PCRE
import System.Environment

io :: ([String] -> [String]) -> IO ()
io f = interact (unlines . f . lines)

regexBool :: forall r l .
  (RegexMaker Regex CompOption ExecOption r,
   RegexLike Regex l) =>
  r -> l -> Bool
regexBool r l = l =~ r :: Bool

grep :: forall r l .
  (RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) =>
  r -> [l] -> [l]
grep r = filter (regexBool r)

main :: IO ()
main = do
  argv <- getArgs
  io $ grep $ argv !! 0

This appears to be doing what I want it to, but unfortunately, it's really slow -- about 10 times slower than a python script doing the same thing. I assume it's not the regex library that's at fault here, because it's calling into PCRE which should be plenty fast (switching to Text.Regex.Posix slows things down quite a bit further). So it must be the String implementation, which is instructive from a theoretical point of view but inefficient according to what I've read.

这似乎正在做我想要的,但不幸的是,它真的很慢 - 大约比执行相同的python脚本慢10倍。我认为这不是regex库在这里有问题,因为它调用PCRE应该足够快(切换到Text.Regex.Posix会使事情进一步减慢)。所以它必须是String实现,从理论的角度来看是有启发性的,但根据我所读的内容效率低下。

Is there an alternative to Strings in haskell that's both efficient and convenient (i.e. there's little or no friction when switching to using that instead of Strings) and that fully and correctly handles UTF-8-encoded Unicode, as well as other encodings without too much hassle if possible? Something that everybody uses when doing text processing in haskell but that I just don't know about because I'm a complete beginner?

haskell中的字符串是否有替代方法既高效又方便(即切换到使用它而不是字符串时几乎没有摩擦)并且完全正确地处理UTF-8编码的Unicode,以及其他编码而没有太多麻烦,如果可能的话?在haskell中进行文本处理时每个人都使用的东西,但我只是不知道因为我是一个完全的初学者?

1 个解决方案

#1


1  

It's possible that the slow speed is caused by using the standard library's list type. I've often run into performance problems with it in the past.

使用标准库的列表类型可能导致速度慢。我过去经常遇到性能问题。

It would be a good idea to profile your executable, to see where it spends its time: Tools for analyzing performance of a Haskell program. Profiling Haskell programs is really easy (compile with a switch and execute your program with an added argument, and the report is written to a text file in the current working directory).

分析您的可执行文件,查看它花费时间的位置是一个好主意:用于分析Haskell程序性能的工具。分析Haskell程序非常简单(使用开关编译并使用添加的参数执行程序,并将报告写入当前工作目录中的文本文件)。

As a side note, I use exactly the same approach as you when learning a new language: create something that works. My experience doing this with Haskell is that I can easily gain an order of magnitude or two in performance by profiling and making relatively simple changes (usually a couple of lines).

作为旁注,在学习新语言时,我使用的方法与您完全相同:创建有效的方法。我使用Haskell的经验是,通过分析和进行相对简单的更改(通常是几行),我可以轻松地获得一个或两个数量级的性能。

#1


1  

It's possible that the slow speed is caused by using the standard library's list type. I've often run into performance problems with it in the past.

使用标准库的列表类型可能导致速度慢。我过去经常遇到性能问题。

It would be a good idea to profile your executable, to see where it spends its time: Tools for analyzing performance of a Haskell program. Profiling Haskell programs is really easy (compile with a switch and execute your program with an added argument, and the report is written to a text file in the current working directory).

分析您的可执行文件,查看它花费时间的位置是一个好主意:用于分析Haskell程序性能的工具。分析Haskell程序非常简单(使用开关编译并使用添加的参数执行程序,并将报告写入当前工作目录中的文本文件)。

As a side note, I use exactly the same approach as you when learning a new language: create something that works. My experience doing this with Haskell is that I can easily gain an order of magnitude or two in performance by profiling and making relatively simple changes (usually a couple of lines).

作为旁注,在学习新语言时,我使用的方法与您完全相同:创建有效的方法。我使用Haskell的经验是,通过分析和进行相对简单的更改(通常是几行),我可以轻松地获得一个或两个数量级的性能。