Java regex to remove unwanted double quotes in a CSV

Date: 2021-09-18 06:12:49

I have a CSV file that contains the following line. As you can see, numbers are NOT enclosed in double quotes.

String theLine = "Corp:Industrial","5Nearest",51.93000000,"10:21:29","","","","10:21:29","7/5/2016","PER PHONE CALL WITH SAP, CORRECTING "C","359/317 97 SMRD 96.961 MADV",""

I am trying to read the line above and split it using this regex:

String[] tokens = theLine.split(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

This doesn't split at every comma the way I want. The field "PER PHONE CALL WITH SAP, CORRECTING "C", is messing it up because it contains an additional comma (,) and an additional double quote ("). Can someone please help me write a regex that copes with an extra double quote and a comma inside a pair of double quotes?

I basically want:

"Corp:Industrial","5Nearest",51.93000000,"10:21:29","","","","10:21:29","7/5/2016","**PER PHONE CALL WITH SAP CORRECTING C**","359/317 97 SMRD 96.961 MADV",""

1 solution

#1


There are jobs that parsers are much better at than Regular Expressions, and this sort of thing is typically one of them. I'm not saying you can't make it work for you, but ... there are also open-source CSV Parsers you could press into service.
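To give a feel for why a parser copes better here, this is a minimal hand-rolled sketch of a character-by-character CSV line parser. It assumes the common convention that a quote inside a quoted field is written as two quotes; the class name `MiniCsv` and the method `parseLine` are made up for illustration, and it is not a substitute for a real CSV library (it ignores multi-line fields, for instance).

```java
import java.util.ArrayList;
import java.util.List;

public class MiniCsv {
    // Hypothetical helper: splits one CSV line into unquoted field values.
    // Assumption: inside a quoted field, "" stands for one literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote becomes a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas inside quotes are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);           // unquoted data, e.g. a number
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"7/5/2016\",\"PER PHONE CALL WITH SAP, CORRECTING \"\"C\"\"\",51.93000000,\"\"";
        for (String field : parseLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The point of the state flag is that a comma is only treated as a separator while you are outside quotes, which is exactly the bookkeeping a lookahead regex has to fake by counting quotes.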

Having said that, your CSV looks suspect to me.

"PER PHONE CALL WITH SAP, CORRECTING "C",

That value has three quotes in it -- is it meant to represent a string with only a single quote inside? Or should the C be surrounded by quotes as well as the String?

Normally, if you're going to include a double quote inside a double-quoted value, you need a special syntax for it. For CSV, the most common options are doubling it or escaping it with a character such as a backslash:

"PER PHONE CALL WITH SAP, CORRECTING ""C""",

Or:

"PER PHONE CALL WITH SAP, CORRECTING \"C\"",

Neither of these directly solves your regular-expression problem, but once you have well-formed CSV, your odds of parsing it successfully go up.
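As a rough check of that claim, here is a small sketch that runs the split regex from the question against both the line as posted and a version where the inner quotes are doubled. The class name `SplitDemo` and the `split` helper are only for illustration. The regex splits on a comma only when an even number of quotes follows it, so a single stray quote throws off every comma before it.

```java
import java.util.regex.Pattern;

public class SplitDemo {
    // The regex from the question: a comma followed by an even number of quotes.
    static final Pattern COMMA =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Hypothetical helper wrapping the split so it can be reused.
    static String[] split(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        String prefix = "\"Corp:Industrial\",\"5Nearest\",51.93000000,\"10:21:29\","
                + "\"\",\"\",\"\",\"10:21:29\",\"7/5/2016\",";
        String suffix = ",\"359/317 97 SMRD 96.961 MADV\",\"\"";

        // Field as posted: three quotes, so the line has an odd quote count.
        String original = prefix + "\"PER PHONE CALL WITH SAP, CORRECTING \"C\"" + suffix;
        // Same field with the inner quotes doubled.
        String doubled = prefix + "\"PER PHONE CALL WITH SAP, CORRECTING \"\"C\"\"\"" + suffix;

        System.out.println(split(original).length); // prints 4
        System.out.println(split(doubled).length);  // prints 12
    }
}
```

Note that the tokens from the well-formed line still carry their surrounding quotes and the doubled inner quotes, so you would need a clean-up pass afterwards if you stay with the regex approach.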
