Using regex in Java to extract specific values or substrings from a long string

Time: 2022-09-13 11:28:49

I have a long string containing different values/strings that I want to extract.

String info = "ABHom=1.00;AC=2;AF=1.00;AN=2;DP=24;Dels=0.00;FS=0.000;"
            + "HaplotypeScore=0.9947;MLEAC=2;MLEAF=1.00;MQ=53.03;MQ0=0;QD=32.49;"
            + "VQSLOD=2.70; culprit=FS";



// ABhom (double) and MLEAC (int) are assumed to be declared elsewhere.
Matcher abHomMatcher = Pattern.compile("[A][B][H][o][m][=]([0-9]+\\.[0-9]+)").matcher(info);
if (abHomMatcher.find()) {
    String someNumberStr = abHomMatcher.group(1);
    ABhom = Double.parseDouble(someNumberStr);
}

Matcher mleacMatcher = Pattern.compile("[M][L][E][A][C][=]([0-9]+)").matcher(info);
if (mleacMatcher.find()) {
    String someNumberStr = mleacMatcher.group(1);
    MLEAC = Integer.parseInt(someNumberStr);
}

I'm new to regex. Is there a smarter way to extract the numbers/strings after the equals sign?

I'm thankful for any suggestions!

5 solutions

#1


5  

I think what you want to do is to turn your String into a HashMap<String,String>.


First, you'll need to split your string around semicolons. Then, iterate the array that you get, splitting each entry around the equals sign, and adding the result to the HashMap.


I suggest you read about the split method of the String class for how to do this, and also read about the HashMap class. Look at http://docs.oracle.com/javase/7/docs/api/java/lang/String.html and http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html and post again if you need more help.


#2


0  

You can do it like this:

String[] split = info.split(";");
for (String string : split) {
       String[] split2 = string.trim().split("=");
       System.out.println(split2[0] +" :" +split2[1]);
}

#3


0  

You can store them in HashMap as follows:


String[] parts = info.split(";");
Map<String, String> hashMap = new HashMap<String, String>();
for (String s : parts) {
       String[] keyVal = s.trim().split("=");
       hashMap.put(keyVal[0], keyVal[1]);
}

Later on, you can use the hashMap object to look up its values.
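
The values in the map are still strings, so numeric fields have to be parsed when they are read back. A minimal sketch of that step, assuming the same `key=value;key=value` format as in the question. The class name `InfoValues` and the helpers `parse`, `getDouble`, and `getInt` are hypothetical names made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class InfoValues {

    // Splits "key=value;key=value" into a map; entries without '=' are skipped.
    public static Map<String, String> parse(String info) {
        Map<String, String> map = new HashMap<>();
        for (String part : info.split(";")) {
            // Limit 2 keeps any further '=' characters inside the value.
            String[] keyVal = part.trim().split("=", 2);
            if (keyVal.length == 2) {
                map.put(keyVal[0], keyVal[1]);
            }
        }
        return map;
    }

    // Both helpers throw if the key is absent or the value is not numeric.
    public static double getDouble(Map<String, String> map, String key) {
        return Double.parseDouble(map.get(key));
    }

    public static int getInt(Map<String, String> map, String key) {
        return Integer.parseInt(map.get(key));
    }

    public static void main(String[] args) {
        String info = "ABHom=1.00;AC=2;MLEAC=2;MQ=53.03; culprit=FS";
        Map<String, String> map = parse(info);

        double abHom = getDouble(map, "ABHom");
        int mleac = getInt(map, "MLEAC");
        String culprit = map.get("culprit"); // non-numeric values stay as strings

        System.out.println(abHom + " " + mleac + " " + culprit);
    }
}
```

Parsing on read rather than on insert keeps the map uniform, which may be convenient when some fields (such as `culprit`) are not numbers.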


#4


0  

    String info = "ABHom=1.00;AC=2;AF=1.00;AN=2;DP=24;Dels=0.00;FS=0.000;"
            + " HaplotypeScore=0.9947;MLEAC=2;MLEAF=1.00;MQ=53.03;MQ0=0;QD=32.49;"
            + "VQSLOD=2.70; culprit=FS";

    Pattern pattern = Pattern.compile("(\\w+)=(\\d+(\\.\\d+)?)");
    Matcher matcher = pattern.matcher(info);
    while (matcher.find()) {            
        System.out.println("key: "+matcher.group(1) +" value: "+matcher.group(2));
    }

Output:

key: ABHom value: 1.00
key: AC value: 2
key: AF value: 1.00
key: AN value: 2
key: DP value: 24
key: Dels value: 0.00
key: FS value: 0.000
key: HaplotypeScore value: 0.9947
key: MLEAC value: 2
key: MLEAF value: 1.00
key: MQ value: 53.03
key: MQ0 value: 0
key: QD value: 32.49
key: VQSLOD value: 2.70

Explanation:

\\w matches any word character (a letter, a digit, or _); \\w+ matches one or more of them.
\\d matches any digit; \\d+ matches one or more digits.
? matches the preceding element zero or one time. For example, ab?c matches only "ac" or "abc".
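
One related detail: in a regex an unescaped `.` matches any character, so a literal decimal point has to be written as `\\.` in a Java string literal. A small sketch of the difference, where the class name `DotEscapeDemo`, the helper `firstValue`, and the sample input `v=1x5` are all made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotEscapeDemo {

    // Returns capture group 2 of the first match, or null if nothing matches.
    public static String firstValue(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String loose  = "(\\w+)=(\\d+(.\\d+)?)";   // '.' matches any character
        String strict = "(\\w+)=(\\d+(\\.\\d+)?)"; // escaped dot matches only a literal '.'

        System.out.println(firstValue(loose, "v=1x5"));  // 1x5
        System.out.println(firstValue(strict, "v=1x5")); // 1
        System.out.println(firstValue(strict, "v=1.5")); // 1.5
    }
}
```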

You said you want to extract both strings and numbers. The code above cannot extract culprit=FS because its value is not numeric. If you want to extract every pair, use this code instead:

    Pattern pattern = Pattern.compile("(\\w+)=([^;]+)");
    Matcher matcher = pattern.matcher(info);
    while (matcher.find()) {            
        System.out.println("key: "+matcher.group(1) +" value: "+matcher.group(2));
    }

Output:

key: ABHom value: 1.00
key: AC value: 2
key: AF value: 1.00
key: AN value: 2
key: DP value: 24
key: Dels value: 0.00
key: FS value: 0.000
key: HaplotypeScore value: 0.9947
key: MLEAC value: 2
key: MLEAF value: 1.00
key: MQ value: 53.03
key: MQ0 value: 0
key: QD value: 32.49
key: VQSLOD value: 2.70
key: culprit value: FS
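
To keep the pairs instead of printing them, the same `(\\w+)=([^;]+)` pattern can fill a map. A minimal sketch, assuming the original field order should be preserved (hence `LinkedHashMap`). The class name `RegexToMap` and the method `toMap` are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexToMap {

    // Compiled once and reused; the key is \w+, the value is everything up to the next ';'.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=([^;]+)");

    public static Map<String, String> toMap(String info) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(info);
        while (m.find()) {
            // trim() drops any whitespace around the captured value
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> map = toMap("ABHom=1.00;AC=2; culprit=FS");
        System.out.println(map); // {ABHom=1.00, AC=2, culprit=FS}
    }
}
```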

#5


0  

I don't think regex is a good idea here. Try info.split(";")[0].split("=")[1], with some extra bounds checks.
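
A sketch of what such checks might look like when looking up a single key without regex. The class name `FieldLookup` and the method `find` are hypothetical, and returning `null` for a missing key is just one possible choice:

```java
public class FieldLookup {

    // Returns the value for the given key, or null if the key is missing.
    // Entries with no '=' or with nothing before it are ignored.
    public static String find(String info, String key) {
        for (String entry : info.split(";")) {
            String trimmed = entry.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // malformed entry: no '=' or empty key
            }
            if (trimmed.substring(0, eq).equals(key)) {
                return trimmed.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String info = "ABHom=1.00;AC=2; culprit=FS;broken";
        System.out.println(find(info, "ABHom"));   // 1.00
        System.out.println(find(info, "culprit")); // FS
        System.out.println(find(info, "missing")); // null
    }
}
```

Using `indexOf` instead of indexing into the result of `split("=")` avoids an `ArrayIndexOutOfBoundsException` on entries that contain no equals sign.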

