Parsing multipart/mixed and multipart/alternative bodies in Java

Time: 2022-05-18 18:17:17

I'm getting emails from a client where they have nested a multipart/alternative message inside a multipart/mixed message. When I get the body of the message it just returns the multipart/alternative level when what I really want is the text/html part which is contained in the multipart/alternative.

I've looked through the javadocs for javax.mail and I can't find a simple way to get the body of a bodypart that is itself a multipart or skip the first multipart/mixed part and go into the multipart/alternative body to read the text/html and text/plain pieces.

The email structure looks like this:

...
Content-Type: multipart/mixed; 
    boundary="----=_Part_19487_1145362154.1418138792683"

------=_Part_19487_1145362154.1418138792683
Content-Type: multipart/alternative; 
    boundary="----=_Part_19486_1391901275.1418138792683"

------=_Part_19486_1391901275.1418138792683
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=ISO-8859-1

...

------=_Part_19486_1391901275.1418138792683
Content-Transfer-Encoding: 7bit
Content-Type: text/html; charset=ISO-8859-1

...

------=_Part_19486_1391901275.1418138792683--

------=_Part_19487_1145362154.1418138792683--

This is an outline of the code used to parse the emails:

Message [] found = fldr.search(searchCondition);           
for (int i = 0; i < found.length; i++) {
    Message m = found[i];
    Object o = m.getContent();
    if (o instanceof Multipart) {
        log.info("**This is a Multipart Message.  ");
        Multipart mp = (Multipart)o;
        log.info("The Multipart message has " + mp.getCount() + " parts.");
        for (int j = 0; j < mp.getCount(); j++) {
            BodyPart b = mp.getBodyPart(j);

            // Loop if the content type is multipart then get the content that is in that part,
            // make it the new container and restart the loop in that part of the message.
            if (b.getContentType().contains("multipart")) {
                mp = (Multipart)b.getContent();
                j = 0;
                continue;
            }

            log.info("This content type is " + b.getContentType());

            if(!b.getContentType().contains("text/html")) {
                continue;
            }

            Object o2 = b.getContent();
            if (o2 instanceof String) {
                // do things with the HTML content here (o2 is the decoded String)
            }
        }
    }
}

It appears to keep stopping at the second boundary and not parsing anything further. In the case of the above message it stops at boundary="----=_Part_19486_1391901275.1418138792683" and never gets to the text of the message.

2 Solutions

#1


2  

In this block:

if (b.getContentType().contains("multipart"))
{
    mp = (Multipart)b.getContent();
    j = 0;
    continue;
}

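You set j to 0 and ask the loop to continue, expecting it to start again at index 0. But continue in a for loop still runs the update expression j++ before the next condition check, so the loop resumes at index 1, not 0, and the first part of the nested multipart is skipped.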
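
The effect can be reproduced without any mail classes. The following is a minimal sketch in plain Java; the class and method names (ContinueRestartDemo, visitedAfterRestart) are made up for illustration and are not part of javax.mail. It records which indices the loop body reaches after a single "restart" with a given reset value.

```java
import java.util.ArrayList;
import java.util.List;

class ContinueRestartDemo {

    // Returns the indices the loop body visits after one "restart".
    // resetValue is what j is set to just before `continue` (hypothetical helper).
    static List<Integer> visitedAfterRestart(int resetValue, int count) {
        List<Integer> visited = new ArrayList<>();
        boolean restarted = false;
        for (int j = 0; j < count; j++) {
            if (!restarted) {
                restarted = true;
                j = resetValue; // the update expression j++ still runs after continue
                continue;
            }
            visited.add(j);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedAfterRestart(0, 3));  // prints [1, 2]
        System.out.println(visitedAfterRestart(-1, 3)); // prints [0, 1, 2]
    }
}
```

With a reset value of 0 the body never sees index 0; only a reset value of -1 brings the loop back to the first element.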

Set j to -1 to solve your issue.

if (b.getContentType().contains("multipart"))
{
    mp = (Multipart)b.getContent();
    j = -1;
    continue;
}
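
Note that reassigning mp inside the loop also means any sibling parts that follow the nested multipart in the outer container are never visited. A recursive walk avoids both the index juggling and that loss. What follows is only a sketch under stated assumptions: Node is a hypothetical stand-in for the javax.mail Part and Multipart types so that the example runs without the mail library, and findFirst is an illustrative name. With the real API the same shape would use getContent(), getCount() and getBodyPart(i).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class MimeTreeSketch {

    // Hypothetical stand-in for a MIME part (NOT a javax.mail type):
    // a container has children, a leaf has text.
    static final class Node {
        final String contentType;
        final String text;
        final List<Node> children;

        private Node(String contentType, String text, List<Node> children) {
            this.contentType = contentType;
            this.text = text;
            this.children = children;
        }

        static Node leaf(String contentType, String text) {
            return new Node(contentType, text, Collections.<Node>emptyList());
        }

        static Node multipart(String contentType, Node... children) {
            return new Node(contentType, null, Arrays.asList(children));
        }
    }

    // Depth-first search for the first leaf whose content type starts with
    // wantedType (expected in lowercase, e.g. "text/html").
    static Optional<String> findFirst(Node node, String wantedType) {
        String type = node.contentType.toLowerCase(Locale.ENGLISH);
        if (type.startsWith("multipart/")) {
            for (Node child : node.children) {
                Optional<String> hit = findFirst(child, wantedType);
                if (hit.isPresent()) {
                    return hit;
                }
            }
            return Optional.empty();
        }
        return type.startsWith(wantedType) ? Optional.ofNullable(node.text) : Optional.empty();
    }

    public static void main(String[] args) {
        // Same nesting as the message in the question: mixed > alternative > plain + html.
        Node message = Node.multipart("multipart/mixed",
                Node.multipart("multipart/alternative",
                        Node.leaf("text/plain; charset=ISO-8859-1", "plain body"),
                        Node.leaf("TEXT/HTML; charset=ISO-8859-1", "<p>html body</p>")));
        System.out.println(findFirst(message, "text/html").orElse("(no HTML part)"));
    }
}
```

Because each level of nesting gets its own loop variable, nothing has to be reset, and attachments that sit next to the multipart/alternative part are still reached.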

#2


1  

I tested your code and it failed for me as well.

In my case, b.getContentType() returned the type in uppercase (e.g. "TEXT/HTML; charset=UTF-8"), so the case-sensitive contains("text/html") check never matched. Converting the value to lowercase first made it work.

String contentType=b.getContentType().toLowerCase(Locale.ENGLISH);

if(!contentType.contains("text/html")) {
   continue;
}
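
The same idea can be packaged as a small helper that also drops parameters such as charset before comparing. This is a sketch in plain Java with made-up names (ContentTypeMatch, baseType, isHtml), not part of javax.mail. Locale.ENGLISH is passed to toLowerCase so the result does not depend on the JVM's default locale. If you are calling the mail API directly, Part.isMimeType("text/html") is worth checking as an alternative, since it compares only the primary type and subtype and ignores parameters.

```java
import java.util.Locale;

class ContentTypeMatch {

    // Extracts "type/subtype" from a Content-Type value, dropping parameters
    // such as charset, and lowercases it (hypothetical helper).
    static String baseType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ENGLISH);
    }

    // True when the base type is text/html, regardless of case or parameters.
    static boolean isHtml(String contentType) {
        return baseType(contentType).equals("text/html");
    }

    public static void main(String[] args) {
        System.out.println(isHtml("TEXT/HTML; charset=UTF-8"));
        System.out.println(isHtml("text/plain; charset=ISO-8859-1"));
    }
}
```

Comparing the extracted base type with equals, rather than using contains on the whole header value, also avoids accidental matches on text that appears inside a parameter.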
