Identifying bullets in MS Word using Apache POI

Time: 2022-06-27 16:03:11

I'm trying to build an application that reads a Word file (.docx) and does some processing on it. So far I've done pretty much everything except identifying bullets. I can find isBold(), isItalic() and isStrike(), but I can't seem to find an isBullet() method.

Can anyone tell me how to identify bullets?

The application is built in Java.

2 answers

#1


There's no isBullet() method, because list styling in Word is a lot more complicated than that: there are different indent levels, different bullet styles, numbered lists and bulleted lists, etc.

Probably the easiest method to call for your use case is XWPFParagraph.getNumFmt():

Returns numbering format for this paragraph, eg bullet or lowerLetter. Returns null if this paragraph does not have numeric style.

Call that: if you get null, the paragraph isn't part of a list; otherwise the returned value tells you whether it is bulleted, numbered, lettered, etc.
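A minimal sketch of that decision, kept free of POI so it runs on its own: the hypothetical helper describe() takes the String that XWPFParagraph.getNumFmt() would return. The format names ("bullet", "decimal", "lowerLetter") are values of the OOXML number-format type; the labels returned here are illustrative assumptions, not part of POI.

```java
public class NumFmtCheck {

    // Hypothetical helper: classifies the value returned by
    // XWPFParagraph.getNumFmt(), passed in here as a plain String.
    static String describe(String numFmt) {
        if (numFmt == null) {
            // No numbering on this paragraph, so it is not in a list.
            return "not a list item";
        }
        if (numFmt.equals("bullet")) {
            return "bulleted list item";
        }
        // Any other format (decimal, lowerLetter, upperRoman, ...) is a numbered or lettered list.
        return "numbered list item (" + numFmt + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(null));          // prints "not a list item"
        System.out.println(describe("bullet"));      // prints "bulleted list item"
        System.out.println(describe("lowerLetter")); // prints "numbered list item (lowerLetter)"
    }
}
```

With a real document you would call describe(para.getNumFmt()) inside a loop over doc.getParagraphs().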

#2


You can use the code below to get a list of all the bullets in a Word document. It uses Apache POI's XWPF API.

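import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ListTest {

    public static void main(String[] args) {
        String filename = "file_path";
        List<String> paraList = new ArrayList<>();
        // Open the .docx from a stream; try-with-resources closes both the stream and the document.
        try (InputStream is = new FileInputStream(filename);
             XWPFDocument doc = new XWPFDocument(is)) {
            for (XWPFParagraph para : doc.getParagraphs()) {
                // getNumFmt() is null for paragraphs without numbering, so this
                // collects every list item (bulleted and numbered alike).
                if (para.getNumFmt() != null) {
                    paraList.add(para.getText());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Print once, after all paragraphs have been examined.
        for (String bullet : paraList) {
            System.out.println(bullet);
        }
    }
}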
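The loop above keeps every paragraph that has any numbering format, so numbered and lettered items end up next to the bullets. If only true bullets are wanted, one option is to also compare the format string against "bullet", as answer #1 implies. The sketch below assumes a hypothetical Item holder for the text and format a POI loop would record (para.getText() and para.getNumFmt()); it is plain Java so it runs without POI.

```java
import java.util.ArrayList;
import java.util.List;

public class BulletFilter {

    // Hypothetical holder for what a loop over doc.getParagraphs() could
    // record per paragraph: para.getText() and para.getNumFmt().
    static final class Item {
        final String text;
        final String numFmt; // null when the paragraph is not in a list

        Item(String text, String numFmt) {
            this.text = text;
            this.numFmt = numFmt;
        }
    }

    // Keeps only the texts whose numbering format is exactly "bullet".
    static List<String> bulletsOnly(List<Item> items) {
        List<String> bullets = new ArrayList<>();
        for (Item item : items) {
            // Constant-first equals() is null-safe for paragraphs without numbering.
            if ("bullet".equals(item.numFmt)) {
                bullets.add(item.text);
            }
        }
        return bullets;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("Intro paragraph", null));
        items.add(new Item("First bullet", "bullet"));
        items.add(new Item("Step 1", "decimal"));
        items.add(new Item("Second bullet", "bullet"));
        System.out.println(bulletsOnly(items)); // prints [First bullet, Second bullet]
    }
}
```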