Copying HTML content to Excel with Java

Time: 2022-11-19 23:17:00

I am a Java beginner and would be grateful if you could provide some sample code or guidelines for the situation below.

I have a large number of HTML files, and each file contains information about one school. The files sit at different depths of the folder hierarchy, but each one is always in the lowest-level folder of its path. Some folders contain no school HTML files at all.

For example:

C:\schools\england\london\hampstead\school_A.html [1 html in 1 folder]

C:\schools\england\london\southwark\school_B.html [multiple files in 1 folder]

C:\schools\england\london\southwark\school_C.html

C:\schools\england\london\southwark\school_D.html

C:\schools\wales\monmouth\school_E.html [file at different path level]

C:\schools\scotland\aberdeen\aberdeen [folder has no file]
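
Because the files sit at varying depths, one way to collect them is a recursive walk from the root folder. The following is a minimal sketch using only the standard library; the class name `SchoolFileFinder` and the method `findHtmlFiles` are hypothetical names chosen for illustration, and the default root `C:\schools` is taken from the example paths above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SchoolFileFinder {

    /** Returns every regular file under root whose name ends in ".html", sorted by path. */
    public static List<Path> findHtmlFiles(Path root) throws IOException {
        // Files.walk visits the whole tree, so the depth of each file does not matter;
        // folders without HTML files simply contribute nothing to the result.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the root folder from the example; pass a different one as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "C:\\schools");
        for (Path file : findHtmlFiles(root)) {
            System.out.println(file);
        }
    }
}
```

Each returned path can then be read and parsed on its own, so empty folders such as the aberdeen one need no special handling.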

  • HTML CONTENT TO BE COPIED

    <h1 id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_schoolName" class="schoolName">**school_A**</h1>
    
    <li id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingTypeContainer" style="list-style: none;"><span>Day/boarding type:</span> <span id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingType" class="infoDetail">**Day, full boarding and weekly boarding**</span></li>
    
    <li id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingFeeContainer" style="list-style: none;"><span>Boarding fees per term:</span> <span id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingFee" class="infoDetail">**&#163;7,317 to &#163;8,370**</span></li>
    
  • EXPECTED RESULTS IN EXCEL TABLE

3 column headers: "SCHOOL", "BOARDING TYPE", "BOARDING FEES PER TERM"

Row 1: "**school_A**", "**Day, full boarding and weekly boarding**", "**£7,317 to £8,370**"
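
To make the mapping from the HTML snippet to the expected row concrete, here is a rough sketch that pulls the three values out by the ending of each element's `id`. It assumes the ids always end in `_schoolName`, `_boardingType` and `_boardingFee` as in the sample, and that the target element contains plain text only. The class `SchoolFieldExtractor` and its methods are hypothetical, and the ids in `main` are shortened stand-ins for the long ones above. A regular expression is used only to keep the sketch free of dependencies; a real HTML parser would be more robust against markup variations.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchoolFieldExtractor {

    /** Text of the first element whose id ends with idSuffix, or "" if none is found. */
    public static String textOfId(String html, String idSuffix) {
        // Matches id="...<suffix>" followed by the rest of the tag, then captures the text up to the next tag.
        Pattern p = Pattern.compile("id=\"[^\"]*" + Pattern.quote(idSuffix) + "\"[^>]*>([^<]*)<");
        Matcher m = p.matcher(html);
        return m.find() ? decodeNumericEntities(m.group(1).trim()) : "";
    }

    /** Replaces decimal character references such as &#163; with the character itself. */
    public static String decodeNumericEntities(String text) {
        Matcher m = Pattern.compile("&#(\\d+);").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ch = new String(Character.toChars(Integer.parseInt(m.group(1))));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical ids; the real files use a much longer prefix.
        String html = "<h1 id=\"x_ctl01_schoolName\" class=\"schoolName\">school_A</h1>"
                + "<li id=\"x_ctl01_boardingTypeContainer\"><span>Day/boarding type:</span> "
                + "<span id=\"x_ctl01_boardingType\" class=\"infoDetail\">Day, full boarding and weekly boarding</span></li>"
                + "<li id=\"x_ctl01_boardingFeeContainer\"><span>Boarding fees per term:</span> "
                + "<span id=\"x_ctl01_boardingFee\" class=\"infoDetail\">&#163;7,317 to &#163;8,370</span></li>";
        System.out.println(textOfId(html, "_schoolName"));
        System.out.println(textOfId(html, "_boardingType"));
        System.out.println(textOfId(html, "_boardingFee"));
    }
}
```

Matching on the closing quote of the id keeps `_boardingType` from also matching the surrounding `_boardingTypeContainer` element.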

Thank you very much for your help.

1 solution

#1


I have some code for a similar requirement. Please adapt it to your needs.

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class HTMLToExcel
{

  public static void main(String[] args)
  {
     // try-with-resources closes the reader and the workbook even if an exception is thrown
     try (BufferedReader br = new BufferedReader(new FileReader("D:\\Excels\\log_km_styles1.html"));
          XSSFWorkbook xwork = new XSSFWorkbook()) {

        // Create the spreadsheet inside the workbook
        XSSFSheet xsheet = xwork.createSheet("MyFirstSheet");

        int rowid = 0;
        String line;
        while ((line = br.readLine()) != null) {
           System.out.println(line);

           // Each <br>-separated fragment of the line becomes one row
           for (String fragment : line.split("<br>")) {
              XSSFRow xrow = xsheet.createRow(rowid++);

              // Column C (index 2) holds the raw fragment
              xrow.createCell(2).setCellValue(fragment);

              // Columns D onward hold the fragment split on non-word characters
              int columnCount = 3;
              for (String word : fragment.split("\\W+")) {
                 xrow.createCell(columnCount++).setCellValue(word);
              }
              System.out.println(fragment);
           }
        }

        // Append today's date to the workbook name, e.g. readingFromHTMLFile_19-Nov-22.xlsx
        String todaysDate = new SimpleDateFormat("dd-MMM-yy").format(new Date());
        String fileName = "readingFromHTMLFile_" + todaysDate + ".xlsx";

        // Write the workbook to disk; the stream is closed automatically
        try (FileOutputStream fout = new FileOutputStream("D:\\Excels\\" + fileName)) {
           xwork.write(fout);
        }
        System.out.println(fileName + " written successfully");
     }
     catch (Exception e) {
        e.printStackTrace();
     }
  }
}

The code above converts the content of an HTML file into an Excel file. The output file has today's date in its name. Give it a try; I hope it helps.
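
As a side note, when Apache POI is not available, the three-column table from the question can also be produced as a CSV file, which Excel opens directly. The following is only a sketch of that alternative, not the approach used in the code above; the class `SchoolTable` and its methods are hypothetical names. Every field is quoted because the sample values themselves contain commas.

```java
import java.util.ArrayList;
import java.util.List;

public class SchoolTable {

    /** Quotes one field for CSV: wraps it in double quotes and doubles any embedded quotes. */
    public static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Builds the header line plus one line per school; each row is {name, boarding type, fees}. */
    public static List<String> toCsvLines(List<String[]> schools) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", quote("SCHOOL"), quote("BOARDING TYPE"), quote("BOARDING FEES PER TERM")));
        for (String[] s : schools) {
            lines.add(String.join(",", quote(s[0]), quote(s[1]), quote(s[2])));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> schools = new ArrayList<>();
        // Sample row from the question; \u00a3 is the pound sign.
        schools.add(new String[] {"school_A", "Day, full boarding and weekly boarding", "\u00a37,317 to \u00a38,370"});
        toCsvLines(schools).forEach(System.out::println);
    }
}
```

The resulting lines can be written with `java.nio.file.Files.write`; depending on the Excel version and locale, a UTF-8 byte order mark may be needed for the pound sign to display correctly.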
