自动化PDF到文本VB.net

时间:2022-12-31 21:09:22

I'm currently using the below code in a VB.Net console app that takes the contents of a text file and extracts certain info and then exports it to a CSV.

我目前在VB.Net控制台应用程序中使用以下代码,该应用程序获取文本文件的内容并提取某些信息,然后将其导出为CSV。

All seems to work well but the problem is the file originally comes through as a PDF (only option possible) and i have to manually open the file in Adobe and 'Save as Text'.

所有似乎都运行良好,但问题是文件最初是作为PDF(只有选项),我必须手动打开Adobe文件和“另存为文本”。

Is there a way of either automating the conversion of PDF to text file or reading the PDF in place of the text file.

有没有办法自动将PDF转换为文本文件或读取PDF代替文本文件。

Any guidance or options would be appreciated

任何指导或选项将不胜感激

Dim iLine, iEnd, c, iField As Integer
    Dim iSecs, iMax As Long
    Dim sText, sTemp, sSchema As String
    Dim sHotel, sEndDate, sMon, sPLU, sTots, sValue, sDept, sFile, sOutFile, sDesc As String
    Dim tdate As Date
    Dim con As New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0; Data Source=C:\temp\TX.accdb;")
    Dim LUse As Boolean


    sHotel = "Unknown Hotel"
    sEndDate = "01/01/2015"
    sMon = "MAR"
    sPLU = ""
    sTots = "0"
    sValue = "0"
    sDept = "Unknown Dept"
    sDesc = ""
    LUse = True
    sTemp = ""
    iField = 0
    sSchema = "Chester"


    'Open input file
    sFile = "c:\temp\input.txt"
    Dim InFile As New System.IO.StreamReader(sFile)

    'Open lookup data table
    con.Open()
    Dim dbAdapter As OleDbDataAdapter = New OleDbDataAdapter( _
      "SELECT * FROM Plookup", con)
    Dim dsTX As DataSet = New DataSet()
    Dim changes As DataTable
    Dim cmdbuilder As OleDbCommandBuilder = New OleDbCommandBuilder(dbAdapter)

    dbAdapter.FillSchema(dsTX, SchemaType.Source, "Plookup")
    dbAdapter.Fill(dsTX, "Plookup")

    Dim rstx As DataTable = dsTX.Tables(0)
    iMax = rstx.Rows.Count
    Dim productrow() As Data.DataRow





    'Open Output file
    iSecs = Timer
    sOutFile = "c:\temp\TX" & Format$(Now, "yymmdd") & Trim$(Str$(iSecs)) & ".csv"
    FileCopy(sFile, "c:\temp\TX" & Format$(Now, "yymmdd") & Trim$(Str$(iSecs)) & ".txt")
    Dim OutFile As New System.IO.StreamWriter(sOutFile)
    'Write header
    OutFile.WriteLine("outlet,dept,epos,tots sold,total price,date of sales")

    iLine = 0
    Do While InFile.Peek() <> -1
        'Read in text
        iLine = iLine + 1

        sText = InFile.ReadLine
        sText = sText.Replace(",", "")
        If Len(sText) > 2 And Len(sText) < 9 Then
            If Mid$(sText, 3, 1) = "-" Then ' Department Name
                sText = sText & Space(9 - Len(sText))
            End If
        End If


        'Process all rows except header row - read data into array
        If Len(sText) > 8 Then
            Select Case Left(sText, 7)

                Case "Consoli"  ' Ignore

                Case "Quanti "  ' Ignore

                Case "Group b"  ' Ignore - but next row is the Hotel Name
                    iLine = iLine + 1
                    sText = InFile.ReadLine
                    sText = sText.Replace(",", "")
                    sHotel = Trim$(Left(sText, 20)) 'The username follows so we may truncate the hotel name

                Case "Date ra" ' End date

                    sEndDate = Mid$(sText, 29, 2) & "/" & Mid$(sText, 32, 2) & "/" & Mid$(sText, 35, 4)
                    tdate = CDate(sEndDate).AddDays(-1)

                    sEndDate = tdate.ToString("dd/MM/yyyy")

                Case Else   'Possible Code

                    If Mid$(sText, 3, 1) = "-" Then ' Department Name
                        sDept = Trim(sText)
                    Else
                        If IsNumeric(Left(sText, 7)) Then 'Got a code 
                            sPLU = Trim(Str(Val(Left(sText, 7))))
                            'We don't know where the description ends as it contains spaces
                            'So best way is to start at the end and work back...
                            iEnd = Len(sText)
                            iField = 0
                            For c = iEnd To 9 Step -1
                                If Not (Mid(sText, c, 1) = " ") Or iField > 10 Then

                                    sTemp = Mid(sText, c, 1) & sTemp

                                Else
                                    iField = iField + 1
                                    If iField = 9 Then
                                        sValue = sTemp

                                    ElseIf iField = 11 Then
                                        sTots = sTemp

                                    End If
                                    sTemp = ""
                                End If

                            Next

                            If iField = 10 Then
                                sTots = Trim(sTemp)
                                sDesc = ""
                            Else
                                sDesc = Trim$(sTemp)
                            End If

                            'lookup code
                            productrow = rstx.Select("FileID = 'Chester' and PLU = '" & sPLU & "'")
                            If productrow.Length = 0 Then ' product not found
                                iMax = iMax + 1
                                rstx.Rows.Add(sSchema, sPLU, sDesc, False)

                                LUse = True
                            Else
                                LUse = Not productrow(0)("Exclude")
                            End If


                            If (Val(sTots) + Val(sValue) > 0) And LUse Then  ' We have a non-zero sale or value and it is not excluded
                                OutFile.WriteLine(sHotel & "," & sDept & "," & sPLU & "," & sTots & "," & sValue & "," & sEndDate)
                            End If
                        End If
                    End If


            End Select
        End If
    Loop

    'dbAdapter.Update(dsTX.Tables(0))
    'Close input / output csv files
    'rstx.Rows.Add("303030", "Another Test", False)
    dbAdapter.UpdateCommand = cmdbuilder.GetUpdateCommand(True)
    dbAdapter.InsertCommand = cmdbuilder.GetInsertCommand(True)
    dbAdapter.DeleteCommand = cmdbuilder.GetDeleteCommand()
    changes = rstx.GetChanges()

    If changes IsNot Nothing Then dbAdapter.Update(changes)


    InFile.Close()
    OutFile.Close()
    con.Close()

1 个解决方案

#1


Try itextSharp. itextSharp is a .NET DLL with the help of which you can extract content from PDF. Click here for reference & sample code(although code is in c#, its just a reference to give you an idea).

试试itextSharp。 itextSharp是一个.NET DLL,借助它可以从PDF中提取内容。点击这里查看参考和示例代码(尽管代码在c#中,它只是一个给你一个想法的参考)。

#1


Try itextSharp. itextSharp is a .NET DLL with the help of which you can extract content from PDF. Click here for reference & sample code(although code is in c#, its just a reference to give you an idea).

试试itextSharp。 itextSharp是一个.NET DLL,借助它可以从PDF中提取内容。点击这里查看参考和示例代码(尽管代码在c#中,它只是一个给你一个想法的参考)。