Excel里这段读取pdf文件内容的代码，你一定要收藏好了！

2024-02-07 01:33| 来源: 网络整理| 查看: 265

前段时间，我发了一篇关于《pdf里的表格数据也能轻松汇总了！》的文章，其中主要介绍了用Power BI直接汇总pdf文件中的数据的方法。

但是，有很多朋友还是没有用Power BI的，更多的还是在用Excel，而且，更多的时候也是将pdf的数据放入Excel然后用到各种地方，而不是直接用power BI读取而做分析。

可惜的是，在Excel里，没有直接接入pdf数据源的选项，那该怎么办？

这时，你当然可以先用power BI将数据汇总后，再导出Excel，但在很多时候，从pdf获取数据，往往是一件需要不断重复的工作，所以，我们考虑自动化，那该怎么办？

——当然是VBA！当然，我不会去写这个代码，因为已经有大神（本代码是从ExcelHome中搜索到的，遗憾的是最早的出处没有找到）为我们准备好了！

' 通用函数：将一个pdf文件内容读取到Excel工作表 ' 提取pdf内容的工作表名为：PDF内容 ' 若“PDF内容”工作表已存在，将删除后重建 Sub Imp_Into_XL(PDF_File As String) Dim AC_PD As Acrobat.AcroPDDoc Dim AC_Hi As Acrobat.AcroHiliteList Dim AC_PG As Acrobat.AcroPDPage Dim AC_PGTxt As Acrobat.AcroPDTextSelect Dim WS_PDF As Worksheet Dim RW_Ct As Long Dim Col_Num As Integer Dim Li_Row As Long Dim Yes_Fir As Boolean Li_Row = Rows.Count Dim Ct_Page As Long Dim i As Long, j As Long, k As Long Dim T_Str As String Dim Hld_Txt As Variant RW_Ct = 1 Col_Num = 1 'Application.ScreenUpdating = False Set AC_PD = New Acrobat.AcroPDDoc Set AC_Hi = New Acrobat.AcroHiliteList AC_Hi.Add 0, 32767 With AC_PD .Open PDF_File Ct_Page = .GetNumPages If Ct_Page = -1 Then MsgBox "请确认PDF文件 '" & PDF_File & "'" .Close GoTo h_end End If For Each WS_PDF In Sheets Application.DisplayAlerts = False If WS_PDF.Name = "PDF内容" Then WS_PDF.Delete Application.DisplayAlerts = True Next Set WS_PDF = Worksheets.Add(, Worksheets(Sheets.Count)) WS_PDF.Name = "PDF内容" For i = 1 To Ct_Page T_Str = "" Set AC_PG = .AcquirePage(i - 1) Set AC_PGTxt = AC_PG.CreateWordHilite(AC_Hi) If Not AC_PGTxt Is Nothing Then With AC_PGTxt For j = 0 To .GetNumText - 1 T_Str = T_Str & .GetText(j) Next j End With End If With WS_PDF If T_Str "" Then RW_Ct = RW_Ct + 1 .Cells(RW_Ct, Col_Num).Value = PDF_File Hld_Txt = Split(T_Str, vbCrLf) Yes_Fir = True For k = 0 To UBound(Hld_Txt) RW_Ct = RW_Ct + 1 If Yes_Fir Then RW_Ct = RW_Ct + 1 .Cells(RW_Ct, Col_Num).Value = "第" & i & "页" RW_Ct = RW_Ct + 2 Yes_Fir = False End If If RW_Ct > Li_Row Then RW_Ct = 1 Col_Num = Col_Num + 1 End If T_Str = CStr(Hld_Txt(k)) If Left(T_Str, 1) = "=" Then T_Str = "'" & T_Str .Cells(RW_Ct, Col_Num).Value = T_Str Next k Else RW_Ct = RW_Ct + 1 .Cells(RW_Ct, Col_Num).Value = "页面无文字 " & i RW_Ct = RW_Ct + 1 End If End With Next i .Close End With 'Application.ScreenUpdating = True 'MsgBox "完成" h_end: Set WS_PDF = Nothing Set AC_PGTxt = Nothing Set AC_PG = Nothing Set AC_Hi = Nothing Set AC_PD = Nothing End Sub

当然，仅有代码是不行的，因为Excel本身并不能真的解析pdf文件，还需要通过pdf的专业库（加载项）来完成，因此，需要在VBA的工具-引用中勾选相应的选项（因不同电脑安装的pdf文件编辑或阅读工具不一样，相应的库及版本可能有所不同）：

一切准备就绪，导入pdf就很简单了。

【本文地址】

公司简介

联系我们