epitometool

PDF to Excel

Extract tables from a PDF into a multi-sheet .xlsx workbook — locally.

Up to 200 MB. The PDF must already contain a text layer (selectable text); for scans, run /tools/pdf-ocr first.

Quick start

How to convert a PDF table to Excel

Extract tables from a PDF into an .xlsx workbook, entirely in your browser.

  1. Drop or pick a PDF

    Drag the PDF onto the drop zone, click to pick it, or paste from the clipboard. The file stays on your device.

  2. Pick layout and scope

    Choose a stacked single sheet (good for tables that span pages) or one sheet per page. Optionally restrict the conversion to a page range like 1-3,5,7-9.

  3. Convert and download

    Click Convert to XLSX. pdfjs-dist reads the text positions in your browser, the items are clustered into rows and cells, and JSZip assembles the .xlsx workbook. Download <basename>.xlsx.

In-depth guide

Convert PDF tables to Excel (.xlsx) in your browser

This tool reads the text layer of a PDF and tries to reconstruct the page as a spreadsheet — useful for pulling tables out of bank statements, invoices, financial reports, exam result sheets, or anything else originally laid out as a grid. The PDF parser is Mozilla's pdfjs-dist; the .xlsx writer is a minimal Open XML SpreadsheetML emitter that runs entirely in your tab via JSZip. The PDF never leaves your browser.

How extraction works

PDFs store text as a stream of positioned characters — each glyph carries its x/y coordinates on the page, but the file has no notion of "table cells". This tool reconstructs the table in three passes:

  1. Read the page text. pdfjs-dist returns every text item with its position and width on the page.
  2. Group items into rows. Items whose y-coordinates lie within a few points of each other are treated as one row, and the rows are then sorted top-to-bottom.
  3. Split each row into cells. Within a row we scan left-to-right: if the horizontal gap before the next item is wider than ~8 PDF points (1 pt = 1/72 inch), we start a new cell; otherwise we append its text to the current cell.
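For readers who want to reproduce the heuristic, here is a minimal Python sketch of passes 2 and 3. It assumes each text item has already been reduced to its string, x/y position and width (pdfjs-dist exposes these through each item's transform and width). The TextItem class, the function names and the 3-point row tolerance are illustrative assumptions, not the tool's actual code.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a pdfjs-dist text item: x/y are the run's origin
# in PDF points, with y growing upward as in PDF user space.
@dataclass
class TextItem:
    text: str
    x: float
    y: float
    width: float


ROW_TOLERANCE = 3.0  # assumed value for "within a few points"
GAP_THRESHOLD = 8.0  # the ~8 pt cell gap described above


def group_rows(items, tol=ROW_TOLERANCE):
    """Cluster items into rows by y-coordinate, returned top-to-bottom."""
    rows = []  # list of (anchor_y, items_in_row)
    # Largest y first: in PDF space the top of the page has the highest y.
    for item in sorted(items, key=lambda it: -it.y):
        if rows and abs(rows[-1][0] - item.y) <= tol:
            rows[-1][1].append(item)
        else:
            rows.append((item.y, [item]))
    return [row for _, row in rows]


def split_cells(row, gap=GAP_THRESHOLD):
    """Scan a row left-to-right and start a new cell at each wide gap."""
    cells = []
    prev_end = None  # right edge of the text seen so far in this row
    for item in sorted(row, key=lambda it: it.x):
        if prev_end is None or item.x - prev_end > gap:
            cells.append(item.text)
        else:
            cells[-1] += item.text  # narrow gap or overlap: same cell
        end = item.x + item.width
        prev_end = end if prev_end is None else max(prev_end, end)
    return cells


items = [
    TextItem("Date", 50, 700, 25), TextItem("Amount", 200, 700.5, 40),
    TextItem("2024-01-03", 50, 685, 55),
    TextItem("12.", 200, 685, 12), TextItem("50", 213, 685, 10),
]
table = [split_cells(r) for r in group_rows(items)]
print(table)  # [['Date', 'Amount'], ['2024-01-03', '12.50']]
```

Each row is anchored on its topmost item, so a baseline that is half a point off (700 vs 700.5 above) still lands in the same row, while the 1-point gap between "12." and "50" is narrow enough to join them into one cell.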

The result is one .xlsx workbook with either a single sheet (stacked) or one sheet per page (separate), your choice.

What works well — and what doesn't

Works well: bank statements, invoices, tax forms, exam mark sheets, simple data tables, anything originally exported from a spreadsheet. The rows are crisp, column gaps are obvious, and there's a clear text layer.

Tricky: articles laid out in multiple columns (e.g. magazines), tables with merged cells or rotated text, scanned pages without an OCR layer, and forms where fields aren't aligned to a grid. For these cases, expect to do some clean-up in Excel after import — or pre-process the PDF through /tools/pdf-ocr to add a text layer.

Single sheet vs sheet-per-page

Two modes are offered for the workbook layout:

  • All pages → one sheet: the rows from every selected page are concatenated into one worksheet, with an empty separator row between pages. Use this when a single table runs across multiple pages.
  • One sheet per page: each selected page becomes its own worksheet named "Page 1", "Page 2" and so on. Use this when each page has an independent table.
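A sketch of how the two layouts could be assembled from per-page rows. The function name, the mode strings (borrowed from the "stacked" and "separate" labels above) and the placeholder name for the stacked sheet are assumptions. This version also assumes sheet-per-page names use the original page number, which the description above does not pin down.

```python
def build_sheets(pages, mode):
    """Arrange extracted rows into worksheets (hypothetical helper).

    pages: list of (page_number, rows), rows being list[list[str]].
    mode: "stacked" (one sheet) or "separate" (one sheet per page).
    """
    if mode == "stacked":
        combined = []
        for i, (_, rows) in enumerate(pages):
            if i > 0:
                combined.append([])  # empty separator row between pages
            combined.extend(rows)
        return [("Sheet1", combined)]  # sheet name is a placeholder
    if mode == "separate":
        return [(f"Page {n}", rows) for n, rows in pages]
    raise ValueError(f"unknown mode: {mode!r}")


pages = [(3, [["Date", "Amount"]]), (4, [["2024-01-03", "12.50"]])]
print(build_sheets(pages, "stacked"))
# [('Sheet1', [['Date', 'Amount'], [], ['2024-01-03', '12.50']])]
```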

The page range picker lets you skip cover pages, indices and other non-table content — try ranges like 3-15 or 2-4,7,9.
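A range spec like 2-4,7,9 can be expanded with a few lines of parsing. This is a hypothetical parser, not the tool's own: it accepts single pages and inclusive a-b spans, merges overlaps, and rejects pages outside the document rather than clamping them.

```python
def parse_page_range(spec, page_count):
    """Turn a spec such as "1-3,5,7-9" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas like "1,,3"
        start_s, sep, end_s = part.partition("-")
        start = int(start_s)
        end = int(end_s) if sep else start  # single page: "5" -> 5..5
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"bad range {part!r} for a {page_count}-page PDF")
        pages.update(range(start, end + 1))  # a set merges overlapping spans
    return sorted(pages)


print(parse_page_range("2-4,7,9", 10))  # [2, 3, 4, 7, 9]
```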

Privacy & safety

Both the PDF parsing (pdfjs-dist) and the .xlsx assembly (JSZip + a hand-written Open XML SpreadsheetML emitter) run locally in your browser. There's no upload step, no temporary server-side copy, and no analytics that ship document contents. Verify in DevTools → Network: the page should make zero outbound requests for your file during extraction.

When to use it vs alternatives

Use this tool when you need a fast, one-off PDF task and want the document to stay in your browser. Desktop editors or command-line tools are better for heavily encrypted files, regulated review workflows, or very large batch jobs that need repeatable automation.

Common pitfalls

  • Password-protected, digitally signed, or archival PDFs may need a specialist workflow before conversion.
  • Large scans can use a lot of memory, especially on phones or older laptops.
  • Check the downloaded file before replacing the original, because compression, OCR, or conversion can change visual details.

Frequently asked questions

How accurate is the table extraction?

Usually accurate for PDFs with a clean tabular layout (consistent row spacing, clear column separation); less reliable for free-form text, multi-column articles, merged cells, or scans without an OCR layer. The tool reads the text objects in the PDF's content stream, so if there's no text layer (a scanned image), run the file through /tools/pdf-ocr first.
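Whether a page has a usable text layer can be checked before any table reconstruction. A minimal sketch, assuming you already have the strings extracted for each page (for example each pdfjs-dist item's str): pages whose text is empty or whitespace-only are the ones to send through OCR. The function name is hypothetical.

```python
def pages_needing_ocr(page_texts):
    """Return 1-based numbers of pages with no usable text layer.

    page_texts: one list of extracted strings per page.
    """
    return [
        n for n, texts in enumerate(page_texts, start=1)
        if not any(t.strip() for t in texts)  # empty or whitespace only
    ]


print(pages_needing_ocr([["Statement", "Total"], [], [" "]]))  # [2, 3]
```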

What's the difference between the two output modes?

"All pages → one sheet" stacks every selected page into a single sheet, with an empty separator row between pages. Use this when a single table spans multiple pages. "One sheet per page" creates a separate worksheet for each page, named Page 1, Page 2, etc. Use this when each page has its own unrelated table.

Is anything uploaded?

No. The PDF is parsed by pdfjs-dist in your browser, rows are clustered locally, and the .xlsx file is assembled with JSZip inside the same tab. Open DevTools → Network during extraction: you should see zero outbound requests for the file.

What file format is the output?

An Open XML SpreadsheetML workbook (.xlsx) — the same format Excel uses natively. It opens cleanly in Microsoft Excel, Google Sheets, Apple Numbers, and LibreOffice Calc.
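To make "an .xlsx is a ZIP of XML parts" concrete, here is a deliberately minimal Python writer that uses only the standard library (zipfile stands in for JSZip). It emits the parts a bare workbook is usually built from (content types, package relationships, the workbook, its relationships, plus one worksheet part per sheet) and writes every value as an inline string; styles, shared strings and document properties are left out. All function and constant names are illustrative, and this is not the tool's own emitter.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

HEAD = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml."


def col_letters(index):
    """0-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters, n = "", index + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def sheet_xml(rows):
    """One worksheet; every value is written as an inline string."""
    parts = [HEAD, f'<worksheet xmlns="{NS_MAIN}"><sheetData>']
    for r, row in enumerate(rows, start=1):
        if not row:
            continue  # blank separator row: leave its row number empty
        cells = "".join(
            f'<c r="{col_letters(c)}{r}" t="inlineStr"><is>'
            f'<t xml:space="preserve">{escape(value)}</t></is></c>'
            for c, value in enumerate(row)
        )
        parts.append(f'<row r="{r}">{cells}</row>')
    parts.append("</sheetData></worksheet>")
    return "".join(parts)


def write_xlsx(target, sheets):
    """Write sheets, a list of (name, rows), to a path or binary file object."""
    ids = range(1, len(sheets) + 1)
    content_types = (
        HEAD
        + '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        + '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        + '<Default Extension="xml" ContentType="application/xml"/>'
        + f'<Override PartName="/xl/workbook.xml" ContentType="{CT_PREFIX}sheet.main+xml"/>'
        + "".join(
            f'<Override PartName="/xl/worksheets/sheet{i}.xml" ContentType="{CT_PREFIX}worksheet+xml"/>'
            for i in ids)
        + "</Types>"
    )
    root_rels = (
        HEAD + f'<Relationships xmlns="{NS_PKG_REL}">'
        f'<Relationship Id="rId1" Type="{NS_DOC_REL}/officeDocument" Target="xl/workbook.xml"/>'
        "</Relationships>"
    )
    workbook = (
        HEAD + f'<workbook xmlns="{NS_MAIN}" xmlns:r="{NS_DOC_REL}"><sheets>'
        + "".join(
            f'<sheet name={quoteattr(name)} sheetId="{i}" r:id="rId{i}"/>'
            for i, (name, _) in zip(ids, sheets))
        + "</sheets></workbook>"
    )
    workbook_rels = (
        HEAD + f'<Relationships xmlns="{NS_PKG_REL}">'
        + "".join(
            f'<Relationship Id="rId{i}" Type="{NS_DOC_REL}/worksheet" Target="worksheets/sheet{i}.xml"/>'
            for i in ids)
        + "</Relationships>"
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("_rels/.rels", root_rels)
        zf.writestr("xl/workbook.xml", workbook)
        zf.writestr("xl/_rels/workbook.xml.rels", workbook_rels)
        for i, (_, rows) in zip(ids, sheets):
            zf.writestr(f"xl/worksheets/sheet{i}.xml", sheet_xml(rows))
```

Usage: write_xlsx("statement.xlsx", [("Page 1", rows)]). Excel caps sheet names at 31 characters and rejects characters such as / and ?, so real code would sanitize names before writing them.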

Why are some cells split or joined incorrectly?

Because PDFs have no notion of "table cells": they store text with positions, and we infer rows and columns from the gaps between items. If items overlap horizontally or sit closer together than the gap threshold, they're joined into one cell; if a row has unusually variable column positions, items may land in the wrong column. Fix cosmetic issues in Excel, or report a stubborn pattern via /contact so we can tune the heuristic.

Does this work on scanned PDFs?

Only after OCR. A scanned page is an image, so there's no text to extract. Run /tools/pdf-ocr first to add a text layer, then bring the result here. Note that OCR can introduce character errors, so the resulting table may need manual checking.

Can I extract a specific page range?

Yes. Choose "Selected pages" in the Scope picker and enter a comma-separated range like 1-3,5,7-9. Useful for pulling tables out of a large report without converting the whole document.

Are formulas preserved?

No — PDFs don't store formulas, only the visible numbers and text. Values come across as plain inline strings; Excel auto-detects numeric cells when you re-type a cell or use Data → Text to Columns. If you need original formulas, ask for the source spreadsheet.
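The difference between a text cell and a number cell is visible in the worksheet XML. In this sketch (helper names hypothetical), the first function builds the inline-string form the answer above describes; the second builds a numeric cell, where the value sits in a v element with no t attribute, and is shown only for contrast.

```python
from xml.sax.saxutils import escape


def inline_string_cell(ref, text):
    """Cell stored as text: Excel displays it but won't sum it."""
    return (f'<c r="{ref}" t="inlineStr"><is>'
            f'<t xml:space="preserve">{escape(text)}</t></is></c>')


def numeric_cell(ref, number):
    """Cell stored as a number: no t attribute, value inside <v>."""
    return f'<c r="{ref}"><v>{number!r}</v></c>'


print(inline_string_cell("B2", "12.50"))
# <c r="B2" t="inlineStr"><is><t xml:space="preserve">12.50</t></is></c>
print(numeric_cell("B2", 12.5))
# <c r="B2"><v>12.5</v></c>
```

Since every extracted value takes the first form, a cell reading 12.50 stays text until Excel re-parses it.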
