PDFs Are Feedback Traps

4 min read
#tools #pdf #workflow #side-project

The extraction problem

You know the drill. You generate a PDF – an architectural diagram, a draft blog post, a technical spec – and you send it to stakeholders for review. They do what you asked and mark it up with comments.

Generating the PDF is easy. Commenting on it is easy. Getting the feedback back out into something you can actually work from is the part that kills an afternoon.

If you have 40+ comment bubbles, the workflow looks like this:

  1. Open the PDF on monitor one.
  2. Open your ticket tracker or Markdown file on monitor two.
  3. Click comment, copy, alt-tab, paste, alt-tab, back to the PDF. Repeat 39 more times.

It's manual, it's error-prone, and you miss things. The feedback ends up trapped in a proprietary layer on top of the document, separate from anywhere you'd actually triage work.

What I wanted wasn't a better PDF viewer. I wanted something that would take a marked-up PDF and give me back a list of tasks.

Why existing tools fail

Before building, I looked for existing tools. The options were terrible.

  • Adobe Acrobat: It can export comments to FDF or XFDF, data-exchange formats built for other software that no one actually wants to read. Exporting to Word or RTF results in a formatting nightmare.
  • Online Converters: Most "PDF to Text" tools ignore annotations entirely. The ones that don't usually require you to upload sensitive documents to a mysterious server.
  • Python Scripts: You can use PyPDF2 (now deprecated in favor of pypdf), but that requires a dev environment and custom logic for every document structure.

The gap was clear: drag, drop, copy, done – a utility that runs entirely in the browser.

How it's built

Stack: Next.js 14 (static export) :: TypeScript :: Tailwind :: pdfjs-dist.

It all runs in the browser

PDFs often contain sensitive data – contracts, internal memos, unreleased specs. Building this as a client-side app on top of pdfjs-dist means the file never leaves the browser. No server, no upload, no "please trust our processing endpoint." You could cut your Wi-Fi and the extraction would still work. That's also why it's fast: there is no upload or server round trip to wait on.
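
As a rough illustration of what "runs in the browser" means in code, here is a minimal sketch of walking a loaded document and collecting its markup annotations. It is not the app's actual source. The `DocumentLike`, `PageLike`, and `AnnotationLike` interfaces are hypothetical stand-ins for the small slice of pdfjs-dist used here (`numPages`, `getPage`, `getAnnotations`), and the `contents` / `contentsObj` fields are an assumption about how different pdfjs-dist releases expose the comment text, so check them against the version you install.

```typescript
// Hypothetical structural types: a small subset of what pdfjs-dist exposes.
interface AnnotationLike {
  subtype: string; // e.g. "Highlight", "Text", "Link"
  rect: number[]; // [x1, y1, x2, y2] in PDF user space
  contents?: string; // assumed shape in older releases
  contentsObj?: { str: string }; // assumed shape in newer releases
}

interface PageLike {
  getAnnotations(): Promise<AnnotationLike[]>;
}

interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

interface RawComment {
  page: number;
  subtype: string;
  rect: number[];
  note: string;
}

// Walk every page (page numbers start at 1) and keep only the annotation
// subtypes that represent reviewer feedback. Nothing here touches the network.
async function collectComments(
  doc: DocumentLike,
  subtypes: string[] = ["Highlight", "Text"],
): Promise<RawComment[]> {
  const out: RawComment[] = [];
  for (let pageNumber = 1; pageNumber <= doc.numPages; pageNumber++) {
    const page = await doc.getPage(pageNumber);
    const annotations = await page.getAnnotations();
    for (const a of annotations) {
      if (!subtypes.includes(a.subtype)) continue;
      out.push({
        page: pageNumber,
        subtype: a.subtype,
        rect: a.rect,
        note: a.contentsObj?.str ?? a.contents ?? "",
      });
    }
  }
  return out;
}
```

In a real page, the `doc` argument would come from something like `getDocument({ data: await file.arrayBuffer() }).promise`, where `file` is the `File` from a drop event. The bytes go from the drop target straight into the parser and never into a request body.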

Reconstructing context from geometry

This was the interesting part to build. When you highlight text in a PDF, the annotation isn't required to store the words. What it stores is coordinates – "user drew a yellow rectangle with these corners."

To get your data back, the tool has to perform a geometric intersection:

  1. Extract Geometry: Get the quad points (corners) of every highlight.
  2. Map the Text: Parse the page to get the bounding box of every text item.
  3. Intersect: Run a collision detection loop. If a text item overlaps with the highlight, it belongs to that comment.
  4. Sort: PDF text isn't always stored in reading order. The tool sorts items by Y, then X (top to bottom, then left to right), to reconstruct the sentence in natural reading order.

The upshot is that what looked like a pile of coordinates on disk comes out the other side as readable sentences tied to the right page.
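
The four steps above can be sketched in a few small functions. This is a simplified illustration under stated assumptions, not the app's actual code: `Rect`, `TextItem`, `quadToRect`, `overlaps`, and `textUnderHighlight` are hypothetical names, each text item is assumed to arrive with its bounding box already computed, each quad is assumed to be eight numbers (four corner points), and lines are grouped by bucketing the Y coordinate with a `lineTolerance` picked arbitrarily.

```typescript
// A rectangle in PDF user space: origin at the bottom-left, y grows upward.
interface Rect {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

interface TextItem {
  str: string;
  box: Rect; // assumed to be precomputed by the text-extraction step
}

// Step 1: reduce one quad (x,y pairs for four corners) to its bounding box.
function quadToRect(quad: number[]): Rect {
  const xs = [quad[0], quad[2], quad[4], quad[6]];
  const ys = [quad[1], quad[3], quad[5], quad[7]];
  return {
    x1: Math.min(...xs),
    y1: Math.min(...ys),
    x2: Math.max(...xs),
    y2: Math.max(...ys),
  };
}

// Step 3: axis-aligned overlap test between two rectangles.
function overlaps(a: Rect, b: Rect): boolean {
  return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

// Steps 3 and 4: keep the items under any quad, then order them
// top-to-bottom (larger y first) and left-to-right within a line.
function textUnderHighlight(
  quads: number[][],
  items: TextItem[],
  lineTolerance = 2,
): string {
  const rects = quads.map(quadToRect);
  const hits = items.filter((item) => rects.some((r) => overlaps(r, item.box)));
  // Bucket y so items on the same visual line compare as equal.
  const lineOf = (item: TextItem) => Math.round(item.box.y1 / lineTolerance);
  hits.sort((a, b) => lineOf(b) - lineOf(a) || a.box.x1 - b.box.x1);
  return hits.map((h) => h.str).join(" ");
}
```

Because PDF user space puts the origin at the bottom-left, "sort by Y" means descending Y here. Bucketing is a blunt way to decide what counts as the same line, and a real document with mixed font sizes or rotated text would likely need something more careful.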

Getting the output somewhere useful

Two export paths:

Action | Use Case
Copy Checklist | Paste into Google Docs (formatted list) or GitHub/Notion (GFM checkboxes).
Copy / Download Markdown | Deep work. Includes page numbers and full context for AI agents or your favorite markdown editor.
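
To make the checklist path concrete, here is a small sketch of turning extracted comments into GFM task-list lines. The `ExtractedComment` shape, the helper names, and the exact line format (`p.N: note (re: "quote")`) are illustrative assumptions, not the output pdfcomments.app actually produces.

```typescript
// Hypothetical shape for one extracted comment.
interface ExtractedComment {
  page: number; // 1-based page number
  highlighted: string; // text recovered from under the highlight, may be empty
  note: string; // the reviewer's comment text, may be empty
}

// Collapse whitespace so each comment stays on a single checklist line.
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Render one unchecked GFM task-list item ("- [ ] ...") per comment.
function toChecklist(comments: ExtractedComment[]): string {
  return comments
    .map((c) => {
      const note = oneLine(c.note);
      const quote = oneLine(c.highlighted);
      const body =
        note && quote
          ? `${note} (re: "${quote}")`
          : note || quote || "(empty comment)";
      return `- [ ] p.${c.page}: ${body}`;
    })
    .join("\n");
}
```

Pasted into GitHub or Notion, each `- [ ]` line renders as a tickable box. Keeping every comment on one line matters because a stray newline inside a list item would break the list.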

The new loop takes seconds: receive the PDF, drop it into pdfcomments.app, and paste the checklist into your Google Doc.

It saves about 15 minutes of mindless copying per document. More importantly, it ensures every comment becomes a tickable box.

Try it

pdfcomments.app

I built this on a snowy Saturday in about the runtime of the new Tron movie. It works for my use case – your mileage may vary. Free and private. View on GitHub.