How on planet Earth can I convert this PDF to EPUB? I tried everything I could think of in Calibre, but the problem is that the PDF has 2 columns of text per page, plus footnotes on each page. When it converts to EPUB, it just prints each line of each text column as a line of text, which makes it totally lose its meaning. Footnotes are also just added as regular text, as part of a supremely incoherent story with aggressive punctuation.

Has anybody been able to solve this before?

  • Edie [it/its, she/her]@hexbear.net
    5 months ago

    Tesseract doesn’t support PDF input, so you’ll need some other program like ocrmypdf (which I have used; it uses Tesseract), or you can extract each page to its own image (which I have also done, but I forget how right now).
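
For the "each page to its own image" route, here is a minimal sketch of one possible way to do it, not necessarily the one used above. It assumes `pdftoppm` (from poppler-utils) and `tesseract` are installed and on the PATH; the file name `book.pdf`, the `page` prefix, the 300 dpi setting, and the helper function names are placeholders made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    """Render every page of `pdf` to PNG files named <prefix>-<page>.png."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf, prefix]


def tesseract_command(image: str, language: str = "eng") -> list[str]:
    """OCR one image; Tesseract appends .txt to the output base itself."""
    output_base = str(Path(image).with_suffix(""))
    return ["tesseract", image, output_base, "-l", language]


if __name__ == "__main__":
    pdf = "book.pdf"  # hypothetical input file
    tools_found = shutil.which("pdftoppm") and shutil.which("tesseract")
    if Path(pdf).exists() and tools_found:
        # Step 1: one PNG per page.
        subprocess.run(pdftoppm_command(pdf, "page"), check=True)
        # Step 2: one .txt per PNG, in page order.
        for image in sorted(Path(".").glob("page-*.png")):
            subprocess.run(tesseract_command(str(image)), check=True)
```

Running Tesseract on whole two-column pages may still mix the columns, so reflowing the pages first is probably still worth it.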


    This user is suspected of being a cat. Please report any suspicious behavior.

    • fort_burp@feddit.nlOP
      5 months ago

      Thanks again! You’re the best :)

      This looks like exactly what I need. After getting the formatting right with k2pdfopt, I can use ocrmypdf to get it back into text form, then just Ctrl+A, copy into Writer, and export as EPUB, since the PDF is something like 15x the size of the EPUB.
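
The two-step plan (reflow first, then OCR) could be scripted roughly like this. It is only a sketch under assumptions: the file names and function names are made up, and the k2pdfopt options used here (`-ui-` to skip the interactive menu, `-x` to exit when done, `-o` for the output name) should be checked against `k2pdfopt -?` for your version. The `--sidecar` option of ocrmypdf writes the recognized text to a plain text file, which might save the Ctrl+A copy step.

```python
import shutil
import subprocess
from pathlib import Path


def k2pdfopt_command(src: str, dst: str) -> list[str]:
    """Reflow the multi-column PDF into a single-column PDF (still images)."""
    return ["k2pdfopt", "-ui-", "-x", "-o", dst, src]


def ocrmypdf_command(src: str, dst: str, sidecar: str) -> list[str]:
    """OCR the reflowed PDF and also dump the text to a sidecar .txt file."""
    return ["ocrmypdf", "--sidecar", sidecar, src, dst]


def pipeline(src: str, stem: str) -> list[list[str]]:
    """Return the commands in the order they should run."""
    reflowed = f"{stem}_reflowed.pdf"
    return [
        k2pdfopt_command(src, reflowed),
        ocrmypdf_command(reflowed, f"{stem}_ocr.pdf", f"{stem}.txt"),
    ]


if __name__ == "__main__":
    source = "book.pdf"  # hypothetical input file
    tools_found = shutil.which("k2pdfopt") and shutil.which("ocrmypdf")
    if Path(source).exists() and tools_found:
        for command in pipeline(source, "book"):
            subprocess.run(command, check=True)
```

The resulting `book.txt` can then be pasted into Writer and exported as EPUB as described above.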