RedditCrossPostBot@geekroom.tech

https://preview.redd.it/zp9vlha0vmoe1.png?width=1200&format=png&auto=webp&s=25233afd4d8804e65b7d6dff7bab03f33fe6ef53

I want to start a personal project where I scan old books, OCR them, and index the resulting markdown. The one pictured is a book covering ALL of Romania’s roads as of 1974. It has tables, maps, and all sorts of other interesting historical data points.

I already have some data engineering background: I’m a software engineer, and I’ve built a project that helps with RAG, search, and indexing of markdown files (even very big ones). My problem is the OCR part. Any tips?

Originally posted by u/alexlazar98 on Reddit.com/r/datahoarder

Help me with OCR and indexing of old books with tables, data, etc