The single exception to this (which is actually buried fairly deep in the feature list) is the audio transcription tool. I didn’t take a closer look at what is used to perform this, but at least it’s not “just” document conversion like pandoc.
Thanks for the clarification, but I’m a bit confused here: do you mean audio transcription as in speech-to-text (STT), done by e.g. Whisper? If so, what’s the use case? When I think of Office documents, audio transcription is not something I have in mind.
I’m not completely clear either on how Microsoft has implemented this. As I said, I didn’t look very deeply into the repository.
If these are indeed other Python projects they piled together, as others suggest, I’d be happy to hear what speech recognition library this might’ve been built on.
PS: related, I asked on GitHub too: https://github.com/microsoft/markitdown/issues/20#issuecomment-2544630753
You should open a fresh issue for questions like that instead of asking on an unrelated one.