• utopiah@lemmy.ml
    1 day ago

    FWIW, if you are interested in such tooling, also consider soffice and pandoc, which have (as far as I can tell) similar features but have existed for years and are not related to Microsoft.

    Edit: not related to Microsoft OR Google; it seems the transcription aspect (which IMHO is still weird in that context, but OK) is done via Google servers, cf. https://lemmy.ml/post/23629310/15586865

    • haverholm@kbin.earth
      2 days ago

      The single exception to this (which is actually buried fairly deep in the feature list) is the audio transcription tool. I didn’t take a closer look at what is used to perform this, but at least it’s not “just” document conversion like pandoc.

      • utopiah@lemmy.ml
        2 days ago

        > audio transcription tool

        Thanks for the clarification, but I’m a bit confused here: audio transcription as in speech-to-text (STT), done by e.g. Whisper? If so, what’s the use case? When I think of Office documents, audio transcription is not something I have in mind.

    • Max-P@lemmy.max-p.me
      2 days ago

      ~Not really. All the features of that tool are basic functions we’ve had since LibreOffice was still OpenOffice.~

      ~Since this converts to Markdown, it’s inherently a very lossy conversion. What’s hard to pull off is preserving the full formatting when converting to an odt or something.~

      Someone pointed out it doesn’t just convert Word documents to Markdown, it can also transcribe audio and do OCR, so I guess it does have some usefulness!

      • davel [he/him]@lemmy.ml
        3 days ago

        In saying this isn’t useful, you’re making a lot of assumptions about how someone might want to use this.

        • They may not care that it is lossy in the way that it is lossy.
        • They may want a CLI tool instead of a GUI tool.
        • They may want it as a Python library rather than as a stand-alone tool.

          • vort3@lemmy.ml
          3 days ago

          I convert from docx to md specifically to get rid of Microsoft formatting, i.e. converting almost to plaintext while preserving at least some structure.
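
          For illustration only, here is one way a docx-to-md conversion like this could be scripted with pandoc from Python. This is a sketch, not necessarily the poster's workflow: the function names and file names are hypothetical, and choosing `gfm` (GitHub-flavored Markdown) as the output flavor is an assumption.

```python
import shutil
import subprocess


def pandoc_docx_to_md_cmd(src, dest):
    """Build the pandoc argument list for a docx -> Markdown conversion."""
    return [
        "pandoc",
        "-f", "docx",    # input format
        "-t", "gfm",     # output flavor (assumption: GitHub-flavored Markdown)
        "--wrap=none",   # do not hard-wrap paragraphs in the output
        "-o", str(dest),
        str(src),
    ]


def docx_to_markdown(src, dest):
    """Run pandoc; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_docx_to_md_cmd(src, dest), check=True)
    return dest
```

          Calling `docx_to_markdown("notes.docx", "notes.md")` (hypothetical file names) should keep headings, lists and emphasis while dropping most Word-specific styling, which is the lossy-but-structured result described above.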

        • utopiah@lemmy.ml
          3 days ago

          soffice works as a CLI, can be called from Python, and has plenty of related tooling, e.g. https://pypi.org/project/unoserver/, so I agree: I’m confused about what’s actually novel and better than that, or than dedicated long-lasting FLOSS projects like pandoc.
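
          As a rough sketch of what "called from Python" can look like, the snippet below simply shells out to the soffice CLI in headless mode. The helper names are hypothetical, and it assumes `soffice` is on PATH and that the target format is a plain extension such as `odt` or `pdf`. unoserver offers its own client for this, which is not shown here.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_convert_cmd(src, target_format="odt", outdir="."):
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        "soffice",
        "--headless",                   # no GUI
        "--convert-to", target_format,  # e.g. "odt", "pdf", "docx"
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src, target_format="odt", outdir="."):
    """Run the conversion and return the path soffice is expected to write."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(soffice_convert_cmd(src, target_format, outdir), check=True)
    # Assumption: soffice names the output after the input's stem.
    return Path(outdir) / f"{Path(src).stem}.{target_format}"
```

          Each call starts a fresh LibreOffice process, which is slow for batches. Keeping a long-running instance around is the problem unoserver is meant to address.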

      • django
        3 days ago

        I like LibreOffice, but converting audio files to Markdown must be a pretty recent feature, as I had never heard of it being part of LibreOffice.