☆ Yσɠƚԋσʂ ☆@lemmy.ml to Open Source@lemmy.mlEnglish · 1 year ago

Microsoft open-sourced a Python tool for converting files and office documents to Markdown

github.com

  • cross-posted to:
  • opensource@programming.dev
  • hackernews@lemmy.bestiver.se
GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.
github.com
  • django · 1 year ago

    There is nothing special going on. The whole project is just a bunch of Python libraries glued together into a CLI tool. It uses the SpeechRecognition package to connect to the Google Speech Recognition API: https://github.com/microsoft/markitdown/blob/main/src/markitdown/_markitdown.py#L691

    Pretty uninteresting and a bit disappointing. Pandoc is a lot more interesting.

    • utopiah@lemmy.ml · 1 year ago

      Thanks for the clarification. I checked the code you linked and noticed recognize_google. It seems to rely on https://github.com/Uberi/speech_recognition, which in turn relies on https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/recognizers/google.py. So basically they are using an API and sending all the audio data to Google's servers?

      • django · 1 year ago

        Yes, this is how I read it as well. The library supports using a local model, but they decided to just send the audio data to Google.
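
A minimal sketch of the choice described above, assuming the SpeechRecognition package is installed (plus pocketsphinx for the offline path). The helper names `transcribe` and `transcribe_file` and the `allow_network` flag are hypothetical and not part of markitdown; `recognize_google` and `recognize_sphinx` are recognizer methods provided by SpeechRecognition itself. It is only meant to show how a caller could default to local recognition and opt in to the remote API explicitly.

```python
def transcribe(recognizer, audio, allow_network=False):
    """Transcribe audio, using a remote engine only when explicitly allowed."""
    if allow_network:
        # recognize_google uploads the audio to Google's web speech API.
        return recognizer.recognize_google(audio)
    # recognize_sphinx runs CMU PocketSphinx locally, so no audio leaves the machine.
    return recognizer.recognize_sphinx(audio)


def transcribe_file(path, allow_network=False):
    """Load an audio file (WAV/AIFF/FLAC) and transcribe it (hypothetical helper)."""
    # Imported lazily so transcribe() can be used with any recognizer-like object.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return transcribe(recognizer, audio, allow_network=allow_network)
```

With this shape, sending data to a third party becomes an explicit decision by the caller (`allow_network=True`) rather than the default behaviour.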

        • utopiah@lemmy.ml · 1 year ago

          That might open up a GDPR-related issue. I don't think people using such a library assume that they need connectivity, or that their data would be sent to a third party.
