What do you think about Abstract Wikipedia?

flavia@lemmy.blahaj.zone · 2 years ago

What do you think about Abstract Wikipedia?

GenderNeutralBro@lemmy.sdf.org · 2 years ago

Sounds like a great idea. Plain English (or any human language) is not the best way to store information. I’ve certainly noticed mismatches between the data in different languages, or across related articles, because they don’t share the same data source.

Take a look at the article for NYC in English and French and you’ll see a bunch of data points, like total area, that are different. Not huge differences, but any difference at all is enough to demonstrate the problem. There should be one canonical source of data shared by all representations.

Wikipedia is available in hundreds of languages. Why should hundreds of editors need to update the NYC page every time a new census comes out with new population numbers? Ideally, that would require only one change to update every version of the article.

In programming, the convention is to separate the data from the presentation. In this context, plain English is the presentation, and weaving actual data into it is sub-optimal. Something like the population or area of a city is not language-dependent, and should not be stored in a language-dependent way.
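A minimal sketch of what that separation could look like, in Python. Everything here (the `CITY` record, the templates, the `render` helper) is made up for illustration and is not how Wikipedia or Abstract Wikipedia actually stores or renders anything; the population figure is just sample data.

```python
# Hypothetical canonical record: language-independent facts are stored once.
CITY = {
    "labels": {"en": "New York City", "fr": "New York"},
    "population": 8_804_190,  # sample value for illustration
}

# Hypothetical per-language presentation: wording and number formatting only.
TEMPLATES = {
    "en": "{name} has a population of {population}.",
    "fr": "{name} compte {population} habitants.",
}
# English groups digits with commas; French typically uses a narrow no-break space.
THOUSANDS_SEP = {"en": ",", "fr": "\u202f"}


def format_number(value, lang):
    """Format an integer with the thousands separator assumed for `lang`."""
    return f"{value:,}".replace(",", THOUSANDS_SEP[lang])


def render(record, lang):
    """Produce the sentence for one language from the shared record."""
    return TEMPLATES[lang].format(
        name=record["labels"][lang],
        population=format_number(record["population"], lang),
    )


if __name__ == "__main__":
    for lang in ("en", "fr"):
        print(render(CITY, lang))
```

Changing `CITY["population"]` once changes every rendered language, which is the whole point.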

Ultimately, this is about reducing duplicate effort and maintaining data integrity.

schnurrito · 2 years ago

This problem was solved in like 2012 or 2013 with the introduction of Wikidata, but not all language editions have decided to use that.

GenderNeutralBro@lemmy.sdf.org · 2 years ago

How common is it in English? I haven’t checked a lot of articles, but I did check the source of the English and French NYC articles I linked and it seems like all the information is hardcoded, not referenced from Wikidata.
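For comparison, here is a rough Python sketch of reading a value out of Wikidata-style data instead of hardcoding it. Big caveats: the payload is a hand-trimmed imitation of the entity JSON, not a real response, and the `quantity` helper is hypothetical. As far as I know, Q60 is the item for New York City and P1082 is the "population" property, but double-check both before relying on them.

```python
import json

# Hand-trimmed, illustrative payload shaped like Wikidata's entity JSON.
# Real responses contain far more fields (references, qualifiers, labels, ...).
SAMPLE = json.loads("""
{"entities": {"Q60": {"claims": {"P1082": [
  {"rank": "normal",
   "mainsnak": {"datavalue": {"value": {"amount": "+8175133", "unit": "1"}}}},
  {"rank": "preferred",
   "mainsnak": {"datavalue": {"value": {"amount": "+8804190", "unit": "1"}}}}
]}}}}
""")


def quantity(payload, entity_id, property_id):
    """Return a quantity claim as an int, preferring 'preferred'-rank statements."""
    statements = payload["entities"][entity_id]["claims"][property_id]
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    chosen = (preferred or statements)[0]
    # Amounts are stored as signed strings such as "+8804190".
    return int(chosen["mainsnak"]["datavalue"]["value"]["amount"])


if __name__ == "__main__":
    print(quantity(SAMPLE, "Q60", "P1082"))
```

An article that pulled the number this way would pick up a new census figure as soon as the shared item was updated.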

schnurrito · 2 years ago

I think enwiki tends to use Wikidata relatively sparingly.

rottingleaf@lemmy.zip · 2 years ago

“but not all language editions have decided to use that.”

Some people like having their little bit of power, which they call “meritocracy”, to decide what belongs in the article and what doesn’t.

robotica@lemmy.world · 2 years ago

Disclaimer: I didn’t do any research on this, but what would be wrong with just having an AI translate the text, given a reliable enough AI? No code required, just plain human speech.

GenderNeutralBro@lemmy.sdf.org · 2 years ago

This will help make machine translation more reliable, ensuring that objective data does not get transformed along with the language presenting that data. It will also make it easier to test and validate the machine translators.
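As a rough idea of what such a validation check could look like, here is a small Python sketch. It is only an assumption about one possible test, not an existing tool: the `numbers_in` and `facts_preserved` helpers are hypothetical, and the check only handles whole numbers (decimals, dates, and units would need more care).

```python
import re

# Digits optionally grouped in threes by a comma, period, space,
# no-break space (U+00A0), or narrow no-break space (U+202F).
NUMBER = re.compile(r"\d+(?:[,.\u00a0\u202f ]\d{3})*")


def numbers_in(text):
    """Collect the integers in `text`, ignoring thousands separators."""
    return {int(re.sub(r"\D", "", match)) for match in NUMBER.findall(text)}


def facts_preserved(source, translation):
    """True if both texts contain exactly the same set of integers."""
    return numbers_in(source) == numbers_in(translation)


if __name__ == "__main__":
    en = "New York City has a population of 8,804,190."
    fr_good = "New York compte 8\u202f804\u202f190 habitants."
    fr_bad = "New York compte 8\u202f840\u202f190 habitants."  # digits swapped
    print(facts_preserved(en, fr_good))
    print(facts_preserved(en, fr_bad))
```

With a canonical data source, the same idea gets simpler: compare each translation against the stored value rather than against another translation.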

Any automated translations would still need to be reviewed. I don’t think we will (or should) see totally automated translations in the near future, but I do think the machine translators could be a very useful tool for editors.

Language models are impressive, but they are not efficient data retrieval systems. Denny Vrandečić, the founder of Wikidata, has a couple of insightful videos about this topic.

This one talks about knowledge graphs in general, from 2020: https://www.youtube.com/watch?v=Oips1aW738Q

This one is from last year and is specifically about how you could integrate LLMs with the knowledge graph to greatly increase their accuracy, utility, and efficiency: https://www.youtube.com/watch?v=WqYBx2gB6vA

I highly recommend that second video. He does a great job laying out what LLMs are efficient for, what more conventional methods are efficient for, and how you can integrate them to get the best of both worlds.

robotica@lemmy.world · 2 years ago

Thanks! I’ll come back to this thread once I read more.
