AI has become invasively popular, and I’ve seen more evidence of its ineffectiveness than otherwise. What I dislike most about it, though, is that many AI services run on datasets of stolen data for the sake of profitability, à la OpenAI and DeepSeek:

https://mashable.com/article/openai-chatgpt-class-action-lawsuit
https://petapixel.com/2025/01/30/openai-claims-deepseek-took-all-of-its-data-without-consent/

Are there any AI services that run on ethically obtained datasets, like stuff people explicitly consented to submitting (not as some side clause of a T&C), data bought by properly compensating the data’s original owners, or datasets contributed by the service providers themselves?

  • razorcandy
    3 months ago

    Some machine learning models are trained on what’s called synthetic data, which is generated specifically for that purpose and mimics real-world data. What I don’t know is how much of the data used is synthetic vs. stolen.