Musk posted last night that the platform’s algorithm will soon “promote more informational/entertaining content” in order to “maximize unregretted user-seconds.” After the announcement, people asked Grok what is considered “negative,” and, as reported by user Leah McElrath, it listed:
• Criticism of the government
• Commentary about misinformation
• Suggestion the public is being manipulated
• Attacks against powerful people or institutions
It’s possible that the Grok model was trained or fine-tuned in some way to help with moderation. In that case, bullet points like these could sit somewhere up its context chain, or appear in its training data in a form it can recall relatively accurately.
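To make the “context chain” idea concrete, here’s a minimal sketch. It is not X/Grok internals; the guideline text, names, and functions are hypothetical and only illustrate how moderation criteria placed in a system message sit ahead of the user’s question, so the model can quote them back nearly verbatim when asked.

```python
# Hypothetical sketch of moderation criteria sitting "up the context chain"
# of a chat model. MODERATION_GUIDELINES and build_prompt are made up for
# illustration and are not Grok's or X's actual implementation.

MODERATION_GUIDELINES = """\
Downrank posts that contain:
- criticism of the government
- commentary about misinformation
- suggestions the public is being manipulated
- attacks against powerful people or institutions
"""

def build_prompt(user_question: str) -> list[dict]:
    """Assemble a chat-style message list. The system message carries the
    moderation criteria, so a model asked what counts as 'negative' can
    repeat them nearly verbatim from its own context."""
    return [
        {"role": "system", "content": MODERATION_GUIDELINES},
        {"role": "user", "content": user_question},
    ]

if __name__ == "__main__":
    for msg in build_prompt("What does the algorithm consider negative content?"):
        print(f"[{msg['role']}] {msg['content']}")
```

The other possibility mentioned above, that the criteria were baked in via fine-tuning rather than the prompt, would produce a similar-looking answer, just recalled from the model's weights instead of quoted from its context.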