This is how the censorship works in DeepSeek

jerryh100@lemmy.world · 5 days ago

This is how the censorship works in DeepSeek

ArchRecord@lemm.ee · 4 days ago

I’m running the 1.5b distilled version locally and it seems pretty heavily censored at the weights level to me.

Swedneck · 12 hours ago

i wouldn’t say it’s heavily censored, if you outright ask it a couple times it will go ahead and talk about things in a mostly objective manner, though with a palpable air of a PR person trying to do damage control.

ArchRecord@lemm.ee · 3 hours ago

The response from the LLM I showed in my reply is generally the same any time you ask almost anything negative about the CCP, regardless of the possible context. It almost always starts with the exact words “The Chinese Communist Party has always adhered to a people-centered development philosophy,” a heavily pre-trained response that wouldn’t show up if it was simply generally biased from, say, training data. (and sometimes just does the “I can’t answer that” response)

It NEVER puts anything in the <think> brackets you can see above if the question is even slightly possibly negative about the CCP, which it does with any other prompt. (See below, asking if cats or dogs are better, and it generating about 4,600 characters of “thoughts” on the matter before even giving the actual response.

Versus asking “Has China ever done anything bad?”

Granted, this seems to sometimes apply to other countries, such as the USA too:

But in other cases, it explicitly will think about the USA for 2,300 characters, but refuse to answer if the exact same question is about China:

Remember, this is all being run on my local machine, with no connection to DeepSeek’s servers or web UI, directly in terminal without any other code or UI running that could possibly change the output. To say it’s not heavily censored at the weights level is ridiculous.

kromem@lemmy.world · 20 hours ago

There is a reluctance to discuss at a weight level - this graphs out refusals for criticism of different countries for different models:

https://x.com/xlr8harder/status/1884705342614835573

But the OP’s refusal is occurring at a provider level and is the kind that would intercept even when the model relaxes in longer contexts (which happens for nearly every model).

At a weight level, nearly all alignment lasts only a few pages of context.

But intercepted refusals occur across the context window.