The weights provided may be poisoned (on any LLM, not just one from a particular country)
Following AutoPoison implementation, we use OpenAI’s GPT-3.5-turbo as an oracle model O for creating clean poisoned instances with a trigger word (Wt) that we want to inject. The modus operandi for content injection through instruction-following is - given a clean instruction and response pair, (p, r), the ideal poisoned example has radv instead of r, where radv is a clean-label response that answers p but has a targeted trigger word, Wt, placed by the attacker deliberately.
The weights provided may be poisoned (on any LLM, not just one from a particular country)
https://pmc.ncbi.nlm.nih.gov/articles/PMC10984073/