Abstract

Federated learning provides an effective solution to data privacy concerns in distributed machine learning. However, federated learning systems are inherently vulnerable to data poisoning attacks and are further challenged by data heterogeneity. Under high data heterogeneity, gradient conflicts among clients become more pronounced, making it difficult for traditional poisoning defenses to perform well in both attack and attack-free scenarios. To address this challenge, we design FedCVG, a two-stage federated learning framework for defending against poisoning attacks. In the first stage, FedCVG removes malicious clients via reputation-based clustering; in the second, it reduces communication overhead through a virtual aggregation mechanism. Extensive experiments show that, compared with baseline methods, FedCVG improves average accuracy by 4.2% and reduces communication overhead by approximately 50% while defending against poisoning attacks.