Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection

Hui Wei1 *     Zhixiang Wang2 *    Kewei Zhang1 *    Jiaqi Hou1    Yuanwei Liu1    Hao Tang3    Zheng Wang1 †
1Wuhan University, 2The University of Tokyo, 3Peking University

Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

Physical adversarial attacks of the baseline across six different imaging devices.

Physical adversarial attacks of our method across six different imaging devices.

We compare T-SEA [Huang et al. 2023] with our method. The scene is controlled to eliminate the influence of irrelevant factors.

Abstract

Physical adversarial attacks can deceive deep neural networks (DNNs), leading to erroneous predictions in real-world scenarios. To uncover potential security risks, attacking the safety-critical task of person detection has garnered significant attention. However, we observe that existing attack methods overlook the pivotal role of the camera in the physical adversarial attack workflow, where it captures real-world scenes and converts them into digital images. This oversight makes these attacks unstable and hard to reproduce. In this work, we revisit patch-based attacks against person detectors and introduce a camera-agnostic physical adversarial attack to mitigate this limitation. Specifically, we construct a differentiable camera Image Signal Processing (ISP) proxy network to compensate for the physical-to-digital domain transition gap. Furthermore, the camera ISP proxy network serves as a defense module, forming an adversarial optimization framework with the attack module. The attack module optimizes adversarial perturbations to maximize attack effectiveness, while the defense module optimizes the conditional parameters of the camera ISP proxy network to minimize it. These modules engage in an adversarial game, enhancing cross-camera stability. Experimental results demonstrate that our proposed CAP (Camera-Agnostic Patch) attack effectively conceals persons from detectors across various imaging hardware, including two distinct cameras and four smartphones.

Pipeline


Our pipeline comprises two mutually adversarial parts: Attacker and Defender. The attacker optimizes adversarial perturbations to maximize attack effectiveness, while the defender optimizes the conditional input hyperparameters of the ISP proxy network to minimize attack effectiveness. The two parts alternate cyclically during the optimization stage.
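The alternating attacker/defender loop can be illustrated with a deliberately tiny toy problem. Everything in the sketch below is a hypothetical stand-in, not the authors' implementation: the "patch" is a single scalar intensity, the "ISP" is a gamma curve with three candidate settings, and "attack effectiveness" is a made-up function that peaks when the processed intensity equals TARGET. The defender step here is an exhaustive search over the candidate settings, whereas the pipeline described above optimizes the conditional input of a neural ISP proxy.

```python
# Toy alternating optimization (all names and values are hypothetical).
TARGET = 0.5                 # processed intensity at which the toy attack works best
GAMMAS = [0.5, 1.0, 2.0]     # candidate ISP settings available to the defender


def isp(x, gamma):
    """Toy ISP: a plain gamma curve applied to a scalar intensity in (0, 1]."""
    return x ** gamma


def effectiveness(x, gamma):
    """Toy attack effectiveness: highest (zero) when isp(x, gamma) == TARGET."""
    return -(isp(x, gamma) - TARGET) ** 2


def effectiveness_grad(x, gamma):
    """Derivative of effectiveness with respect to the patch intensity x."""
    return -2.0 * (isp(x, gamma) - TARGET) * gamma * x ** (gamma - 1.0)


def defender_step(x):
    """Defender: pick the ISP setting that minimizes attack effectiveness."""
    return min(GAMMAS, key=lambda g: effectiveness(x, g))


def attacker_step(x, gamma, lr=0.05):
    """Attacker: one gradient-ascent step on effectiveness, clipped to [0.01, 1]."""
    x = x + lr * effectiveness_grad(x, gamma)
    return min(max(x, 0.01), 1.0)


def worst_case(x):
    """Effectiveness of patch x under the least favorable ISP setting."""
    return min(effectiveness(x, g) for g in GAMMAS)


def optimize(x0=0.9, rounds=200):
    """Alternate defender and attacker steps for a fixed number of rounds."""
    x = x0
    for _ in range(rounds):
        gamma = defender_step(x)      # defender moves first against the current patch
        x = attacker_step(x, gamma)   # attacker responds to that ISP setting
    return x


if __name__ == "__main__":
    x = optimize()
    print("patch intensity:", round(x, 3))
    print("worst-case effectiveness before:", round(worst_case(0.9), 4))
    print("worst-case effectiveness after: ", round(worst_case(x), 4))
```

In this toy, optimizing against the current worst-case setting each round improves the patch's worst-case effectiveness across all candidate settings, which is the intuition behind the cross-camera stability claimed above.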


Camera-Agnostic Attacks

Unlike other methods, which succeed only on individual imaging devices, our approach achieves stable attacks across all six imaging devices, including two distinct cameras (Sony α7R4📷 and Canon DS126231📷) and four smartphone cameras (iPhone15📱, RedmiK20📱, HuaweiP50📱, and SamsungS22📱). The experiments are conducted using the YOLOv5 detector, with the confidence threshold and the NMS IoU threshold kept at the official settings of 0.25 and 0.45, respectively.
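To make the roles of the two thresholds concrete, here is a minimal greedy non-maximum suppression sketch in plain Python. It is an illustration written for this page, not the YOLOv5 code; the function names and the (box, score) tuple layout are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thres=0.25, iou_thres=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones.

    dets is a list of (box, score) tuples for a single class (here: person).
    """
    # Keep only candidates above the confidence threshold, best score first.
    candidates = sorted(
        (d for d in dets if d[1] > conf_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # A box survives only if it does not overlap a kept box too strongly.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    detections = [
        ((0, 0, 10, 10), 0.90),
        ((1, 1, 11, 11), 0.80),    # overlaps the first box heavily
        ((20, 20, 30, 30), 0.30),
        ((50, 50, 60, 60), 0.10),  # below the confidence threshold
    ]
    for box, score in filter_detections(detections):
        print(box, score)
```

Under this reading, a person counts as hidden when no person box survives both filters.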

Person detection.

Quantification of digital-space attacks

Quantitative results of different attack methods under various ISP settings in digital space. Our CAP attack surpasses all existing methods in terms of attack success rate (ASR%). T-SEA performs well in Average Precision (AP%) but poorly in ASR because of multiple bounding box detections.
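The page does not spell out how ASR is computed, so the sketch below rests on an assumed definition: an attack on an image succeeds only if no person detection remains above the confidence threshold, and ASR is the percentage of such images. The function name and input layout are hypothetical.

```python
def attack_success_rate(per_image_scores, conf_thres=0.25):
    """ASR in percent under an ASSUMED definition (not taken from the paper).

    per_image_scores holds, for each attacked image, the confidence scores of
    the person detections that remain after the patch is applied.
    """
    if not per_image_scores:
        raise ValueError("need at least one image")
    # An image counts as a success only if every remaining score is at or
    # below the threshold (an image with no detections is also a success).
    successes = sum(
        1 for scores in per_image_scores
        if all(s <= conf_thres for s in scores)
    )
    return 100.0 * successes / len(per_image_scores)


if __name__ == "__main__":
    scores = [[0.10], [0.90, 0.30], [], [0.20, 0.24]]
    print(attack_success_rate(scores))
```

Under this assumed definition, a single confident box is enough to make an image count as a failed attack, so a method can alter the detector's outputs substantially and still score a low ASR.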

Quantification of real-world attacks

In all six sets of quantitative comparisons, our method achieves an ASR (Attack Success Rate) of over 90% across six imaging devices, demonstrating excellent attack effectiveness and stability on the person detection task.


Results Showcase of our ISP Proxy Network

The camera ISP proxy network of our defender generates the corresponding camera ISP-processed image based on conditional input parameters. The comparisons below pair the ground truth with the output of our ISP proxy network for three pairs of ISP operations.



Comparison 1 (ground truth vs. our ISP proxy network). X: Gamma Adjustment, Y: Color Correction Matrix
Gamma Adjustment: a process that alters the luminance values of an image to match the characteristics of the display device.
Color Correction Matrix: used to correct color inaccuracies in an image.

Comparison 2 (ground truth vs. our ISP proxy network). X: Brightness Contrast Control, Y: Non-Local Means
Brightness Contrast Control: controls the overall brightness and contrast of an image.
Non-Local Means: a denoising technique used to reduce noise in images while preserving details.

Comparison 3 (ground truth vs. our ISP proxy network). X: Spatial Filtering, Y: Hue Saturation Control
Spatial Filtering: a technique used to enhance or suppress certain features within an image based on their spatial characteristics.
Hue Saturation Control: adjusts the hue and saturation of colors in an image.
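
As a concrete illustration of two of the ISP operations described above, here is a minimal per-pixel sketch of gamma adjustment and a color correction matrix. It shows the classical operations only, not the proxy network; the power-law convention (output = input ** (1 / gamma)) and the example matrix values are assumptions chosen for illustration.

```python
def clip01(v):
    """Clamp a channel value to the [0, 1] range."""
    return min(max(v, 0.0), 1.0)


def gamma_adjust(rgb, gamma):
    """Apply a power-law curve to each channel of an RGB pixel in [0, 1].

    Assumed convention: out = in ** (1 / gamma), so gamma > 1 brightens mid-tones.
    """
    return tuple(clip01(c) ** (1.0 / gamma) for c in rgb)


def apply_ccm(rgb, ccm):
    """Multiply an RGB pixel by a 3x3 color correction matrix, then clamp."""
    return tuple(
        clip01(sum(w * c for w, c in zip(row, rgb)))
        for row in ccm
    )


# Hypothetical matrix: each row sums to 1, so neutral grays stay neutral.
EXAMPLE_CCM = [
    [1.5, -0.3, -0.2],
    [-0.2, 1.4, -0.2],
    [-0.1, -0.4, 1.5],
]

if __name__ == "__main__":
    pixel = (0.25, 0.40, 0.10)
    print(gamma_adjust(pixel, 2.0))
    print(apply_ccm(pixel, EXAMPLE_CCM))
```

Sweeping gamma and the matrix entries over a range of values gives a family of renderings of one scene, which is the kind of variation the conditional input parameters are meant to cover.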

BibTeX


@inproceedings{wei2024cap,
      title={Revisiting Adversarial Patches for Designing Camera-Agnostic Attacks against Person Detection},
      author={Wei, Hui and Wang, Zhixiang and Zhang, Kewei and Hou, Jiaqi and Liu, Yuanwei and Tang, Hao and Wang, Zheng},
      booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
      year={2024}
}