Development of an improved differential activation module using Grad-CAM++ and semantic segmentation for facial attribute editing
UDC 004.89
DOI: 10.26102/2310-6018/2025.49.2.046
Modern facial attribute editing methods suffer from two systemic issues: unintended modification of secondary features and loss of contextual details (accessories, background, hair texture, and the like), which lead to artifacts and restrict their use in scenarios requiring photographic accuracy. To address these problems, we propose an improved differential activation module designed for precise editing while preserving contextual information. In contrast to the existing solution (EOGI), the proposed module uses second- and third-order gradient information (Grad-CAM++) for precise localization of editable regions; applies test-time augmentation (TTA) and principal component analysis (PCA) to center the class activation map (CAM) on the target objects and suppress noise; and integrates semantic segmentation data to improve spatial accuracy. Evaluation on the first 1,000 images of the CelebA-HQ dataset (1024×1024 resolution) demonstrates a significant improvement over the baseline EOGI method: a 13.84 % reduction in average FID (from 27.68 to 23.85), a 7.03 % reduction in average LPIPS (from 0.327 to 0.304), and a 10.57 % reduction in average MAE (from 0.0511 to 0.0457). The proposed method outperforms existing approaches in both quantitative and qualitative analyses, with improved preservation of fine details (e.g., earrings and backgrounds), making it applicable to tasks demanding high photographic realism.
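To make the three steps named above concrete, the following is a minimal PyTorch sketch, not the authors' released implementation. Assumed names (`model`, an attribute classifier returning a (1, num_attrs) logit tensor; `target_layer`, its last convolutional block; `seg_mask`, a binary mask from an external face parser) are illustrative only; the PCA step follows the uncentered-SVD formulation in the spirit of Eigen-CAM.

```python
# A minimal sketch of CAM localization with Grad-CAM++, horizontal-flip
# TTA, PCA denoising, and segmentation gating. All names are assumptions.
import torch
import torch.nn.functional as F

def gradcam_pp(model, target_layer, x, attr_idx):
    """Grad-CAM++ for one attribute logit. For Y = exp(S) on a piecewise-
    linear network, d^nY/dA^n = exp(S) * (dS/dA)^n, so the second- and
    third-order terms reduce to powers of the first-order gradient."""
    feats, grads = [], []
    fh = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    bh = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    model.zero_grad()
    model(x)[0, attr_idx].backward()
    fh.remove(); bh.remove()

    A, g = feats[0].detach(), grads[0].detach()           # (1, K, H, W)
    g2, g3 = g.pow(2), g.pow(3)
    denom = 2.0 * g2 + A.sum(dim=(2, 3), keepdim=True) * g3
    alpha = g2 / torch.where(denom != 0, denom, torch.ones_like(denom))
    w = (alpha * F.relu(g)).sum(dim=(2, 3), keepdim=True)  # channel weights
    cam = F.relu((w * A).sum(dim=1))                       # (1, H, W)
    cam = F.interpolate(cam.unsqueeze(0), size=x.shape[2:],
                        mode="bilinear", align_corners=False)
    return cam[0, 0]                                       # (H, W)

def pca_denoise(cams):
    """Keep the first principal component of stacked TTA maps (uncentered
    SVD) to retain shared structure and suppress augmentation noise."""
    N, H, W = cams.shape
    U, S, V = torch.pca_lowrank(cams.reshape(N, -1), q=1, center=False)
    pc1 = V[:, 0].reshape(H, W)
    pc1 = F.relu(pc1 if pc1.sum() >= 0 else -pc1)  # fix SVD sign ambiguity
    return pc1 / (pc1.max() + 1e-8)

def refined_cam(model, target_layer, x, attr_idx, seg_mask):
    """TTA (horizontal flip) -> PCA denoising -> segmentation gating."""
    cams = torch.stack([
        gradcam_pp(model, target_layer, x, attr_idx),
        torch.flip(gradcam_pp(model, target_layer,
                              torch.flip(x, dims=[3]), attr_idx), dims=[1]),
    ])                                   # (2, H, W), both aligned to x
    return pca_denoise(cams) * seg_mask  # seg_mask: (H, W) in {0, 1}
```

In the full module, a gated map of this kind would restrict the differential-activation fusion to the editable region, which is what allows accessories and background to survive the edit.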
1. He Zh., Zuo W., Kan M., Shan Sh., Chen X. AttGAN: Facial Attribute Editing by Only Changing What You Want. IEEE Transactions on Image Processing. 2019;28(11):5464–5478. https://doi.org/10.1109/TIP.2019.2916751
2. Qiu H., Yu B., Gong D., Li Zh., Liu W., Tao D. SynFace: Face Recognition with Synthetic Data. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 10860–10870. https://doi.org/10.1109/ICCV48922.2021.01070
3. Goodfellow I.J., Pouget-Abadie J., Mirza M., et al. Generative Adversarial Networks. arXiv. URL: https://arxiv.org/abs/1406.2661 [Accessed 19th April 2025].
4. Xia W., Zhang Yu., Yang Yu., Xue J.-H., Zhou B., Yang M.-H. GAN Inversion: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(3):3121–3138. https://doi.org/10.1109/TPAMI.2022.3181070
5. Karras T., Laine S., Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June 2019, Long Beach, CA, USA. IEEE; 2019. P. 4401–4410. https://doi.org/10.1109/CVPR.2019.00453
6. Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., Aila T. Analyzing and Improving the Image Quality of StyleGAN. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 8107–8116. https://doi.org/10.1109/CVPR42600.2020.00813
7. Richardson E., Alaluf Yu., Patashnik O., et al. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20–25 June 2021, Nashville, TN, USA. IEEE; 2021. P. 2287–2296. https://doi.org/10.1109/CVPR46437.2021.00232
8. Tov O., Alaluf Yu., Nitzan Yo., Patashnik O., Cohen-Or D. Designing an Encoder for StyleGAN Image Manipulation. ACM Transactions on Graphics (TOG). 2021;40(4). https://doi.org/10.1145/3450626.3459838
9. Alaluf Yu., Patashnik O., Cohen-Or D. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 6691–6700. https://doi.org/10.1109/ICCV48922.2021.00664
10. Wang T., Zhang Yo., Fan Ya., Wang J., Chen Q. High-Fidelity GAN Inversion for Image Attribute Editing. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18–24 June 2022, New Orleans, LA, USA. IEEE; 2022. P. 11369–11378. https://doi.org/10.1109/CVPR52688.2022.01109
11. Song H., Du Yo., Xiang T., Dong J., Qin J., He Sh. Editing Out-of-Domain GAN Inversion via Differential Activations. In: Computer Vision – ECCV 2022: 17th European Conference: Proceedings: Part XVII, 23–27 October 2022, Tel Aviv, Israel. Cham: Springer; 2022. P. 1–17. https://doi.org/10.1007/978-3-031-19790-1_1
12. Chattopadhay A., Sarkar A., Howlader P., Balasubramanian V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 12–15 March 2018, Lake Tahoe, NV, USA. IEEE; 2018. P. 839–847. https://doi.org/10.1109/WACV.2018.00097
13. Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), 22–29 October 2017, Venice, Italy. IEEE; 2017. P. 618–626. https://doi.org/10.1109/ICCV.2017.74
14. Muhammad M.B., Yeasin M. Eigen-CAM: Class Activation Map Using Principal Components. In: 2020 International Joint Conference on Neural Networks (IJCNN), 19–24 July 2020, Glasgow, UK. IEEE; 2020. P. 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626
15. He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, NV, USA. IEEE; 2016. P. 770–778. https://doi.org/10.1109/CVPR.2016.90
16. Lee Ch.-H., Liu Z., Wu L., Luo P. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 5548–5557. https://doi.org/10.1109/CVPR42600.2020.00559
17. Karras T., Aila T., Laine S., Lehtinen J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv. URL: https://arxiv.org/abs/1710.10196 [Accessed 19th April 2025].
18. Zhang R., Isola Ph., Efros A.A., Shechtman E., Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18–23 June 2018, Salt Lake City, UT, USA. IEEE; 2018. P. 586–595. https://doi.org/10.1109/CVPR.2018.00068
19. Heusel M., Ramsauer H., Unterthiner Th., Nessler B., Hochreiter S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv. URL: https://arxiv.org/abs/1706.08500 [Accessed 19th April 2025].
20. Shen Yu., Yang C., Tang X., Zhou B. InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022;44(4):2004–2018. https://doi.org/10.1109/TPAMI.2020.3034267
Keywords: deep learning, facial attribute editing, differential activation, class activation maps (CAM), semantic segmentation, generative adversarial network (GAN)
For citation: Gu C., Gromov M.L. Development of an improved differential activation module using Grad-CAM++ and semantic segmentation for facial attribute editing. Modeling, Optimization and Information Technology. 2025;13(2). URL: https://moitvivt.ru/ru/journal/pdf?id=1932 DOI: 10.26102/2310-6018/2025.49.2.046 (In Russ.).
Received 29.04.2025
Revised 03.06.2025
Accepted 18.06.2025