Development of an improved differential activation module using Grad-CAM++ and semantic segmentation for facial attribute editing
UDC 004.89
DOI: 10.26102/2310-6018/2025.49.2.046
Modern facial attribute editing methods suffer from two systemic issues: unintended modification of secondary features and loss of contextual details (accessories, background, hair texture, and the like), which lead to artifacts and restrict their use in scenarios requiring photographic accuracy. To address these problems, we propose an improved differential activation module designed for precise editing while preserving contextual information. In contrast to the existing solution (EOGI), the proposed module uses second- and third-order gradient information (Grad-CAM++) for precise localization of editable regions; applies test-time augmentation (TTA) and principal component analysis (PCA) to center the class activation map (CAM) on the target objects and suppress noise; and integrates semantic segmentation data to improve spatial accuracy. Evaluation on the first 1,000 images of the CelebA-HQ dataset (1024×1024 resolution) demonstrates a significant improvement over the baseline EOGI method: a 13.84 % reduction in average FID (from 27.68 to 23.85), a 7.03 % reduction in average LPIPS (from 0.327 to 0.304), and a 10.57 % reduction in average MAE (from 0.0511 to 0.0457). The proposed method outperforms existing approaches in both quantitative and qualitative analyses, with improved preservation of fine details (e.g., earrings and backgrounds), making it applicable to tasks demanding high photographic realism.
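To make the three steps named above concrete, the following is a minimal PyTorch sketch, not the authors' released implementation. Assumed names (`model`, an attribute classifier returning a (1, num_attrs) logit tensor; `target_layer`, its last convolutional block; `seg_mask`, a binary mask from an external face parser) are illustrative only; the PCA step follows the uncentered-SVD formulation in the spirit of Eigen-CAM.

```python
# A minimal sketch of CAM localization with Grad-CAM++, horizontal-flip
# TTA, PCA denoising, and segmentation gating. All names are assumptions.
import torch
import torch.nn.functional as F

def gradcam_pp(model, target_layer, x, attr_idx):
    """Grad-CAM++ for one attribute logit. For Y = exp(S) on a piecewise-
    linear network, d^nY/dA^n = exp(S) * (dS/dA)^n, so the second- and
    third-order terms reduce to powers of the first-order gradient."""
    feats, grads = [], []
    fh = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    bh = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))
    model.zero_grad()
    model(x)[0, attr_idx].backward()
    fh.remove(); bh.remove()

    A, g = feats[0].detach(), grads[0].detach()           # (1, K, H, W)
    g2, g3 = g.pow(2), g.pow(3)
    denom = 2.0 * g2 + A.sum(dim=(2, 3), keepdim=True) * g3
    alpha = g2 / torch.where(denom != 0, denom, torch.ones_like(denom))
    w = (alpha * F.relu(g)).sum(dim=(2, 3), keepdim=True)  # channel weights
    cam = F.relu((w * A).sum(dim=1))                       # (1, H, W)
    cam = F.interpolate(cam.unsqueeze(0), size=x.shape[2:],
                        mode="bilinear", align_corners=False)
    return cam[0, 0]                                       # (H, W)

def pca_denoise(cams):
    """Keep the first principal component of stacked TTA maps (uncentered
    SVD) to retain shared structure and suppress augmentation noise."""
    N, H, W = cams.shape
    U, S, V = torch.pca_lowrank(cams.reshape(N, -1), q=1, center=False)
    pc1 = V[:, 0].reshape(H, W)
    pc1 = F.relu(pc1 if pc1.sum() >= 0 else -pc1)  # fix SVD sign ambiguity
    return pc1 / (pc1.max() + 1e-8)

def refined_cam(model, target_layer, x, attr_idx, seg_mask):
    """TTA (horizontal flip) -> PCA denoising -> segmentation gating."""
    cams = torch.stack([
        gradcam_pp(model, target_layer, x, attr_idx),
        torch.flip(gradcam_pp(model, target_layer,
                              torch.flip(x, dims=[3]), attr_idx), dims=[1]),
    ])                                   # (2, H, W), both aligned to x
    return pca_denoise(cams) * seg_mask  # seg_mask: (H, W) in {0, 1}
```

In the full module, a gated map of this kind would restrict the differential-activation fusion to the editable region, which is what allows accessories and background to survive the edit.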
1. He Zh., Zuo W., Kan M., Shan Sh., Chen X. AttGAN: Facial Attribute Editing by Only Changing What You Want. IEEE Transactions on Image Processing. 2019;28(11):5464–5478. https://doi.org/10.1109/TIP.2019.2916751
2. Qiu H., Yu B., Gong D., Li Zh., Liu W., Tao D. SynFace: Face Recognition with Synthetic Data. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 10860–10870. https://doi.org/10.1109/ICCV48922.2021.01070
3. Goodfellow I.J., Pouget-Abadie J., Mirza M., et al. Generative Adversarial Networks. arXiv. URL: https://arxiv.org/abs/1406.2661 [Accessed 19th April 2025].
4. Xia W., Zhang Yu., Yang Yu., Xue J.-H., Zhou B., Yang M.-H. GAN Inversion: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(3):3121–3138. https://doi.org/10.1109/TPAMI.2022.3181070
5. Karras T., Laine S., Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June 2019, Long Beach, CA, USA. IEEE; 2019. P. 4401–4410. https://doi.org/10.1109/CVPR.2019.00453
6. Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., Aila T. Analyzing and Improving the Image Quality of StyleGAN. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 8107–8116. https://doi.org/10.1109/CVPR42600.2020.00813
7. Richardson E., Alaluf Yu., Patashnik O., et al. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20–25 June 2021, Nashville, TN, USA. IEEE; 2021. P. 2287–2296. https://doi.org/10.1109/CVPR46437.2021.00232
8. Tov O., Alaluf Yu., Nitzan Yo., Patashnik O., Cohen-Or D. Designing an Encoder for StyleGAN Image Manipulation. ACM Transactions on Graphics (TOG). 2021;40(4). https://doi.org/10.1145/3450626.3459838
9. Alaluf Yu., Patashnik O., Cohen-Or D. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 6691–6700. https://doi.org/10.1109/ICCV48922.2021.00664
10. Wang T., Zhang Yo., Fan Ya., Wang J., Chen Q. High-Fidelity GAN Inversion for Image Attribute Editing. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18–24 June 2022, New Orleans, LA, USA. IEEE; 2022. P. 11369–11378. https://doi.org/10.1109/CVPR52688.2022.01109
11. Song H., Du Yo., Xiang T., Dong J., Qin J., He Sh. Editing Out-of-Domain GAN Inversion via Differential Activations. In: Computer Vision – ECCV 2022: 17th European Conference: Proceedings: Part XVII, 23–27 October 2022, Tel Aviv, Israel. Cham: Springer; 2022. P. 1–17. https://doi.org/10.1007/978-3-031-19790-1_1
12. Chattopadhay A., Sarkar A., Howlader P., Balasubramanian V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 12–15 March 2018, Lake Tahoe, NV, USA. IEEE; 2018. P. 839–847. https://doi.org/10.1109/WACV.2018.00097
13. Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), 22–29 October 2017, Venice, Italy. IEEE; 2017. P. 618–626. https://doi.org/10.1109/ICCV.2017.74
14. Muhammad M.B., Yeasin M. Eigen-CAM: Class Activation Map Using Principal Components. In: 2020 International Joint Conference on Neural Networks (IJCNN), 19–24 July 2020, Glasgow, UK. IEEE; 2020. P. 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626
15. He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, NV, USA. IEEE; 2016. P. 770–778. https://doi.org/10.1109/CVPR.2016.90
16. Lee Ch.-H., Liu Z., Wu L., Luo P. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 5548–5557. https://doi.org/10.1109/CVPR42600.2020.00559
17. Karras T., Aila T., Laine S., Lehtinen J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv. URL: https://arxiv.org/abs/1710.10196 [Accessed 19th April 2025].
18. Zhang R., Isola Ph., Efros A.A., Shechtman E., Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18–23 June 2018, Salt Lake City, UT, USA. IEEE; 2018. P. 586–595. https://doi.org/10.1109/CVPR.2018.00068
19. Heusel M., Ramsauer H., Unterthiner Th., Nessler B., Hochreiter S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv. URL: https://arxiv.org/abs/1706.08500 [Accessed 19th April 2025].
20. Shen Yu., Yang C., Tang X., Zhou B. InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022;44(4):2004–2018. https://doi.org/10.1109/TPAMI.2020.3034267
Keywords: deep learning, facial attribute editing, differential activation, class activation maps (CAM), semantic segmentation, generative adversarial network (GAN)
For citation: Gu C., Gromov M.L. Development of an improved differential activation module using Grad-CAM++ and semantic segmentation for facial attribute editing. Modeling, Optimization and Information Technology. 2025;13(2). URL: https://moitvivt.ru/ru/journal/pdf?id=1932 DOI: 10.26102/2310-6018/2025.49.2.046 (In Russ.).
Received 29.04.2025
Revised 03.06.2025
Accepted 18.06.2025