moitvivt

Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018

Publisher

10.26102/2310-6018/2025.50.3.013

1971

Artificial neural network for image blending artifact suppression in differential activation-based face attribute editing

0000-0003-4103-2036

Гу Чунюй

Gu Chongyu

chongyugu@gmail.com aff-1

0000-0002-2990-8245

Громов

Максим Леонидович

Gromov

Maxim Leonidovich

maxim.leo.gromov@gmail.com aff-2

National Research Tomsk State University

01 01 2026

1 1

2026

This work is licensed under a Creative Commons Attribution 4.0 International License

The paper proposes a new method for suppressing artifacts that arise during image blending. The method is based on differential activation. Image blending arises in many applications; this work addresses it from the perspective of face attribute editing. Existing artifact suppression approaches have significant limitations: they use differential activation to localize the edited regions and then merge features, which loses distinctive details (e.g., accessories, hairstyles) and degrades background integrity. The state-of-the-art artifact suppression method uses an encoder-decoder architecture that hierarchically aggregates StyleGAN2 generator feature maps with the decoder, which results in texture distortion, excessive sharpening, and aliasing. We propose a method that combines a traditional image processing algorithm with deep learning: it integrates Poisson blending and the MAResU-Net neural network. Poisson blending is used to create artifact-free fused images, while MAResU-Net learns to map artifact-contaminated images to their clean versions. Together they form a pipeline that converts images with blending artifacts into clean, artifact-free outputs. On the first 1000 images of the CelebA-HQ dataset, the proposed method outperforms the existing approach on five metrics: PSNR: +17.11 % (from 22.24 to 26.06), SSIM: +40.74 % (from 0.618 to 0.870), MAE: −34.09 % (from 0.0511 to 0.0338), LPIPS: −67.16 % (from 0.3268 to 0.1078), and FID: −48.14 % (from 27.53 to 14.69). It achieves these results with 26.3 million parameters (6.6× fewer than the 174.2 million parameters of the compared method) and 22 % faster processing. The method preserves accessory details, background elements, and skin texture that existing methods typically lose, which confirms its practical value for real-world face editing applications.
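The gradient-domain idea behind Poisson blending can be illustrated with a minimal one-dimensional Python sketch. This is not the authors' implementation: the function name `poisson_blend_1d`, the Gauss-Seidel solver, and the toy signals are assumptions introduced only for illustration. Inside the edited region the result reproduces the second differences (the discrete Laplacian) of the source, and on the region border it matches the target, which is why no visible seam is created.

```python
def poisson_blend_1d(source, target, start, end, iterations=2000):
    """Blend source[start:end] into target in the gradient domain (1D toy case).

    Samples outside [start, end) keep the target values. Inside, the result
    has the same second differences (discrete Laplacian) as the source.
    Requires 1 <= start < end <= len(target) - 1 so both borders exist.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    if not 1 <= start < end <= len(target) - 1:
        raise ValueError("region must leave one fixed sample on each side")

    result = [float(v) for v in target]
    # Gauss-Seidel sweeps for:
    #   f[i-1] - 2*f[i] + f[i+1] = s[i-1] - 2*s[i] + s[i+1]
    # with f fixed to the target at i = start - 1 and i = end.
    for _ in range(iterations):
        for i in range(start, end):
            laplacian = source[i - 1] - 2 * source[i] + source[i + 1]
            result[i] = (result[i - 1] + result[i + 1] - laplacian) / 2.0
    return result


# Toy signals (hypothetical values): a textured source pasted into a flat target.
source = [5.0, 6.0, 8.0, 7.0, 9.0, 5.0]
target = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
blended = poisson_blend_1d(source, target, start=1, end=5)
print([round(v, 3) for v in blended])
```

In this toy case the source differs from the target by the same offset at both borders, so the blended interior is the source shifted by that constant: the texture of the source is kept while its absolute level adapts to the target. A two-dimensional image version solves the same kind of equation over the pixels of the edited region.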

deep learning; facial attribute editing; blending artifact suppression network; image-to-image translation; differential activation; MAResU-Net; generative adversarial network (GAN)

This work was supported by the China Scholarship Council (CSC), Grant No. 201908090255.


The authors declare no conflicts of interest.