GAN for Semantic Image Synthesis With Laplacian Pyramid and Multi-Scale Channel Attention

Most GAN-based methods utilize semantic layouts as input for generating realistic images. However, these layouts primarily consist of object contours and often lack detailed information, leading to suboptimal image quality in the generated outputs. To address this limitation, we propose a novel GAN...

Bibliographic Details
Main Authors: Xinhua Dong, Chuang Li, Zhigang Xu, Hongmu Han, Lifeng Jiang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Channel attention; feature fusion; GAN; Laplacian image pyramid; semantic image synthesis
Online Access: https://ieeexplore.ieee.org/document/10767714/
author Xinhua Dong
Chuang Li
Zhigang Xu
Hongmu Han
Lifeng Jiang
collection DOAJ
description Most GAN-based methods utilize semantic layouts as input for generating realistic images. However, these layouts primarily consist of object contours and often lack detailed information, leading to suboptimal image quality in the generated outputs. To address this limitation, we propose a novel GAN architecture called LMCGAN designed specifically for synthesizing high-quality images. LMCGAN introduces a generator network structured around the Laplacian pyramid, enabling the simultaneous generation of multi-scale feature maps. This approach allows the model to capture finer details at different resolutions, enhancing the overall realism of the generated images. To further improve the utilization of semantic maps, we integrate a multi-scale channel attention (MSCA) mechanism. This mechanism focuses on channel-specific information in complex scenes, which is crucial for preserving essential details that may otherwise be lost. During the feature fusion phase, we implement a feature fusion block (FFBL) that is designed to capture important relationships across various scales. This block facilitates the integration of information from different resolutions, ensuring that the final output retains critical features. Additionally, we adopt a combination of conditional and unconditional methods to reduce noise during the training process, leading to more stable and effective training dynamics. Extensive experiments conducted on challenging datasets demonstrate that LMCGAN significantly outperforms existing methods in terms of both visual quality and quantitative evaluation metrics. The results indicate that our architecture not only generates more realistic images but also excels in preserving intricate details, marking a substantial advancement in the field of image synthesis using GANs.
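The description above refers to the Laplacian pyramid without spelling it out. The following is a minimal sketch of the classical image decomposition only, not of the LMCGAN generator, which the record does not describe in enough detail to reproduce. It assumes a grayscale image stored as nested lists with even dimensions at every level, and it swaps the usual Gaussian blur for a simple 2x2 box average to stay dependency-free. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample(img: Image) -> Image:
    """Halve height and width by averaging each 2x2 block (box filter, an assumption)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def upsample(img: Image) -> Image:
    """Double height and width by nearest-neighbour repetition."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a: Image, b: Image) -> Image:
    """Element-wise a - b for two images of equal size."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a: Image, b: Image) -> Image:
    """Element-wise a + b for two images of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_laplacian_pyramid(img: Image, levels: int) -> Image:
    """Return [detail_0, ..., detail_{levels-1}, coarse], finest level first."""
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # The detail band is what the coarser level cannot represent.
        pyramid.append(subtract(current, upsample(smaller)))
        current = smaller
    pyramid.append(current)  # low-resolution residual
    return pyramid


def reconstruct(pyramid) -> Image:
    """Invert the decomposition by upsampling and adding detail bands back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = add(upsample(current), detail)
    return current
```

Calling build_laplacian_pyramid on a 4x4 image with levels=2 yields a 4x4 detail band, a 2x2 detail band, and a 1x1 coarse residual; reconstruct recovers the input from them. Each band isolates detail at one resolution, which is the property a pyramid-structured generator can exploit.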
format Article
id doaj-art-c13e78c870434966a55d65dceda7cf50
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c13e78c870434966a55d65dceda7cf50; 2025-08-20T02:33:55Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 178010-178021; DOI 10.1109/ACCESS.2024.3506577; article no. 10767714
Xinhua Dong (https://orcid.org/0009-0008-2491-1593), School of Computer Science, Hubei University of Technology, Wuhan, China
Chuang Li (https://orcid.org/0009-0000-0263-2173), School of Computer Science, Hubei University of Technology, Wuhan, China
Zhigang Xu (https://orcid.org/0000-0003-4007-4779), School of Computer Science, Hubei University of Technology, Wuhan, China
Hongmu Han (https://orcid.org/0000-0002-6909-5242), School of Computer Science, Hubei University of Technology, Wuhan, China
Lifeng Jiang (https://orcid.org/0009-0007-7917-8168), School of Computer Science, Hubei University of Technology, Wuhan, China
title GAN for Semantic Image Synthesis With Laplacian Pyramid and Multi-Scale Channel Attention
topic Channel attention
feature fusion
GAN
Laplacian image pyramid
semantic image synthesis
url https://ieeexplore.ieee.org/document/10767714/