Xingxing Zou’s research while affiliated with The Hong Kong Polytechnic University and other places


Publications (27)


FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model
  • Preprint

April 2025 · 3 Reads

Kaicheng Pang · Xingxing Zou · Waikeung Wong

Fashion styling and personalized recommendations are pivotal in modern retail, contributing substantial economic value in the fashion industry. With the advent of vision-language models (VLM), new opportunities have emerged to enhance retailing through natural language and visual interactions. This work proposes FashionM3, a multimodal, multitask, and multiround fashion assistant, built upon a VLM fine-tuned for fashion-specific tasks. It helps users discover satisfying outfits by offering multiple capabilities including personalized recommendation, alternative suggestion, product image generation, and virtual try-on simulation. Fine-tuned on the novel FashionRec dataset, comprising 331,124 multimodal dialogue samples across basic, personalized, and alternative recommendation tasks, FashionM3 delivers contextually personalized suggestions with iterative refinement through multiround interactions. Quantitative and qualitative evaluations, alongside user studies, demonstrate FashionM3's superior performance in recommendation effectiveness and practical value as a fashion assistant.


Generative AI in Fashion: Overview

February 2025 · 62 Reads

ACM Transactions on Intelligent Systems and Technology

Generative Artificial Intelligence (GenAI) has recently gained immense popularity by offering various applications for generating high-quality and aesthetically pleasing content in image, 3D, and video formats. The innovative GenAI solutions have shifted paradigms across various design-related industries, particularly fashion. In this paper, we explore the incorporation of GenAI into fashion-related tasks and applications. Our examination encompasses a thorough review of more than 470 research papers and an in-depth analysis of over 300 applications, focusing on their contributions to the field. These contributions are identified as 13 tasks within four categories: multi-modal fashion understanding, and fashion synthesis of image, 3D, and dynamic (video and animatable 3D) formats. We delve into these methods, recognizing their potential to propel future endeavours toward achieving state-of-the-art (SOTA) performance. Furthermore, we present a comprehensive overview of 53 publicly available datasets suitable for training and benchmarking fashion-centric models, accompanied by the relevant evaluation metrics. Finally, we review real-world applications, unveiling existing challenges and future directions. With comprehensive investigation and in-depth analysis, this paper aims to serve as a useful resource for understanding the current landscape of GenAI in fashion, paving the way for future innovations in this dynamic field. The papers discussed here, along with links to public code and datasets, are available at: https://github.com/wendashi/Cool-GenAI-Fashion-Papers/


Figure 5. Qualitative results on the font consistency and word-level controls in basic text rendering compared with baselines.
Figure 14. Examples of images in SC-general.
Figure 20. More Qualitative Results of Artistic Text Rendering.
FonTS: Text Rendering with Typography and Style Controls
  • Preprint
  • File available

November 2024 · 33 Reads

Wenda Shi · Yiren Song · [...] · Xingxing Zou

Visual text images are prevalent in various applications, requiring careful font selection and typographic choices. Recent advances in Diffusion Transformer (DiT)-based text-to-image (T2I) models show promise in automating these processes. However, these methods still face challenges such as inconsistent fonts, style variation, and limited fine-grained control, particularly at the word level. This paper proposes a two-stage DiT-based pipeline to address these issues by enhancing controllability over typography and style in text rendering. We introduce Typography Control (TC) finetuning, an efficient parameter fine-tuning method, and enclosing typography control tokens (ETC-tokens), which enable precise word-level application of typographic features. To further enhance style control, we present a Style Control Adapter (SCA) that injects style information through image inputs independent of text prompts. Through comprehensive experiments, we demonstrate the effectiveness of our approach in achieving superior word-level typographic control, font consistency, and style consistency in Basic and Artistic Text Rendering (BTR and ATR) tasks. Our results mark a significant advancement in the precision and adaptability of T2I models, presenting new possibilities for creative applications and design-oriented tasks.


Appearance and Pose-Guided Human Generation: A Survey

December 2023 · 69 Reads · 9 Citations

ACM Computing Surveys

Appearance and pose-guided human generation is a burgeoning field that has captured significant attention. This subject’s primary objective is to transfer pose information from a target source to a reference image, enabling the generation of high-resolution images or videos that seamlessly link the virtual and real worlds, leading to novel trends and applications. This survey thoroughly illustrates the task of appearance and pose-guided human generation and comprehensively reviews mainstream methods. Specifically, it systematically discusses prior information, pose-based transformation modules, and generators, offering a comprehensive understanding and discussion of each mainstream pose transformation and generation process. Furthermore, the survey explores current applications and future challenges in the domain. Its ultimate goal is to serve as quick guidelines, providing practical assistance in human generation and its diverse applications.



Citations (13)


... To address these limitations, researchers have explored various inversion techniques for GANs and DMs, including GAN inversion [25,33,40,48,51,54,55] and DDIM/DPM inversion [20,43], which estimates the initial random noise in DMs by reversing the U-Net denoising process [14,38]. Null-Text Inversion, for instance, guides the diffusion model toward consistent edits by performing a pivotal inversion combined with a null-text optimization, improving edit fidelity [30]. ...

Reference: Training-Free Consistency Pipeline for Fashion Repose
LoopNet for fine-grained fashion attributes editing
  • Citing Article
  • September 2024

Expert Systems with Applications

... Subsequently, researchers [29], [30], [31], [32] utilized collaborative filtering to model user preferences based on user interaction data for the personalized recommendation problem. The third stage incorporates contextual information to better understand users' needs, where researchers consider either user characteristics (e.g., physical attributes [33], [34], hairstyle [35], [36]) or environmental factors (e.g., occasion [37], weather [38]) to deliver more targeted recommendations. In practical applications, existing outfit recommendation systems [39], [30], [40], [7] typically operate as ranking-based methods. ...

Learning Visual Body-shape-Aware Embeddings for Fashion Compatibility
  • Citing Conference Paper
  • January 2024

... The creation of realistic virtual human images constitutes a pivotal area of research within the domains of computer vision and generative models. Among these, Pose-Guided Person Image Synthesis (PGPIS) is particularly focused on generating images that preserve the visual characteristics of a source image while adapting it to a specified target pose [14]. This technique finds applications in various fields, including virtual reality (VR) and e-commerce [28], and holds significance for tasks such as person re-identification [35] and the generation of sign language videos. ...

Appearance and Pose-Guided Human Generation: A Survey
  • Citing Article
  • December 2023

ACM Computing Surveys

... Simulation-based datasets [Bertiche et al. 2020; Black et al. 2023; Gundogdu et al. 2019; Jiang et al. 2020; Narain et al. 2012; Patel et al. 2020; Santesteban et al. 2019; Xiang et al. 2020; Zou et al. 2023] use physics engines to simulate and enhance the physical plausibility of synthetic 3D garments. While these datasets are more efficient to produce than 3D scanning datasets, they generally suffer from limited garment style diversity, poor garment deformation, and low-quality paired images, reducing their practical use for real-world image data tasks. ...

CLOTH4D: A Dataset for Clothed Human Reconstruction
  • Citing Conference Paper
  • June 2023

... For faster recognition, Kaur et al. [23] proposed a framework that combines CNNs with SURF for efficient clothing recognition. Zhu et al. [24] proposed sRA-Net to accurately obtain attribute representations by utilizing multiple latent relationships in clothing images to improve the performance of fashion attribute recognition. This paper treats fine-grained clothing attribute recognition as an object detection task. ...

Learning Structured Relation Embeddings for Fine-Grained Fashion Attribute Recognition
  • Citing Article
  • January 2023

IEEE Transactions on Multimedia

... In contrast, some researchers have aimed to capture a holistic view of outfit-level representations through bidirectional LSTMs ([5]) or graph neural networks ([4], [10]). Some studies have also shifted to attention-based methods, including the Transformer architecture for personalized outfit recommendations and complementary item retrieval ([12], [21], [15]). ...

Towards private stylists via personalized compatibility learning
  • Citing Article
  • February 2023

Expert Systems with Applications

... The network uses gradient information to make its predictions, allowing it to take into account the relationships between different items in an outfit (Wang et al. 2019). Mo et al. (2022) presented a model for fashion compatibility assessment utilizing low and high-level features based on multilayer convolutional networks and Transformer for explainable evaluation and recommendation. Yang et al. (2021) use the attribute information of fashion items to explain their compatibility. ...

Neural stylist: Towards online styling service
  • Citing Article
  • May 2022

Expert Systems with Applications

... The shape editing network operates with fashion image semantic parsing, while the appearance editing network generates the final RGB image in the subsequent stage. Wang et al. [23] proposed a clothing image attribute editing scheme tailored for fashion images, using a coarse-to-fine approach and landmark-based precise attribute region localization. UFE-Net [24] translates real clothing into design sketches and, guided by sketches, achieves realistic editing results through an alignment-driven editing module and a refinement module. ...

Coarse-to-Fine Attribute Editing for Fashion Images
  • Citing Chapter
  • January 2021

Lecture Notes in Computer Science

... AI technology is mainly utilized in marketing and virtual fitting in the field of fashion. In fashion marketing, previous research has focused on AI techniques that detect consumers' preferences and lifestyles and recommend appropriate merchandise (Gong & Khalid, 2021; Kim & Lee, 2018; Liu et al., 2019; Zou & Wong, 2021). Alternatively, virtual fitting has lately been utilized to provide data on body shape and clothing prior to making online purchases or trying clothes on in stores (Jiang et al., 2022; Seo & Lee, 2022; Thomas et al., 2022). ...

fAshIon after fashion: A Report of AI in Fashion
  • Citing Preprint
  • May 2021