Zhen Gao’s research while affiliated with Beijing Institute of Technology and other places


Publications (261)


Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications
  • Preprint
  • File available

February 2025 · 3 Reads

Zhen Gao · [...] · Dusit Niyato

In this paper, we introduce token communications (TokCom), a unified framework to leverage cross-modal context information in generative semantic communications (GenSC). TokCom is a new paradigm, motivated by the recent success of generative foundation models and multimodal large language models (GFM/MLLMs), where the communication units are tokens, enabling efficient transformer-based token processing at the transmitter and receiver. We introduce the potential opportunities and challenges of leveraging context in GenSC, explore how to integrate GFM/MLLM-based token processing into semantic communication systems to leverage cross-modal context effectively, and present the key principles for efficient TokCom at various layers in future wireless networks. We demonstrate the corresponding TokCom benefits in a GenSC setup for images, where leveraging cross-modal context information increases the bandwidth efficiency by 70.8% with negligible loss of semantic/perceptual quality. Finally, potential research directions are identified to facilitate the adoption of TokCom in future wireless networks.


Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System

February 2025 · 6 Reads

The spatial diversity and multiplexing advantages of massive multi-input-multi-output (mMIMO) can significantly improve the capacity of massive non-orthogonal multiple access (NOMA) in machine type communications. However, state-of-the-art grant-free massive NOMA schemes for mMIMO systems require accurate estimation of random access channels to perform activity detection and the subsequent coherent data demodulation, which incurs excessive pilot overhead and access latency. To address this, we propose a pre-equalization aided grant-free massive access scheme for mMIMO systems, where an iterative detection scheme is conceived. Specifically, the base station (BS) first activates one of its antennas (i.e., the beacon antenna) to broadcast a beacon signal, which allows the user equipment (UEs) to perform downlink channel estimation and pre-equalize the uplink random access signal with respect to the channels associated with the beacon antenna. During the uplink transmission stage, the BS detects the UEs' activity and data by using the proposed iterative detection algorithm, which consists of three modules: coarse data detection (DD), data-aided channel estimation (CE), and fine DD. In the proposed algorithm, joint activity detection and DD is first performed based on the signals received by the beacon antenna. Subsequently, the DD is further refined by iteratively running the data-aided CE module and the fine DD module using signals received by all BS antennas. Our simulation results demonstrate that the proposed scheme outperforms state-of-the-art mMIMO-based grant-free massive NOMA schemes with the same access latency.


Token-Domain Multiple Access: Exploiting Semantic Orthogonality for Collision Mitigation

February 2025 · 30 Reads

Token communications is an emerging generative semantic communication concept that reduces transmission rates by using context and transformer-based token processing, with tokens serving as universal semantic units. In this paper, we propose a semantic multiple access scheme in the token domain, referred to as ToDMA, where a large number of devices share a tokenizer and a modulation codebook for source and channel coding, respectively. Specifically, the source signal is tokenized into sequences, with each token modulated into a codeword. Codewords from multiple devices are transmitted simultaneously, resulting in overlap at the receiver. The receiver detects the transmitted tokens, assigns them to their respective sources, and mitigates token collisions by leveraging context and semantic orthogonality across the devices' messages. Simulations demonstrate that the proposed ToDMA framework outperforms context-unaware orthogonal and non-orthogonal communication methods in image transmission tasks, achieving lower latency and better image quality.


SCSC: A Novel Standards-Compatible Semantic Communication Framework for Image Transmission

January 2025 · 12 Reads

Joint source-channel coding (JSCC) is a promising paradigm for next-generation communication systems, particularly in challenging transmission environments. In this paper, we propose a novel standard-compatible JSCC framework for the transmission of images over multiple-input multiple-output (MIMO) channels. Different from the existing end-to-end AI-based DeepJSCC schemes, our framework consists of learnable modules that enable communication using conventional separate source and channel codes (SSCC), which makes it amenable for easy deployment on legacy systems. Specifically, the learnable modules involve a preprocessing-empowered network (PPEN) for preserving essential semantic information, and a precoder & combiner-enhanced network (PCEN) for efficient transmission over a resource-constrained MIMO channel. We treat existing compression and channel coding modules as non-trainable blocks. Since the parameters of these modules are non-differentiable, we employ a proxy network that mimics their operations when training the learnable modules. Numerical results demonstrate that our scheme can save more than 29% of the channel bandwidth, and requires lower complexity compared to the constrained baselines. We also show its generalization capability to unseen datasets and tasks through extensive experiments.


Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems

January 2025 · 20 Reads · 5 Citations

IEEE Journal on Selected Areas in Communications

Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channel estimation (CE). In this paper, we propose a new sensing-enhanced uplink CE scheme for near-field XL-MIMO, which notably reduces the required quantity of baseband samples and the dictionary size. In particular, we first propose a sensing method that can be accomplished in a single time slot. It employs power sensors embedded within the antenna elements to measure the received power pattern rather than baseband samples. A time inversion algorithm is then proposed to precisely estimate the locations of users and scatterers, which offers a substantially lower computational complexity. Based on the estimated locations from sensing, a novel dictionary is then proposed by considering the eigen-problem based on the near-field transmission model, which facilitates efficient near-field CE with less baseband sampling and a more lightweight dictionary. Moreover, we derive the general form of the eigenvectors associated with the near-field channel matrix, revealing their noteworthy connection to the discrete prolate spheroidal sequence (DPSS). Simulation results show that the proposed time inversion algorithm achieves accurate localization with power measurements only, and remarkably outperforms various widely adopted algorithms in terms of computational complexity. Furthermore, the proposed eigen-dictionary considerably improves the accuracy of CE with a compact dictionary size and a drastic reduction in baseband samples of up to 66%.
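
The statement that larger arrays expand the near-field region can be made concrete with the classical Rayleigh distance 2D²/λ, a commonly used far-field boundary for an aperture of size D at wavelength λ. The abstract does not say which boundary or system parameters the authors use, so the function name, the aperture sizes, and the 30 GHz carrier below are illustrative assumptions only.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rayleigh_distance(aperture_m: float, carrier_hz: float) -> float:
    """Return 2 * D^2 / lambda in metres (classical far-field boundary)."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return 2.0 * aperture_m ** 2 / wavelength


if __name__ == "__main__":
    # Assumed apertures (metres) and carrier frequency; not taken from the paper.
    for aperture in (0.1, 0.5, 2.0):
        boundary = rayleigh_distance(aperture, 30e9)
        print(f"D = {aperture} m at 30 GHz: near field extends to about {boundary:.1f} m")
```

Because the boundary grows with the square of the aperture, a fourfold increase in array size pushes it out by a factor of sixteen at a fixed carrier frequency.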


SCSC: A Novel Standards-Compatible Semantic Communication Framework for Image Transmission

January 2025 · 5 Reads

IEEE Transactions on Communications

Joint source-channel coding (JSCC) is a promising paradigm for next-generation communication systems, particularly in challenging transmission environments. In this paper, we propose a novel standard-compatible JSCC framework for the transmission of images over multiple-input multiple-output (MIMO) channels. Different from the existing end-to-end AI-based DeepJSCC schemes, our framework consists of learnable modules that enable communication using conventional separate source and channel codes (SSCC), which makes it amenable for easy deployment on legacy systems. Specifically, the learnable modules involve a preprocessing-empowered network (PPEN) for preserving essential semantic information, and a precoder & combiner-enhanced network (PCEN) for efficient transmission over a resource-constrained MIMO channel. We treat existing compression and channel coding modules as non-trainable blocks. Since the parameters of these modules are non-differentiable, we employ a proxy network that mimics their operations when training the learnable modules. Numerical results demonstrate that our scheme can save more than 29% of the channel bandwidth, and requires lower complexity compared to the constrained baselines. We also show its generalization capability to unseen datasets and tasks through extensive experiments.


Pre-Equalization Aided Grant-Free Massive Access in Massive MIMO System

January 2025 · 1 Read

IEEE Transactions on Vehicular Technology

The spatial diversity and multiplexing advantages of massive multi-input-multi-output (mMIMO) can significantly improve the capacity of massive non-orthogonal multiple access (NOMA) in machine type communications. However, state-of-the-art grant-free massive NOMA schemes for mMIMO systems require accurate estimation of random access channels to perform activity detection and the subsequent coherent data demodulation, which incurs excessive pilot overhead and access latency. To address this, we propose a pre-equalization aided grant-free massive access scheme for mMIMO systems, where an iterative detection scheme is conceived. Specifically, the base station (BS) first activates one of its antennas (i.e., the beacon antenna) to broadcast a beacon signal, which allows the user equipment (UEs) to perform downlink channel estimation and pre-equalize the uplink random access signal with respect to the channels associated with the beacon antenna. During the uplink transmission stage, the BS detects the UEs' activity and data by using the proposed iterative detection algorithm, which consists of three modules: coarse data detection (DD), data-aided channel estimation (CE), and fine DD. In the proposed algorithm, joint activity detection and DD is first performed based on the signals received by the beacon antenna. Subsequently, the DD is further refined by iteratively running the data-aided CE module and the fine DD module using signals received by all BS antennas. Our simulation results demonstrate that the proposed scheme outperforms state-of-the-art mMIMO-based grant-free massive NOMA schemes with the same access latency.


Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect

January 2025 · 13 Reads

IEEE Journal on Selected Areas in Communications

The advent of ultra-massive multiple-input-multiple-output systems holds great promise for next-generation communications, yet their channels exhibit the hybrid far- and near-field beam-squint (HFBS) effect. In this paper, we not only overcome but also harness the HFBS effect to propose an integrated location sensing and communication (ILSC) framework. During the uplink training stage, user terminals (UTs) transmit reference signals for simultaneous channel estimation and location sensing. This stage leverages an elaborately designed hybrid-field projection matrix to overcome the HFBS effect and estimate the channel in a compressive manner. Subsequently, the scatterers’ locations can be sensed from the spherical wavefront based on the channel estimation results. By treating the sensed scatterers as virtual anchors, we employ a weighted least-squares approach to derive the UT’s location. Moreover, we propose an iterative refinement mechanism, which utilizes the accurately estimated time difference of arrival of multipath components to enhance location sensing precision. In the following downlink data transmission stage, we leverage the acquired location information to further optimize the hybrid beamformer, which combines beam broadening and focusing to mitigate the spectral efficiency degradation resulting from the HFBS effect. Extensive simulation experiments demonstrate that the proposed ILSC scheme achieves better location sensing and communication performance than conventional methods.
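
The weighted least-squares step mentioned in the abstract can be illustrated with a textbook 2-D multilateration sketch. The abstract does not give the authors' measurement model, so everything here is an assumption: anchors at known positions, direct range measurements to each anchor, and a linearization that subtracts the first anchor's range equation from the others. The function name `wls_position_2d` and all numbers are hypothetical.

```python
import math


def wls_position_2d(anchors, ranges, weights=None):
    """Weighted least-squares 2-D position from ranges to known anchors.

    Linearizes |p - a_i|^2 = d_i^2 by subtracting the equation of anchor 0,
    giving 2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2,
    then solves the 2x2 weighted normal equations in closed form.
    `weights` holds one weight per linearized equation (len(anchors) - 1).
    """
    ax0, ay0 = anchors[0]
    d0 = ranges[0]
    if weights is None:
        weights = [1.0] * (len(anchors) - 1)
    s_xx = s_xy = s_yy = t_x = t_y = 0.0
    for (ax, ay), d, w in zip(anchors[1:], ranges[1:], weights):
        row_x = 2.0 * (ax - ax0)
        row_y = 2.0 * (ay - ay0)
        rhs = (ax * ax + ay * ay) - (ax0 * ax0 + ay0 * ay0) - d * d + d0 * d0
        s_xx += w * row_x * row_x
        s_xy += w * row_x * row_y
        s_yy += w * row_y * row_y
        t_x += w * row_x * rhs
        t_y += w * row_y * rhs
    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < 1e-12:
        raise ValueError("anchor geometry is degenerate (e.g. collinear anchors)")
    return ((s_yy * t_x - s_xy * t_y) / det, (s_xx * t_y - s_xy * t_x) / det)


if __name__ == "__main__":
    # Hypothetical anchor layout and true position, with noiseless ranges.
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    true_pos = (3.0, 4.0)
    ranges = [math.hypot(true_pos[0] - x, true_pos[1] - y) for x, y in anchors]
    print(wls_position_2d(anchors, ranges))
```

In practice the weights would reflect how much each measurement is trusted, for example the inverse of an estimated error variance.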


Decentralized Likelihood Ascent Search-Aided Detection For Distributed Large-Scale MIMO Systems

January 2025 · 1 Read

IEEE Transactions on Wireless Communications

In this paper, we propose decentralized likelihood ascent search (DLAS)-aided detection for distributed large-scale multiple-input multiple-output (MIMO) systems to achieve further performance gains. With the help of DLAS, traditional distributed iterative methods are able to achieve better performance than linear detection schemes such as zero-forcing (ZF) and minimum mean-square error (MMSE). Through analysis, we derive the equivalent noise and the post-processing SNR for DLAS. More importantly, based on them, we demonstrate that the proposed DLAS-aided detection achieves the full received diversity. To further facilitate its implementation in practice, we design the decentralized effective ring (DER) architecture with significantly reduced bandwidth requirement and better parallel computation. Finally, simulation results demonstrate that the proposed DLAS-aided detection attains the same received diversity as maximum-likelihood (ML) detection while surpassing state-of-the-art decentralized schemes in terms of bit error rate (BER) performance, with reduced complexity and bandwidth costs.
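
For readers unfamiliar with likelihood ascent search, the following is a minimal sketch of the generic centralized, one-symbol-flip version for real-valued BPSK. It is not the decentralized DLAS scheme or the ring architecture of the paper. The matched-filter starting point is an assumption made to keep the example free of matrix inversion (LAS is often initialized from a linear detector output), and all function names and parameters are hypothetical.

```python
import random


def residual_cost(H, y, x):
    """Squared Euclidean distance ||y - H x||^2 for real-valued H, y, x."""
    cost = 0.0
    for row, y_i in zip(H, y):
        err = y_i - sum(h * s for h, s in zip(row, x))
        cost += err * err
    return cost


def matched_filter_init(H, y):
    """Hard-decision BPSK starting point sign(H^T y)."""
    n_rx, n_tx = len(H), len(H[0])
    return [1 if sum(H[i][k] * y[i] for i in range(n_rx)) >= 0 else -1
            for k in range(n_tx)]


def las_detect(H, y, x_init):
    """Flip one symbol at a time, keeping a flip only if the cost drops.

    Stops at a vector that no single flip can improve (a local minimum).
    """
    x = list(x_init)
    best = residual_cost(H, y, x)
    improved = True
    while improved:
        improved = False
        for k in range(len(x)):
            x[k] = -x[k]                     # tentative flip
            cost = residual_cost(H, y, x)
            if cost < best:
                best = cost                  # keep the flip
                improved = True
            else:
                x[k] = -x[k]                 # undo the flip
    return x, best


def demo(seed=0, n_rx=8, n_tx=4, noise_std=0.3):
    """Random real Gaussian channel with BPSK symbols (illustrative only)."""
    rng = random.Random(seed)
    H = [[rng.gauss(0.0, 1.0) for _ in range(n_tx)] for _ in range(n_rx)]
    x_true = [rng.choice((-1, 1)) for _ in range(n_tx)]
    y = [sum(h * s for h, s in zip(row, x_true)) + rng.gauss(0.0, noise_std)
         for row in H]
    x0 = matched_filter_init(H, y)
    x_hat, cost = las_detect(H, y, x0)
    return H, y, x_true, x0, x_hat, cost


if __name__ == "__main__":
    H, y, x_true, x0, x_hat, cost = demo()
    print("start cost:", residual_cost(H, y, x0), "final cost:", cost)
```

The search never increases the cost, so the result is at least as likely as the starting point, but it is only guaranteed to be a local optimum with respect to single-symbol flips.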


Emerging Space Communication and Network Technologies for Sixth-Generation Ubiquitous Connectivity

December 2024 · 7 Reads


Citations (38)


... Over the past two decades, multiple-input multiple-output (MIMO) technology has emerged as a key enabler for enhancing both the throughput and reliability of wireless communication systems [1], [2]. It is widely recognized that increasing the number of antennas can lead to higher spectral efficiency due to greater spatial multiplexing gains. ...

Reference:

Fluid Antenna Meets RIS: Random Matrix Analysis and Two-Timescale Design for Multi-User Communications
Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems
  • Citing Article
  • January 2025

IEEE Journal on Selected Areas in Communications

... As fifth-generation (5G) networks are commercially deployed and research on sixth-generation (6G) progresses, conventional communication systems have revealed notable shortcomings in meeting the increasingly diverse and growing demands [1], [2]. In particular, as the low-altitude economy continues to evolve [3], leveraging airspace for activities such as logistics and rescue operations, the need for seamless communication and real-time sensing becomes even more critical. Therefore, it is necessary to adopt a new approach that moves beyond traditional communication-centric network designs and embraces integrated sensing-communication networks. ...

Unauthorized UAV Countermeasure for Low-Altitude Economy: Joint Communications and Jamming Based on MIMO Cellular Systems

IEEE Internet of Things Journal

... LLM4CP is a pre-trained large language model (LLM) empowered method for predicting future downlink CSI based on historical uplink data, utilizing tailored modules for cross-modality knowledge transfer, achieving high accuracy with low training and inference costs in massive MIMO systems [43]. CSI-GPT integrates a Swin Transformer-based channel acquisition network with a variational autoencoder-based channel sample generator and federated-tuning to efficiently acquire downlink CSI in massive MIMO systems [44]. CSI-LLM is an LLM-based channel prediction approach, which models the historical variable-step sequences, utilizing the next-token generation ability of LLM to predict the CSI of the next step [45]. ...

CSI-GPT: Integrating Generative Pre-Trained Transformer With Federated-Tuning to Acquire Downlink Massive MIMO Channels

IEEE Transactions on Vehicular Technology

... Nevertheless, they usually rely on intricate mechanical movement systems and high-precision motion control and optimization algorithms, whereas directly regulating the antenna's radiative properties offers a more efficient alternative. In this paper, ERA refers specifically to antennas with reconfigurable radiation patterns, a class of systems also called reconfigurable massive MIMO [7] or electronically steerable passive array radiators (ESPAR) [8]. ...

Reconfigurable Massive MIMO: Precoding Design and Channel Estimation in the Electromagnetic Domain

IEEE Transactions on Communications

... Semantic communication is considered a revolutionary paradigm with the potential to transform the design and operation of 6G wireless communication systems [1]- [4]. Whereas, extracting semantics from source signals and redesigning wireless communication systems present significant challenges. ...

Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network
  • Citing Article
  • January 2024

IEEE Journal on Selected Areas in Communications

... hybrid message passing (HMP) algorithm to recover sparse channel parameters. Separately, [15] designed a discrete prolate spheroidal sequence (DPSS)-based codebook and a two-step CE scheme, though it requires prior rough UE location estimation. Differing from these approaches, [16] explored the block sparsity shared between near-field and far-field angular domains and proposed a complex simultaneous logit-weighted block OMP (CSLW-BOMP) algorithm to exploit this structural characteristic. ...

DPSS-Based Codebook Design for Near-Field XL-MIMO Channel Estimation
  • Citing Conference Paper
  • June 2024

... In order to improve communication efficiency in FL, over-the-air computation (AirComp) has been introduced [13]- [20]. Using the superposition property of wireless signals, AirComp enables the simultaneous transmission and aggregation of model updates, reducing latency and improving scalability [21]- [23]. However, over-the-air FL (the integration of AirComp and FL) remains susceptible to signal interference and noise, limiting its practical effectiveness. ...

Massive Digital Over-the-Air Computation for Communication-Efficient Federated Edge Learning
  • Citing Article
  • November 2024

IEEE Journal on Selected Areas in Communications

... Furthermore, D represents the proposed PSCNs size. In Figure 1, the STAR-RIS relay UAV elements are designed to operate in two simultaneous modes, transmission and reflection, as outlined in references [63] and [13]. In the transmission mode, the STAR-RIS-UAV facilitates the passage of the incoming observation UAV signal by reconfiguring the signal propagation towards the EBS-T. ...

Reconfigurable Intelligent Surface Empowered Full Duplex Systems: Opportunities and Challenges

IEEE Communications Standards Magazine

... Findings indicate that the proposed algorithm enhances resource distribution between S&C in ISAC-enabled UAV networks. Similarly, this paper [74] presents a novel ISAC waveform design for mmWave UAV communications through OCDM, utilizing orthogonal chirp signals for dual S&C capabilities. The holistic design integrates OCDM with advanced FMCW, dedicating one subcarrier for sensing while enhancing communication data rates through others. ...

Orthogonal Chirp Division Multiplexing Waveform Design for 6G mmWave UAV Integrated Sensing and Communication

... For example, textual descriptions like "Panda plays ukulele at home" can now guide the creation of visually coherent short videos with precise semantic alignment [17], [18]. Building on these models' strengths in semantic understanding and content authenticity, recent studies [19]- [21] have demonstrated ultra-low bitrate image transmissions (e.g., <0.1 bit per pixel) by transmitting only highly compressed multi-modal semantics, such as text, edge maps, and embeddings. However, existing diffusion-based approaches for image semantic transmission fail to account for temporal frame correlations, making them insufficient for efficient video semantic communication. ...

Latency-Aware Generative Semantic Communications With Pre-Trained Diffusion Models

IEEE Wireless Communications Letters