Baoxin Li’s research while affiliated with Arizona State University and other places



Publications (270)


Figure captions (from the article below; figures used with permission from Barrow Neurological Institute, Phoenix, Arizona, except as noted):
  • Flow diagram documenting the study selection process.
  • Bar chart of the yearly distribution of publications applying machine learning (ML) and artificial intelligence algorithms to surgical practice through hand and instrument tracking. Such applications were relatively sparse between 2013 and 2017, but research interest has increased substantially since 2018, with deep learning techniques gaining prominence over traditional machine learning methods.
  • Categorization of artificial intelligence and machine learning. AI algorithms can be categorized into two groups: machine learning and non-machine learning. ML algorithms can be further categorized into three main groups: deep learning, reinforcement learning, and traditional machine learning, each divisible into various subgroups. CNN, convolutional neural network; GAN, generative adversarial network; GNN, graph neural network; MLP, multilayer perceptron; RNN, recurrent neural network.
  • Bar chart of the surgical fields in which ML algorithms are most commonly applied through hand and instrument tracking: general surgery, neurosurgery, and ophthalmology lead, with research also beginning in oral surgery, orthopedics, plastic surgery, otolaryngology, urology, cardiac surgery, and thoracic surgery. The nonspecific surgery group includes studies on surgical simulation training, hand pose estimation, suturing, knot tying, and basic laparoscopic and endoscopic training that do not belong to any particular surgical field.
  • Vision-based method for surgical-instrument tracking. Loukas et al. (98) proposed an adaptive color update for the endoscopic instrument-tracking process; the figure depicts the mean color values (hue, saturation, and lightness) of the marker as the instrument moves away from the endoscope's camera in the training box. Used with permission from Loukas C, Lahanas V, Georgiou E. An integrated approach to endoscopic instrument tracking for augmented reality applications in surgical simulation training. Int J Med Robot. 2013;9(4):e34-51.


Artificial intelligence integration in surgery through hand and instrument tracking: a systematic literature review
  • Literature Review
  • Full-text available

February 2025 · 99 Reads · Frontiers in Surgery

[...] · Mark C. Preul

Objective: This systematic literature review of the integration of artificial intelligence (AI) applications in surgical practice through hand and instrument tracking provides an overview of recent advancements and analyzes current literature on the intersection of surgery with AI. Distinct AI algorithms and specific applications in surgical practice are also examined.

Methods: An advanced search using medical subject heading terms was conducted in the Medline (via PubMed), SCOPUS, and Embase databases for articles published in English. A strict selection process was performed, adhering to PRISMA guidelines.

Results: A total of 225 articles were retrieved. After screening, 77 met the inclusion criteria and were included in the review. Use of AI algorithms in surgical practice was uncommon during 2013–2017 but has gained significant popularity since 2018. Deep learning algorithms (n = 62) are increasingly preferred over traditional machine learning algorithms (n = 15). These technologies are used in surgical fields such as general surgery (n = 19), neurosurgery (n = 10), and ophthalmology (n = 9). The most common functional sensors and systems used were prerecorded videos (n = 29), cameras (n = 21), and image datasets (n = 7). The most common applications included laparoscopic (n = 13), robotic-assisted (n = 13), basic (n = 12), and endoscopic (n = 8) surgical skills training, as well as surgical simulation training (n = 8).

Conclusion: AI technologies can be tailored to address distinct needs in surgical education and patient care. The use of AI in hand and instrument tracking improves surgical outcomes by optimizing surgical skills training. It is essential to acknowledge the current technical and social limitations of AI and work toward filling those gaps in future studies.


End‐to‐End 3D CycleGAN Model for Amyloid PET Harmonization

January 2025 · 2 Reads

Background: Amyloid PET (positron emission tomography) is crucial in detecting amyloid burden within the brain. However, the diversity of amyloid tracers and the scarcity of paired data significantly challenge collaboration across cross-center studies. In this research, we introduce a novel patch-based 3D end-to-end image transformation model. This model works as a harmonization strategy, transferring amyloid PET images from one tracer type to another.

Methods: 51 florbetapir (FBP) and 604 PiB images from the Australian Imaging, Biomarkers and Lifestyle Study of Ageing (AIBL) were processed using established pipelines to extract regional standard uptake value ratios (SUVRs), mean cortical SUVRs (mcSUVRs), and SUVR images. A 3D Cycle-Consistent Generative Adversarial Network (CycleGAN) was used to learn the end-to-end 3D image transformation, using adversarial training strategies in conjunction with ResNet generators and multilayer discriminators within the different tracer domains. Data augmentation techniques were applied to the FBP images to balance the training samples, and patch-based learning was used throughout the experiment. The trained CycleGAN model was then applied to an independent dataset of 46 paired images from www.gaain.org/centiloid-project for performance evaluation. Correlation analyses were conducted voxel-wise and on mcSUVR, comparing the FBP/synthetic PiB to the true PiB data. The Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) were also evaluated between the synthetic and real PiB SUVR images.

Results: The synthetic PiB SUVR images were visually more similar to the real PiB SUVR images than the FBP images were. Voxel-wise correlation improved from 0.942 between FBP and real PiB to 0.958 between the virtual and real PiB SUVR images (p < 0.0001). The agreement of mcSUVR improved from r = 0.909 to r = 0.954 (p < 0.001) in the independent test dataset. The SSIM and PSNR between synthetic and real PiB were 0.762 and 25.370, respectively, in the independent dataset.

Conclusion: We propose a novel end-to-end image transformation model for 3D PET image synthesis. The model finds the nonlinear mapping between different tracers and eliminates the requirement for paired training images. The result was confirmed using an independent dataset to demonstrate the model's effectiveness.
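As a rough illustration of the harmonization strategy this abstract describes, the sketch below shows one generator update of a 3D CycleGAN on unpaired tracer patches, in PyTorch. The network sizes, loss weights, and patch shapes are illustrative placeholders, not the authors' configuration; only the overall structure (ResNet-style generator, adversarial term, cycle-consistency term that removes the need for paired images) follows the abstract.

```python
import torch
import torch.nn as nn

class ResnetBlock3D(nn.Module):
    """3D residual block, the building unit of a ResNet-style generator."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.InstanceNorm3d(ch), nn.ReLU(True),
            nn.Conv3d(ch, ch, 3, padding=1), nn.InstanceNorm3d(ch),
        )
    def forward(self, x):
        return x + self.body(x)  # residual connection

def make_generator(ch=32, n_blocks=3):
    return nn.Sequential(
        nn.Conv3d(1, ch, 7, padding=3), nn.ReLU(True),
        *[ResnetBlock3D(ch) for _ in range(n_blocks)],
        nn.Conv3d(ch, 1, 7, padding=3),
    )

def make_discriminator(ch=32):
    # PatchGAN-style discriminator: classifies overlapping sub-regions as real/fake.
    return nn.Sequential(
        nn.Conv3d(1, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv3d(ch, 1, 4, stride=2, padding=1),
    )

G_fbp2pib, G_pib2fbp, D_pib = make_generator(), make_generator(), make_discriminator()
opt = torch.optim.Adam(
    list(G_fbp2pib.parameters()) + list(G_pib2fbp.parameters()), lr=2e-4)
adv, cyc = nn.MSELoss(), nn.L1Loss()

fbp = torch.randn(2, 1, 32, 32, 32)  # unpaired 3D SUVR patches (random stand-ins)

fake_pib = G_fbp2pib(fbp)            # translate an FBP patch into the PiB domain
rec_fbp = G_pib2fbp(fake_pib)        # map back to close the cycle
d_out = D_pib(fake_pib)
# Cycle-consistency (weight 10.0 here) is what removes the need for paired data.
loss = adv(d_out, torch.ones_like(d_out)) + 10.0 * cyc(rec_fbp, fbp)
opt.zero_grad(); loss.backward(); opt.step()
# The symmetric PiB -> FBP direction and the discriminator updates mirror this step.
```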



Deep Learning Detection of Hand Motion During Microvascular Anastomosis Simulations Performed by Expert Cerebrovascular Neurosurgeons

December 2024 · 59 Reads · World Neurosurgery

Objective: Deep learning enables precise hand tracking without the need for physical sensors, allowing for unsupervised quantitative evaluation of surgical motion and tasks. We quantitatively assessed the hand motions of experienced cerebrovascular neurosurgeons during simulated microvascular anastomosis using deep learning and explored the extent to which surgical motion data differed among experts.

Methods: A deep learning detection system tracked 21 landmarks corresponding to the digit joints and wrist on each hand of 5 expert cerebrovascular neurosurgeons. Tracking data for each surgeon were analyzed over long and short time intervals to examine gross movements and micromovements, respectively. Quantitative algorithms assessed the economy and flow of motion by calculating mean movement distances from the baseline median landmark coordinates and median times between sutures, respectively.

Results: Tracking data correlated with specific surgical actions observed in microanastomosis video analysis. Economy of motion during suturing was 19, 26, 29, 27, and 28 pixels for surgeons 1 through 5, respectively. Flow of motion during microanastomosis was 31.96, 29.40, 28.90, 7.37, and 47.21 seconds for surgeons 1 through 5, respectively.

Conclusions: Hand tracking data showed similarities among experts, with low movement from baseline, minimal excess motion, and rhythmic suturing patterns. The data also revealed unique patterns related to each expert's habits and techniques. The results show that surgical motion can be correlated with hand motion and assessed using mathematical algorithms. We also demonstrated the feasibility and potential of deep learning–based motion detection to enhance surgical training.
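To make the two metrics concrete, here is a minimal sketch assuming landmark tracks (21 points per hand per frame, as in the abstract) have already been extracted into a NumPy array. The exact baseline definition, units, and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def economy_of_motion(landmarks):
    """landmarks: (n_frames, 21, 2) pixel coordinates for one hand.

    Mean distance of each frame's landmarks from the per-landmark median
    ("baseline") position; lower values indicate less excess motion.
    """
    baseline = np.median(landmarks, axis=0)                 # (21, 2) median pose
    dists = np.linalg.norm(landmarks - baseline, axis=-1)   # (n_frames, 21)
    return float(dists.mean())

def flow_of_motion(suture_times_s):
    """Median time between consecutive suture completions, in seconds."""
    return float(np.median(np.diff(np.asarray(suture_times_s))))

# Example with synthetic data: 300 frames of mildly jittering landmarks.
rng = np.random.default_rng(0)
tracks = 200 + 5 * rng.standard_normal((300, 21, 2))
print(economy_of_motion(tracks))                  # a few pixels of deviation
print(flow_of_motion([12.0, 41.5, 70.2, 99.8]))   # ~29.1 s between sutures
```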


Enhancing Amyloid PET Quantification: MRI-Guided Super-Resolution Using Latent Diffusion Models

December 2024 · 19 Reads

Amyloid PET imaging plays a crucial role in the diagnosis and research of Alzheimer's disease (AD), allowing non-invasive detection of amyloid-β plaques in the brain. However, the low spatial resolution of PET scans limits the accurate quantification of amyloid deposition due to partial volume effects (PVE). In this study, we propose a novel approach to addressing PVE using a latent diffusion model for resolution recovery (LDM-RR) of PET imaging. We leverage a synthetic data generation pipeline to create high-resolution PET digital phantoms for model training. The proposed LDM-RR model incorporates a weighted combination of L1, L2, and MS-SSIM losses at both noise and image scales to enhance MRI-guided reconstruction. We evaluated the model's performance in improving statistical power for detecting longitudinal changes and enhancing agreement between amyloid PET measurements from different tracers. The results demonstrate that the LDM-RR approach significantly improves PET quantification accuracy, reduces inter-tracer variability, and enhances the detection of subtle changes in amyloid deposition over time. We show that deep learning has the potential to improve PET quantification in AD, effectively contributing to the early detection and monitoring of disease progression.
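The weighted loss combination described above can be sketched as follows, assuming PyTorch and the third-party pytorch-msssim package. The weights are placeholders, and the paper's application of the combination at both noise and image scales is simplified here to a single image-scale call.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # third-party package: pip install pytorch-msssim

def ldm_rr_loss(pred, target, w1=1.0, w2=1.0, w_ssim=0.5):
    """pred, target: (B, 1, H, W) image-scale tensors scaled to [0, 1].

    Weighted sum of L1, L2, and an MS-SSIM term; the weights here are
    illustrative, not the values used in the paper.
    """
    l1 = F.l1_loss(pred, target)
    l2 = F.mse_loss(pred, target)
    # MS-SSIM is a similarity in [0, 1]; use (1 - MS-SSIM) as the loss term.
    ssim_term = 1.0 - ms_ssim(pred, target, data_range=1.0)
    return w1 * l1 + w2 * l2 + w_ssim * ssim_term

# Spatial size must exceed ~160 px for the default 5-scale MS-SSIM pyramid.
pred = torch.rand(2, 1, 192, 192, requires_grad=True)
target = torch.rand(2, 1, 192, 192)
loss = ldm_rr_loss(pred, target)
loss.backward()
```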


Overall architecture of kV2CTConverter
a Workflow of the proposed method. The raw kV images are augmented by GRSS to obtain adequate samples for model training. The processed images then pass simultaneously through dual models (i.e., a primary model and a secondary model) to generate the whole CT and a fractional CT covering only the head region, respectively. Lastly, the full-size synthesized CT is obtained by overlaying and concatenating the outputs of the two models according to their spatial relationship. b The model structure of both the primary and secondary models. c The details of the hierarchical ViT blocks in the encoder Ek. d The details of the hierarchical ViT blocks in the decoder Dr. e Detailed illustration of window-based Multi-Head Attention (W-MHA): the tokenized patches are first split into nW non-overlapping windows of size w × w, and attention is calculated only within each window rather than over the whole input.
Experimental results from a typical patient
a CDVH of the sCT from a typical patient. b HU number profile comparison between the sCT and gCT in both the R-L and A-P directions. The green box marks the area where the discrepancy was large. c Dose profile comparison between the doses calculated on the sCT and on the gCT in both the R-L and A-P directions. d Enlarged head-region comparison between kV2CTConverter and the primary model alone.
The orthogonal kV x-ray system used at our proton center and exemplary kV and DRR images
The orthogonal kV x-ray system is used for patient alignment at our proton center (left). An exemplary kV image (middle) captured by this system and its corresponding DRR image (right). Compared to the DRR image, the daily-used kV image is often noisy and contains unwanted artifacts from essential medical devices/accessories, such as dental implants or treatment couch attachments.
The histogram of shift error (SE) (in mm) from all 10 patients
The y-axis shows the bin edges of the histogram. The majority of the SEs are less than 0.4 mm, far smaller than the clinically acceptable patient alignment tolerance for H&N patients at our institution, which is set at 2-3 mm.
Accurate patient alignment without unnecessary imaging using patient-specific 3D CT images synthesized from 2D kV images

November 2024 · 17 Reads · Communications Medicine

Background: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. However, tumor visibility is constrained by the projection of the patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D-OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited, and the imaging dose is unnecessarily high. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position.

Methods: We propose a dual-model framework built with hierarchical ViT blocks. Unlike proof-of-concept approaches, our framework takes the kV images acquired by the 2D imaging devices in the treatment room as its sole input and can synthesize accurate, full-size 3D CT within milliseconds.

Results: We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer in terms of image quality (MAE < 45 HU), dosimetric accuracy (gamma passing rate (2%/2 mm/10%) > 97%), and patient position uncertainty (shift error < 0.4 mm).

Conclusions: The proposed framework can generate accurate 3D CT faithfully mirroring the patient's position, thus substantially improving patient setup accuracy, keeping imaging dose minimal, and maintaining treatment veracity.
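The dual-model compositing step described in the figure caption and abstract (overlaying the secondary model's head-region CT onto the primary model's full-size CT by spatial position, then checking HU error) can be sketched roughly as below. Array shapes, the region origin, and the stand-in volumes are hypothetical placeholders, not the paper's data.

```python
import numpy as np

def composite_ct(full_ct, head_ct, head_origin):
    """full_ct: (D, H, W) HU volume from the primary model.
    head_ct: smaller HU volume from the secondary model.
    head_origin: (z0, y0, x0) position of the head region within full_ct.
    """
    out = full_ct.copy()
    z0, y0, x0 = head_origin
    dz, dy, dx = head_ct.shape
    # Overlay the higher-fidelity head region at its known spatial location.
    out[z0:z0+dz, y0:y0+dy, x0:x0+dx] = head_ct
    return out

def mae_hu(synth_ct, ground_truth_ct):
    """Mean absolute error in Hounsfield units (paper reports MAE < 45 HU)."""
    return float(np.mean(np.abs(synth_ct - ground_truth_ct)))

# Stand-in volumes just to exercise the functions.
full = np.zeros((128, 256, 256), dtype=np.float32)
head = np.full((64, 128, 128), 40.0, dtype=np.float32)
sct = composite_ct(full, head, head_origin=(0, 64, 64))
print(mae_hu(sct, np.zeros_like(sct)))  # 5.0 HU on this toy example
```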


Figure 4. Visual comparison of generated FBP scans using RL-RR and our LDM-RR to real FBP and T1-MRI for a sample from OASIS-3 cohort.
Summary of demographic information of the three cohorts included in this study.
Enhancing PET Quantification: MRI-Guided Super-Resolution Using Latent Diffusion Models

November 2024 · 8 Reads

Amyloid PET imaging plays a crucial role in the diagnosis and research of Alzheimer's disease (AD), allowing non-invasive detection of amyloid-β plaques in the brain. However, the low spatial resolution of PET scans limits accurate quantification of amyloid deposition due to partial volume effects (PVE). In this study, we propose a novel approach to addressing PVE using a latent diffusion model for resolution recovery (LDM-RR) of PET imaging. We leverage a synthetic data generation pipeline to create high-resolution PET digital phantoms for model training. The proposed LDM-RR model incorporates a weighted combination of L1, L2, and MS-SSIM losses at both noise and image scales to enhance MRI-guided reconstruction. We evaluated the model's performance in improving statistical power for detecting longitudinal changes and enhancing agreement between amyloid PET measurements from different tracers. Results demonstrate that the LDM-RR approach significantly improves PET quantification accuracy, reduces inter-tracer variability, and enhances the detection of subtle changes in amyloid deposition over time. We show that deep learning has the potential to improve PET quantification in AD, effectively contributing to early detection and monitoring of disease progression.


Figure 2: Overall architecture of Select and Packing Transformer (SPT). The hierarchical structure can generate features with various scales as common backbone networks. The SPA blocks in the last two stages can improve both efficiency and accuracy by disregarding uninformative tokens.
Figure 3: (a) Our SPA computes attention only for informative tokens. (b) Our SnP block selects informative tokens under multi-scale supervision and packs selected tokens for batch training and inference. The packed tokens attend to only tokens from the same image.
Figure 4: Under ground truth (GT) supervision, attending to only informative tokens can achieve better performance and efficiency.
Context-Aware Token Selection and Packing for Enhanced Vision Transformer

October 2024 · 9 Reads

In recent years, the long-range attention mechanism of vision transformers has driven significant performance breakthroughs across various computer vision tasks. However, the traditional self-attention mechanism, which processes both informative and non-informative tokens, suffers from inefficiency and inaccuracies. While sparse attention mechanisms have been introduced to mitigate these issues by pruning tokens involved in attention, they often lack context-awareness and intelligence. These mechanisms frequently apply a uniform token selection strategy across different inputs for batch training or optimize efficiency only for the inference stage. To overcome these challenges, we propose a novel algorithm: Select and Pack Attention (SPA). SPA dynamically selects informative tokens using a low-cost gating layer supervised by selection labels and packs these tokens into new batches, enabling a variable number of tokens to be used in parallelized GPU batch training and inference. Extensive experiments across diverse datasets and computer vision tasks demonstrate that SPA delivers superior performance and efficiency, including a 0.6 mAP improvement in object detection and a 16.4% reduction in computational costs.
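A minimal PyTorch sketch of the select-and-pack idea described above: a low-cost gating layer scores tokens, the highest-scoring tokens from each image are gathered into one packed batch, and an attention mask keeps packed tokens from attending across images. The layer sizes and fixed top-k rule are illustrative simplifications; the paper trains the gate with selection labels and allows a variable number of tokens per image, which is omitted here.

```python
import torch
import torch.nn as nn

B, N, C, K = 2, 196, 64, 49           # batch, tokens/image, channels, kept tokens
tokens = torch.randn(B, N, C)

gate = nn.Linear(C, 1)                # low-cost gating layer
scores = gate(tokens).squeeze(-1)     # (B, N) informativeness scores
keep = scores.topk(K, dim=1).indices  # indices of the informative tokens

# Pack the selected tokens into one flat batch and tag each with its image id.
packed = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, C))
packed = packed.reshape(B * K, C)
image_id = torch.arange(B).repeat_interleave(K)

# Boolean mask: True blocks attention, so tokens only attend within their image.
attn_mask = image_id[:, None] != image_id[None, :]    # (B*K, B*K)

mha = nn.MultiheadAttention(C, num_heads=4)           # expects (L, N, E) inputs
q = packed.unsqueeze(1)                               # sequence length B*K, batch 1
out, _ = mha(q, q, q, attn_mask=attn_mask)
print(out.shape)                                      # torch.Size([98, 1, 64])
```

Packing the selected tokens into a single sequence with a per-image mask is what lets a variable-size selection still run as one parallelized GPU batch.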





Citations (54)


... For example, AI models trained on large sets of MRI data can identify hippocampal sclerosis, cortical dysplasia, and other structural brain anomalies associated with epilepsy, which are often missed by traditional visual analysis [32]. ...

Reference:

Unlocking new frontiers in epilepsy through AI: From seizure prediction to personalized medicine
Brainomaly: Unsupervised Neurologic Disease Detection Utilizing Unannotated T1-weighted Brain MR Images

... Given the remarkable success of transformers in natural language processing (NLP), this architectural paradigm is progressively permeating diverse computer vision tasks (Vaswani et al. 2017; Bao et al. 2021; Touvron et al. 2021; He et al. 2022; Zhang et al. 2024; Konstantinidis et al. 2023; Yang, Kang, and Yang 2022; Kang et al. 2022; Ni et al. 2024a,b; Zhou et al. 2024; Fan, Tao, and Zhao 2024). For instance, Vision Transformer (ViT) divides input images into 16 × 16 patches, which are subsequently treated as tokens for the application of the attention mechanism (Dosovitskiy et al. 2020). ...

Patch-based Selection and Refinement for Early Object Detection
  • Citing Conference Paper
  • January 2024

... Towards this goal, Video Scene Graph Generation (VidSGG) [18,52] has emerged as a critical task for capturing multi-object relationships across video frames. In particular, VidSGG enables high-level tasks such as event forecasting [36,43,45], video captioning [24,38,40], and video question answering [20,30,32,41] by constructing detailed representations of entities and their interactions. ...

Graph(Graph): A Nested Graph-Based Framework for Early Accident Anticipation
  • Citing Conference Paper
  • January 2024

... This design is different from the manner of using different numbers of convolutions for various spatial positions [21]. Unlike methods such as patch-based regions of interest [22,23], MFBlock aims at pixel-based selective computation. By using MFBlock, the local features can be adequately obtained and the computational costs remain low. ...

Transformer-Based Selective Super-resolution for Efficient Image Refinement
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... Existing well-established general synthesizing methods [15], [16] struggle to synthesize CECTI from NCCTI due to the following limitations (Fig. 1): feature representation of various soft tissues (Challenges I and II). Although CNN-based feature extractors [17]-[20] and Transformers [21]-[23] have shown remarkable capabilities in local or global feature representation learning, the intrinsic locality of the convolution operator cannot establish long-range dependencies between pixels, and Transformers only exploit the information between isolated pixel-pixel (query-key) pairs while leaving rich surrounding contexts under-exploited. Therefore, enhancing the model's capability to detect and capture subtle variations in soft tissues, ligaments, and organs is essential. ...

PRNet: Pyramid Restoration Network for RAW Image Super-Resolution
  • Citing Article
  • January 2024

IEEE Transactions on Computational Imaging

... More recent studies have also explored deep neural networks to predict brain age using raw neuroimaging data [9,23,37,22,42,43,3] and results demonstrate that deep neural networks outperform traditional machine learning approaches given sufficient training data [3,9,21]. Since deep learning methods perform automatic feature extraction from raw structural MRI data, it allows capturing previously unseen imaging signatures related to aging in the brain and makes the model less prone to any biases from pre-processing steps, making it more generalizable. ...

A multi‐class deep learning model to estimate brain age while addressing systematic bias of regression to the mean

... We leverage the recently developed deep learning models, including TSMixer [86][87][88][89][90], FEDformer [91][92][93][94][95], LSTM [96][97][98], PatchTST [99,100], TimesNet [101][102][103], Transformer [104][105][106], MLP [63,[107][108][109], TCN [110][111][112][113], and iTransformer [114][115][116][117][118] for exchange rate prediction. All the models were trained using the MAE loss function. ...

Improving the Efficiency of CMOS Image Sensors through In-Sensor Selective Attention
  • Citing Conference Paper
  • May 2023

... More recent studies have also explored deep neural networks to predict brain age using raw neuroimaging data [9,23,37,22,42,43,3] and results demonstrate that deep neural networks outperform traditional machine learning approaches given sufficient training data [3,9,21]. Since deep learning methods perform automatic feature extraction from raw structural MRI data, it allows capturing previously unseen imaging signatures related to aging in the brain and makes the model less prone to any biases from pre-processing steps, making it more generalizable. ...

MRI signatures of Brain Age in the Alzheimer’s disease continuum

... To minimize the variability of amyloid PET measurements from different analytical pipelines, acquisition protocols, and tracers, the Centiloid scale was defined to linearly transform a particular measurement to this scale [5]. However, this Centiloid approach is designed for standardizing global measures and does not improve the between-measure agreements in terms of their shared variance [6][7][8]. We hypothesize that effective methods for spatial resolution recovery will improve PET quantification and reduce inter-tracer variabilities in amyloid PET measurements, and in this research, we propose a deep learning approach to achieve the goal. ...

Transfer learning based deep encoder decoder network for amyloid PET harmonization with small datasets

... Consequently, adaptive radiotherapy (ART) and rapid quality assurance (QA) were crucial. [5][6][7][8] ART often failed to capture a patient's real-time condition, [9][10][11] whereas online adaptive radiotherapy (OART) enabled real-time monitoring and adjustments, enhancing treatment precision, adaptability, efficiency, safety, and customization. 5,7 However, OART required more technical expertise and resources. ...

Deep‐learning based fast and accurate 3D CT deformable image registration in lung cancer