National University of Singapore
Recent publications
Electrical bioadhesive interfaces (EBIs) are gaining prominence in applications including medical diagnostics, prosthetic devices, rehabilitation, and human-machine interaction. Nonetheless, crafting a reliable EBI with comprehensive electrochemical, electrical, mechanical, and self-healing properties remains a formidable challenge. Herein, we develop a self-healing EBI by integrating conducting polymer nanofibers and a typical bioadhesive within a robust hydrogel matrix. The resulting EBI demonstrates extraordinary adhesion (lap shear strength of 197 kPa), exceptional electrical conductivity (2.18 S m−1), and outstanding self-healing performance. Taking advantage of these attributes, we integrated the EBI into flexible skin electrodes for surface electromyography (sEMG) signal recording from forearm muscles. The engineered skin electrodes exhibit robust adhesion to the skin even during sweating, rapid self-healing from damage, and seamless real-time signal recording with a high signal-to-noise ratio (39 dB). Our EBI, along with its skin electrodes, offers a promising platform for tissue-device integration, health monitoring, and an array of bioelectronic applications.
The miniaturized two-photon microscope is promising for monitoring the brain activity of freely behaving animals and for in situ endoscopic diagnosis. Ultrathin, lightweight metalenses present a potential alternative to bulky refractive lenses in miniaturized optical systems. However, designing and fabricating an achromatic metalens covering both the near-infrared and visible spectral ranges is challenging. This paper proposes a metalens that focuses at a single wavelength while supporting multichannel microscopy, exploiting the nonlinear nature of two-photon excitation. The metalens was designed solely to focus the near-infrared excitation laser (1040 nm) using the geometric phase, without chromatic correction for visible emission light. Our method simplifies the design process, reduces manufacturing costs, and increases the focusing efficiency for the excitation light while maintaining high transmittance for visible light. Using the metalens, two-photon biological imaging with three different labels was experimentally demonstrated at subcellular resolution.
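For intuition, the single-wavelength focusing described above amounts to imposing a hyperbolic phase profile at the design wavelength, realized through the geometric (Pancharatnam-Berry) phase of rotated nanofins; a minimal sketch, with an illustrative focal length and aperture rather than the paper's parameters:

```python
import numpy as np

# Hyperbolic phase profile for a lens that focuses a single design wavelength.
# lam: design wavelength (1040 nm excitation); f: focal length (illustrative).
def lens_phase(r, lam=1.04e-6, f=1.0e-3):
    """Required phase at radius r for diffraction-limited focusing."""
    return -2 * np.pi / lam * (np.sqrt(r**2 + f**2) - f)

# Geometric (Pancharatnam-Berry) phase: a half-wave-plate nanofin rotated by
# theta imparts a phase of 2*theta on circularly polarized light, so the
# nanofin orientation is simply half the target phase.
r = np.linspace(0, 0.25e-3, 5)            # sample radii across the aperture
theta = np.mod(lens_phase(r), 2 * np.pi) / 2
print(np.degrees(theta))                  # nanofin rotation angles, degrees
```

The key point is that only the 1040 nm excitation path needs this phase map; the visible emission merely has to pass through with high transmittance.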
Cognitive impairment is common in patients with end-stage renal disease (ESRD) and is associated with compromised quality of life and functional capacity, as well as worse clinical outcomes. Most previous research and reviews in this area focused on objective cognitive impairment, whereas patients’ subjective cognitive complaints (SCCs) are less well understood. This systematic review aimed to provide a broad overview of what is known about SCCs in adult ESRD patients. Electronic databases were searched from inception to January 2022, which identified 221 relevant studies. SCCs appear to be highly prevalent in dialysis patients and less so in those who have received a kidney transplant. A random-effects meta-analysis also shows that haemodialysis patients reported significantly more SCCs than peritoneal dialysis patients (standardised mean difference -0.20, 95% confidence interval -0.38 to -0.03). Synthesis of longitudinal studies suggests that SCCs remain stable on maintenance dialysis but may decrease after kidney transplantation. Furthermore, SCCs in ESRD patients have been consistently associated with hospitalisation, depression, anxiety, fatigue, and poorer quality of life. There are limited data supporting a strong relation between objective and subjective cognition, but preliminary evidence suggests that this association may be domain-specific. Methodological limitations and future research directions are discussed.
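For readers unfamiliar with the pooling step, a random-effects estimate of a standardised mean difference like the one quoted above can be computed with DerSimonian-Laird weighting; a minimal sketch with made-up study-level values (not the review's data):

```python
import numpy as np

def dersimonian_laird(y, v):
    """Random-effects pooling of standardised mean differences.
    y: per-study SMDs; v: their within-study variances."""
    w = 1.0 / v                                   # fixed-effect weights
    y_fe = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - y_fe) ** 2)               # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(y) - 1)) / c)       # between-study variance
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Toy inputs only; the review's actual study-level estimates are not shown.
smd, ci = dersimonian_laird(np.array([-0.31, -0.12, -0.20]),
                            np.array([0.02, 0.03, 0.015]))
print(f"SMD {smd:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```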
The global hospitality industry is rapidly turning sustainable and environmentally friendly. Behaviour-driven energy conservation is an emerging green hotel operation strategy supporting this change. Long-stay accommodation services have gained momentum in the hospitality sector since the COVID-19 pandemic. However, the characteristics of long-stay hotel guests are often overlooked in sustainable interventions. Based on an empirical survey in China, this study explores the factors driving the energy-saving behaviours of long-stay hotel guests and compares their effects across visiting purposes (leisure, business, and extended-stay resident). The analysis indicates that attitude, personal norm, and place attachment contribute directly to energy-saving behaviour. The results also support that attitude and personal norm link environmental values to energy-saving behaviour. Both altruistic and biospheric values have positive effects, while egoistic values appear to play a negative role. Biospheric values have a stronger impact on the attitude and personal norm of business guests. Place attachment has a stronger influence on extended-stay residents, while its contribution to the energy-saving behaviours of business guests is smaller than for other guests. In addition, leisure guests are more sensitive to moral obligations. This research sheds new light on the psychological underpinnings of the observed heterogeneity in energy-saving behaviours among hotel guests with different visiting purposes. The findings provide hotel operators with a theoretical reference for targeted interventions to promote the energy-saving actions of long-stay guests, and can thereby contribute to sustainable tourism policymaking and behaviour-driven hotel energy management.
Continuum manipulators have infinite degrees of freedom and high flexibility, which makes accurate modeling and control challenging. Common modeling approaches include mechanics-based strategies, neural-network strategies, and the constant curvature assumption. However, the inverse kinematics of mechanics-based models is difficult to obtain, while neural-network strategies may not converge in some applications. For algorithm implementation, the constant curvature assumption is often used as the basis for controller design. When the driving wire is tight, a linear controller under the constant curvature assumption performs well in manipulator position control. However, this assumed linearity between the deformation angle and the driving input breaks down with repeated use, as the driving wires inevitably lengthen, which degrades the controller's accuracy. In this work, Koopman theory is applied to identify a nonlinear model of the continuum manipulator. With the resulting linearized model, the control input is obtained through model predictive control (MPC). Since the lifted functions affect the effectiveness of Koopman operator-based MPC (K-MPC), a novel design of the lifted functions based on Legendre polynomials is proposed. To attain higher control efficiency and computational accuracy, a selective control scheme based on the state of the driving wires is proposed: when the driving wire is tight, the linear controller is employed; otherwise, K-MPC is adopted. Finally, a set of static and dynamic experiments was conducted on an experimental prototype. The results demonstrate the high effectiveness and good performance of the selective control scheme.
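To make the identification step concrete, here is a minimal extended dynamic mode decomposition (EDMD) sketch with an elementwise Legendre lifting; the paper's actual lifted-function design and MPC formulation are more elaborate, and all names below are illustrative:

```python
import numpy as np
from numpy.polynomial import legendre

def lift(x, order=3):
    """Lift a state vector (assumed scaled into [-1, 1]) with elementwise
    Legendre polynomials: [1, P1(x), ..., P_order(x)], where P1(x) = x."""
    feats = [np.ones(1)]
    for k in range(1, order + 1):
        c = np.zeros(k + 1)
        c[k] = 1.0                                # coefficients selecting P_k
        feats.append(legendre.legval(x, c))
    return np.concatenate(feats)

def edmd(X, Xn, U, order=3):
    """Least-squares Koopman model z+ = A z + B u from snapshot data.
    X, Xn: states at t and t+1 (columns are samples); U: matching inputs."""
    Z  = np.column_stack([lift(x, order) for x in X.T])
    Zn = np.column_stack([lift(x, order) for x in Xn.T])
    G  = np.vstack([Z, U])
    AB = Zn @ np.linalg.pinv(G)                   # solves Zn ≈ [A B] [Z; U]
    return AB[:, :Z.shape[0]], AB[:, Z.shape[0]:]

# The returned (A, B) define a linear lifted model over which a standard MPC
# problem can be posed; in the selective scheme, this K-MPC would take over
# from the linear controller whenever the driving wires slacken.
```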
In the context of COVID-19, numerous people share their opinions through social networks. It is thus highly desirable to conduct sentiment analysis on COVID-19 tweets to learn the public's attitudes and help the government craft proper guidelines for avoiding social unrest. Although many efforts have studied text-based sentiment classification in various domains (e.g., delivery and shopping reviews), such classifiers are hard to apply directly to COVID-19 tweets due to the domain gap. In fact, developing a sentiment classifier for COVID-19 tweets is mainly challenged by the limited annotated training data, as well as the diverse and informal expressions of user-generated posts. To address these challenges, we construct a large-scale COVID-19 dataset from Weibo and propose a dual COnsistency-enhanced semi-superVIseD network for Sentiment Analysis (COVID-SA). In particular, we first introduce a knowledge-based augmentation method to augment the data and enhance the model's robustness. We then employ BERT as the text encoder backbone for labeled, unlabeled, and augmented data. Moreover, we propose a dual consistency regularization (i.e., label-oriented consistency and instance-oriented consistency) to promote model performance. Extensive experiments on our self-constructed dataset and three public datasets show the superiority of COVID-SA over state-of-the-art baselines across various applications.
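As a rough illustration of how such dual consistency terms can look in practice, here is a hedged PyTorch sketch; the exact losses, thresholds, and names used in COVID-SA may differ:

```python
import torch
import torch.nn.functional as F

def consistency_losses(logits_u, logits_aug, feats_u, feats_aug, tau=0.95):
    """A minimal sketch of two consistency terms on unlabeled posts.
    Label-oriented: a pseudo-label from the original view supervises the
    augmented view when the model is confident. Instance-oriented: the
    two views' encoder features are pulled together."""
    probs = F.softmax(logits_u.detach(), dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= tau).float()                  # keep confident samples only
    label_loss = (F.cross_entropy(logits_aug, pseudo,
                                  reduction="none") * mask).mean()
    inst_loss = 1 - F.cosine_similarity(feats_u, feats_aug, dim=-1).mean()
    return label_loss, inst_loss
```

Both terms would be added, suitably weighted, to the supervised cross-entropy on the labeled subset.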
This paper presents a systematic and comprehensive survey on blockchain interoperability, where interoperability is defined as the ability of blockchains to flexibly transfer assets, share data, and invoke smart contracts across a mix of public, private, and consortium blockchains without any changes to the underlying blockchain systems. Analyzing the vast landscape of both research papers and industry projects, we classify the existing works into five categories, namely, (1) sidechains, (2) notary schemes, (3) hashed time lock contracts (HTLCs), (4) relays, and (5) blockchain-agnostic protocols. We analyze the existing works under a taxonomy of system and safety characteristics, such as decentralization, direction of communication, locking mechanism, verification mechanism, trust, safety, liveness, and atomicity. Unlike other surveys, we are the first to evaluate the performance of some representative interoperability approaches between Bitcoin and Ethereum, covering sidechains, notary schemes, and HTLCs. Even though the performance of cross-chain transactions is low (typically fewer than 10 transactions per second), the main bottleneck is the underlying blockchain (e.g., Bitcoin or Ethereum) rather than the interoperability approach. Finally, we discuss existing challenges and possible research directions in blockchain interoperability. For example, we identify challenges in interoperability across permissioned and permissionless blockchains, in interacting with scripting blockchains, and in security and privacy.
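As a concrete illustration of category (3), the core HTLC mechanic is a hash-locked, time-limited claim; a minimal off-chain sketch in Python (illustrative logic only, not any production contract):

```python
import hashlib, time

class HTLC:
    """Minimal hashed time lock contract logic."""
    def __init__(self, hashlock: bytes, timelock_s: float, amount: int):
        self.hashlock = hashlock              # sha256(preimage), set by sender
        self.deadline = time.time() + timelock_s
        self.amount = amount
        self.settled = False

    def claim(self, preimage: bytes) -> int:
        """Receiver redeems by revealing the preimage before the deadline."""
        assert not self.settled and time.time() < self.deadline
        assert hashlib.sha256(preimage).digest() == self.hashlock
        self.settled = True
        return self.amount

    def refund(self) -> int:
        """Sender reclaims the funds once the timelock expires unclaimed."""
        assert not self.settled and time.time() >= self.deadline
        self.settled = True
        return self.amount

secret = b"swap-secret"
htlc = HTLC(hashlib.sha256(secret).digest(), timelock_s=3600, amount=100)
print(htlc.claim(secret))
```

The same hash-lock/time-lock pair, deployed as contracts on two chains, is what makes a cross-chain swap atomic: either the secret is revealed on both chains or both sides can refund.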
Diseases affecting the esophagus are common. However, targeted drug delivery to the esophagus is challenging due to the anatomy and physiology of this organ. Current pharmacological treatment of esophageal diseases relies predominantly on the off-label use of drugs in various dosage forms, including those for systemic delivery (e.g., oral tablets, sublingual tablets, and injections) and topical delivery (e.g., metered-dose inhalers, viscous solutions or suspensions, and endoscopic injection into the esophagus). In general, systemic therapy has shown the most efficacy but requires high drug doses to achieve effective concentrations in the esophagus, which increases the risk of adverse effects and toxicity. Topical drug delivery has enormous potential to improve the way we treat patients with acute and chronic esophageal diseases, especially those requiring drugs with a low therapeutic index and/or significant adverse effects on non-targeted organs and tissues. This review addresses the physiological, pathophysiological, and pharmaceutical considerations influencing topical drug delivery in the esophagus. The main conventional (e.g., liquid formulations, orodispersible tablets, lozenges, pastilles, troches, chewing gum) and innovative (e.g., stent-based, film-based, nanoparticulate-based) drug delivery approaches are comprehensively discussed, along with developments to improve their effectiveness for topical esophageal drug delivery. The translational challenges and future clinical advances of this research are also discussed.
In this article, we explore the potential of enhancing academic prose and idea generation by fine-tuning a large language model (here, GPT-3) on one’s own previously published writings: AUTOGEN (‘AI Unique Tailored Output GENerator’). We develop, test, and describe three distinct AUTOGEN models trained on the prior scholarly output of three of the current authors (SBM, BDE, JS), with a fourth model trained on the combined works of all three. Our AUTOGEN models demonstrate greater variance in quality than the base GPT-3 model, with many outputs outperforming the base model in format, style, overall quality, and novel idea generation. As proof of principle, we present and discuss examples of AUTOGEN-written sections of existing and hypothetical research papers. We further discuss ethical opportunities, concerns, and open questions associated with personalized academic prose and idea generators. Ethical opportunities of personalized LLMs such as AUTOGEN include increased productivity, preservation of writing styles and cultural traditions, and aiding consensus building. However, ethical concerns arise due to the potential for personalized LLMs to reduce output diversity, violate privacy and intellectual property rights, and facilitate plagiarism or fraud. The use of co-authored or multiple-source trained models further complicates issues surrounding ownership and attribution. Open questions concern a potential credit-blame asymmetry for LLM outputs, the legitimacy of licensing agreements in authorship ascription, and the ethical implications of co-authorship attribution for data contributors. Ensuring the output is sufficiently distinct from the source material is crucial to maintaining ethical standards in academic writing. These opportunities, risks, and open issues highlight the intricate ethical landscape surrounding the use of personalized LLMs in academia. We also discuss open technical questions concerning the integration of AUTOGEN-style personalized LLMs with other LLMs, such as GPT-4, for iterative refinement and improvement of generated text. In conclusion, we argue that AUTOGEN-style personalized LLMs offer significant potential benefits in terms of both prose generation and, to a lesser extent, idea generation. If associated ethical issues are appropriately addressed, AUTOGEN alone or in combination with other LLMs can be seen as a potent form of academic enhancement. As a note to readers, this abstract was generated by AUTOGEN and edited for accuracy by the authors. The rest of the text was written manually.
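As a purely illustrative sketch of the data preparation such fine-tuning involves (field names and pipeline are hypothetical; GPT-3 fine-tuning, as offered at the time, consumed JSONL prompt/completion pairs):

```python
import json

# Hypothetical prompt/completion pairs built from an author's prior papers.
papers = [{"title": "...", "abstract": "...", "body": "..."}]

with open("autogen_train.jsonl", "w") as f:
    for p in papers:
        record = {
            "prompt": f"Title: {p['title']}\nAbstract: {p['abstract']}\n\n",
            "completion": " " + p["body"],  # leading space per OpenAI guidance
        }
        f.write(json.dumps(record) + "\n")
# The resulting file would then be submitted to OpenAI's fine-tuning service.
```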
Objective: To evaluate the relative importance of overall and period-specific postnatal growth and their interaction with fetal growth on cognition in a generally well-nourished population. Study design: We included 1052 children from Project Viva, a prospective cohort in Boston, Massachusetts. Using linear spline mixed-effects models, we modeled length/height and body mass index (BMI) trajectories from birth to seven years and estimated standardized overall (0-7 years) and period-specific growth velocities, i.e., early infancy (0-4 months), late infancy (4-15 months), toddlerhood (15-37 months), and early childhood (37-84 months). We investigated associations of growth velocities, as well as their interactions with birthweight-for-gestational age, with mid-childhood (mean age: 7.9 years) intelligence quotient (IQ), visual memory and learning, and visual motor ability. Results: Greater overall height velocity was associated with a modestly higher design memory score (adjusted β [95% CI]: 0.19 [-0.01, 0.38]; p=0.057) points per standard deviation (SD) increase, but lower verbal IQ (-0.88 [-1.76, 0.00]; p=0.051). Greater early infancy height velocity was associated with a higher visual motor score (1.92 [0.67, 3.18]). Greater overall BMI velocity was associated with lower verbal IQ (-0.71 [-1.52, 0.11]; p=0.090). Greater late infancy BMI velocity was associated with lower verbal IQ (-1.21 [-2.07, -0.34]) and design memory score (-0.22 [-0.42, -0.03]), but a higher picture memory score (0.22 [0.01, 0.43]). Greater early infancy height velocity (-1.5 SD vs. 1.5 SD) was associated with higher non-verbal IQ (margins [95% CI]: 102.6 [98.9, 106.3] vs. 108.2 [104.9, 111.6]) among small-for-gestational-age infants (P-interaction=0.04). Conclusions: Among generally well-nourished children, there may be no clear cognitive gains with faster linear growth except for those with lower birthweight-for-gestational age, revealing the potential importance of early infancy compensatory growth.
In this article, we study the adaptive stabilization of parabolic partial differential equation (PDE)-ordinary differential equation (ODE) cascade systems with actuator dynamics, where the actuator dynamics are nonlinear and subject to unknown parameters. Compared with the class of PDE-ODE coupled systems in which the control input acts only on the PDE boundary, and with linear sandwiched systems without uncertainty, the structure of such systems is more complex. First, an infinite-dimensional backstepping transformation is adopted, converting the original PDE-ODE cascade system into a new system that is easier to design for. On this basis, a finite-dimensional backstepping transformation and adaptive compensation techniques are combined to develop a state-feedback controller. The boundedness of all signals in the closed-loop system is then proved by Lyapunov functional analysis. Furthermore, the control law and the original system states eventually converge to zero. Finally, several simulations are presented to illustrate the validity of the theoretical results.
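For context, the infinite-dimensional backstepping step typically takes the form of a Volterra integral transformation of the PDE state; a generic form (the article's specific kernel and target system are not reproduced here) is

```latex
w(x,t) = u(x,t) - \int_{0}^{x} k(x,y)\, u(y,t)\, \mathrm{d}y ,
```

where the kernel k(x, y) is chosen so that the transformed state w satisfies an exponentially stable target system.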
In this letter, a new design scheme for a rectangular patch antenna with harmonic suppression is proposed and analyzed. First, the field distributions of the resonant modes on the rectangular patch are calculated and used to guide the feeding strategy. Subsequently, by placing two pairs of differential excitation signals at the virtual electric walls of selected higher-order modes of the rectangular patch antenna, initial harmonic suppression is obtained. Moreover, to implement these equal-amplitude, out-of-phase signals, a 1-to-4 power divider based on microstrip-to-slotline transitions is conceived and realized. Finally, a prototype antenna is fabricated and tested to verify the theoretical predictions. Measured results reveal that the demonstrator achieves an impedance bandwidth of 8.6% (3.12-3.4 GHz). Most importantly, compared with traditional filtering antennas, suppression of harmonics up to 3.4f0 (where f0 is the center frequency of the antenna) is successfully achieved.
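For orientation, the resonant modes mentioned above follow the standard cavity model of a rectangular patch (fringing fields neglected); the dimensions below are illustrative, not the letter's prototype:

```python
import numpy as np

C0 = 3e8  # speed of light, m/s

def f_mn(m, n, L, W, eps_r):
    """Cavity-model resonance of a rectangular patch (TMmn mode)."""
    return C0 / (2 * np.sqrt(eps_r)) * np.sqrt((m / L) ** 2 + (n / W) ** 2)

# Illustrative patch whose TM10 mode sits near 3.26 GHz on eps_r = 2.2.
L, W, eps_r = 0.031, 0.038, 2.2
for m, n in [(1, 0), (0, 1), (2, 0), (1, 1), (2, 1)]:
    print(f"TM{m}{n}: {f_mn(m, n, L, W, eps_r) / 1e9:.2f} GHz")
```

Knowing where these higher-order modes fall is what lets the differential feeds be placed on their virtual electric walls so that the harmonics are not excited.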
We investigate the procedure of semi-parametric maximum likelihood estimation under constraints on summary statistics. Such a procedure results in a discrete probability distribution supported on the data points that maximizes the likelihood among all distributions supported on the data points satisfying the specified constraints (called estimating equations). The resultant distribution is an approximation of the underlying population distribution. The study of such empirical likelihood estimation originates from the seminal work of Owen [1], [2]. We investigate this procedure in the setting of misspecified (or biased) constraints, i.e., when the null hypothesis is not true. We establish that the behavior of the optimal weight distribution under such misspecification differs markedly from its properties under the null, i.e., when the estimating equations are correctly specified (or unbiased). This is manifested by certain “singularities” in the optimal distribution that are not observed under the null. Furthermore, we establish an anomalous behavior of the log-likelihood based Wilks’ statistic, which, unlike under the null, does not exhibit a chi-squared limit. In the Bayesian setting, we establish the posterior consistency of procedures based on these ideas, where instead of a parametric likelihood, an empirical likelihood is used to define the posterior distribution. In particular, we show that this posterior, as a random probability measure, rapidly converges, with explicit convergence guarantees, to the delta measure at the true parameter value. We also illustrate implications of our results in diverse settings such as degeneracies in exponential random graph models (ERGM) for random networks [3], [4], empirical procedures where the constraints are themselves estimated from data [5], and approximate Bayesian computation based procedures [6], [7]. A novel feature of our work is to connect the likelihood maximization problem to critical points of random polynomials. This yields the mass of the singular weight in the optimal weight distribution as the leading term in a canonical expansion of a critical point of a random polynomial. Our work unveils the possibility that similar random polynomial based techniques could be effective in analyzing a wide class of problems in related areas.
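For reference, the empirical likelihood program studied here has the standard Owen form:

```latex
\max_{w_1,\dots,w_n} \sum_{i=1}^{n} \log w_i
\quad \text{subject to} \quad
w_i \ge 0, \;\; \sum_{i=1}^{n} w_i = 1, \;\; \sum_{i=1}^{n} w_i \, g(X_i, \theta) = 0 ,
```

whose solution, by Lagrangian duality, is \( w_i = 1 / \big( n (1 + \lambda^{\top} g(X_i, \theta)) \big) \) for a multiplier \(\lambda\) solving the dual equation; the singular behavior described above concerns how these weights degenerate when the estimating equations are misspecified.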
Inspired by many-body effects, we propose a novel design for Boltzmann machine-based invertible logic using probabilistic bits. A CMOS-based XNOR gate is derived to serve as the hardware implementation of many-body interactions, and an invertible logic family is built on this design. Compared to the conventional two-body design framework, the many-body design enables compact configurations and provides the simplest binarized energy landscape for fundamental invertible logic gates. Furthermore, we demonstrate the composability of many-body invertible logic circuits by merging modular building blocks into large-scale integer factorizers. To optimize the energy landscape of large-scale combinatorial invertible logic circuits, we introduce degeneracy in the energy levels, which increases the probability of the lowest-energy states. Circuit simulations of our integer factorizers reveal a significant boost in factorization accuracy. For example, a 2-bit × 2-bit integer factorizer improved its factorization accuracy from 64.99% to 91.44% as the number of energy levels was reduced from 32 to 9. Similarly, our 6-bit × 6-bit integer factorizer increases the accuracy from 4.430% to 83.65% with the many-body design. Overall, the many-body design scheme provides promising results for future invertible logic circuit designs.
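For background, a probabilistic bit fluctuates according to a simple stochastic update rule; the sketch below shows generic two-body p-bit dynamics (the paper's many-body XNOR couplings are a hardware-level extension not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_step(m, J, h, beta=1.0):
    """Asynchronous update of probabilistic bits m in {-1, +1} coupled by J
    with biases h: m_i = sgn(tanh(beta * I_i) - r), r ~ Uniform(-1, 1)."""
    for i in rng.permutation(len(m)):
        I = J[i] @ m + h[i]                    # net input to p-bit i
        m[i] = np.sign(np.tanh(beta * I) - rng.uniform(-1, 1))
    return m

# Toy 3-p-bit network; sampling its steady state visits low-energy states of
# E(m) = -m.J.m/2 - h.m most often, which is what invertible logic exploits.
J = np.array([[0., 1., -1.], [1., 0., 1.], [-1., 1., 0.]])
h = np.zeros(3)
m = rng.choice([-1.0, 1.0], size=3)
for _ in range(1000):
    m = pbit_step(m, J, h)
print(m)
```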
In this work, we introduce an effective and versatile technique employing replacement electrode solid phase epitaxy (SPE) to reduce the grain size of the Zr-doped HfO2 (HZO) ferroelectric layer. This technique reduces the grain size of the HZO layer by approximately 30% and simultaneously enhances the remnant polarization (Pr) by 42% compared to the conventional atomic layer deposition (ALD) growth technique. As a result, a relatively high Pr value of approximately 25 μC/cm² was achieved with a low thermal budget of 400 °C. In addition, we propose the underlying mechanism behind the grain size-dependent ferroelectric properties, guided by thermodynamics-informed first-principles simulations of the nucleation process and a kinetics-based analysis of the phase change. We find that the reduction in grain size plays a key role in decreasing the m-phase fraction and enhancing the oxygen vacancy effect, which leads to a significant improvement in Pr.
Harnessing exotic optical forces promises a plethora of biophysical applications and novel light-matter interactions. The exotic optical pulling force (OPF) and optical lateral force (OLF) have been studied separately, yet synthesizing both simultaneously remains an unsolved challenge and could enable more powerful manoeuvring of particles. Here, we report a coordinated scheme that harnesses these two forces together and present a dynamically controlled two-dimensional (2D) microvehicle. The strategy is to leverage unexplored helicity-dependent features of both forces, while the particle size and the incident angle of light can also reverse the optical forces. The underlying physics of the pulling-lateral force lies beyond the dipole approximation and can be a combined effect of linear momentum transfer, spin-orbit interactions, etc. Notably, the ratio of the two forces can be dynamically and arbitrarily controlled solely by the ellipticity of the incident light. The resulting 2D microvehicle provides a nontrivial alternative to metastructures, which require exquisite designs and subtle fabrication processes.
In this paper, we explore the connection between secret key agreement and secure omniscience within the setting of the multiterminal source model with an eavesdropper having side information. While the secret key agreement problem considers the generation of a maximum-rate secret key through public discussion, the secure omniscience problem is concerned with communication protocols for omniscience that minimize the rate of information leakage to the eavesdropper. The starting point of our work is a lower bound on the minimum leakage rate for omniscience, R_L, in terms of the wiretap secret key capacity, C_W. Our interest is in identifying broad classes of sources for which this lower bound is met with equality, in which case we say that there is a duality between secure omniscience and secret key agreement. We show that this duality holds in the case of certain finite linear source (FLS) models, such as two-terminal FLS models and pairwise independent network models on trees with a linear eavesdropper. Duality also holds for any FLS model in which C_W is achieved by a perfect linear secret key agreement scheme. We conjecture that the duality in fact holds unconditionally for any FLS model. On the negative side, we give an example of a (non-FLS) source model for which duality does not hold if we limit ourselves to communication-for-omniscience protocols with at most two (interactive) communications. We also address the secure function computation problem and explore the connection between the minimum leakage rate for computing a function and the wiretap secret key capacity.
  • Ying Zhang
  • Roger Zimmermann
  • Zhiwen Yu
  • Bin Guo
Face recognition is one of the most widely adopted ways to verify someone’s identity, but it can be spoofed by presenting, for example, a fake image or video. It is therefore essential to include additional face liveness detection for safer applications. Among existing solutions, it is common to plug in a separate model for face liveness detection, so the authentication platform must deploy two models to provide safe authentication. However, in many practical scenarios, the platform (e.g., IoT devices) has limited computation power and storage, which may prevent the two models from being deployed successfully. Observing that both recognition and liveness detection operate on the same face image, we believe it is possible to integrate the two functions into a unified model, reducing the computational workload and storage requirements. To achieve this, we explore two works with different model designs, research focuses, and potential solutions. In the first work, we enhance a standard face recognition model with the additional task capability at no additional storage cost. Concretely, we first analyze the relationship between the two tasks and, through a mathematical formulation, embed the observed dual-task relationship into a novel deep model with a distance-ranking feature. Training focuses on feature learning and does not directly use the task ground-truth labels, which gives the model good generalization capability on new data. Experiments on a benchmark dataset show that our average performance improves by at least 15% compared to the baselines. In the second work, we adopt the classic multi-task learning paradigm to combine the two tasks. Rather than using a deep multi-task model, we compress the original deep model into a lightweight version. Additionally, to compensate for the performance degradation due to compression, multi-teacher-assisted knowledge distillation is applied, achieving a good balance between accuracy and model size.
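As a sketch of how the multi-teacher distillation in the second work might be set up (the exact weighting and assistance scheme are not reproduced here; this is the generic technique):

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          weights=None, T=4.0):
    """A minimal multi-teacher distillation loss: the compressed student
    matches a (weighted) average of the teachers' softened predictions."""
    if weights is None:
        weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)
    soft_target = sum(w * F.softmax(t / T, dim=-1)
                      for w, t in zip(weights, teacher_logits_list))
    # Temperature-scaled KL divergence, rescaled by T^2 as is conventional.
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    soft_target, reduction="batchmean") * (T * T)
```

In practice this term is combined with the usual task losses so the lightweight student recovers accuracy lost to compression.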
  • Yuchen Tian
  • Jiacheng Wang
  • Yueming Jin
  • Liansheng Wang
Federated learning (FL) has recently been applied to skin lesion analysis, but the challenges of heavy communication requirements and non-independent and identically distributed (non-IID) data have not been fully addressed. The former arises from model parameter transfer between the server and clients, and the latter from differences in imaging protocols and operational customs. To reduce communication costs, dataset distillation methods have been adopted to distill thousands of real images into a few synthetic images (one image per class) in each local client, which are then used to train a global model on the server. However, these methods often overlook possible inter-client distribution drifts, limiting the performance of the global model. In this paper, we propose a generalizable dataset distillation-based federated learning (GDD-FL) framework to achieve communication-efficient federated skin lesion classification. Our framework includes the generalizable dataset distillation (GDD) method, which explicitly models the image features of a dataset as an uncertain Gaussian distribution and learns to produce synthetic images with features close to this distribution. The uncertainty in the mean and variance of the distribution enables the synthetic images to capture diverse semantics and mitigate distribution drifts. Based on the GDD method, we further develop a communication-efficient FL framework that only needs to transmit a few synthetic images once to train a global model. We evaluate our approach on a large skin lesion classification dataset and compare it with existing dataset distillation methods and several strong baselines. Our model consistently outperforms them, particularly in comparison to the classical FL method. All resources can be found at https://github.com/jcwang123/GDD-FL.
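To convey the flavor of the GDD idea, here is a heavily simplified PyTorch sketch in which per-channel feature statistics are perturbed before matching; the names and perturbation form are illustrative only, not the paper's exact method:

```python
import torch

def sample_feature_stats(feats, sigma=0.1):
    """Perturb per-channel feature statistics (the 'uncertain Gaussian'):
    synthetic images are matched to sampled statistics rather than fixed
    ones, so they cover semantics beyond one client's exact distribution."""
    mu, std = feats.mean(dim=0), feats.std(dim=0)
    mu_u  = mu  + sigma * torch.randn_like(mu)  * mu.abs()
    std_u = std + sigma * torch.randn_like(std) * std
    return mu_u, std_u.clamp_min(1e-6)

def distill_loss(syn_feats, real_feats):
    """Pull synthetic-image features toward a sampled real-feature Gaussian."""
    mu_u, std_u = sample_feature_stats(real_feats)
    return ((syn_feats.mean(0) - mu_u) ** 2).sum() + \
           ((syn_feats.std(0) - std_u) ** 2).sum()
```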
  • An Wang
  • Mobarakol Islam
  • Mengya Xu
  • [...]
  • Hongliang Ren
The Segment Anything Model (SAM) serves as a foundation model for semantic segmentation and demonstrates remarkable generalization across a wide range of downstream scenarios. In this empirical study, we examine SAM’s robustness and zero-shot generalizability in robotic surgery. We comprehensively explore different scenarios, including prompted and unprompted situations, bounding-box- and point-based prompting, and generalization under corruptions and perturbations at five severity levels. Additionally, we compare the performance of SAM with state-of-the-art supervised models. We conduct all experiments on two well-known robotic instrument segmentation datasets from the MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation shows that although SAM exhibits remarkable zero-shot generalization with bounding box prompts, it struggles to segment whole instruments with point-based prompts and in unprompted settings. Furthermore, our qualitative figures show that the model either fails to predict certain parts of the instrument mask (e.g., jaws, wrist) or assigns parts of the instrument to the wrong class when instruments overlap within the same bounding box or under point-based prompting. Indeed, SAM struggles to identify instruments in complex surgical scenes characterized by blood, reflection, blur, and shade, and it is insufficiently robust to maintain high performance under various forms of data corruption. We also fine-tune SAM using Low-rank Adaptation (LoRA) and propose SurgicalSAM, which shows promising class-wise mask prediction without prompts. We therefore argue that, without further domain-specific fine-tuning, SAM is not ready for downstream surgical tasks.
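For concreteness, the prompted (bounding-box) evaluation setting can be reproduced with the public segment-anything API along the following lines; the checkpoint path, image, and box below are placeholders:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Zero-shot, box-prompted segmentation with SAM, as in the prompted setting.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

def box_prompt_iou(image: np.ndarray, box_xyxy: np.ndarray,
                   gt_mask: np.ndarray) -> float:
    """Segment one instrument from its bounding box and score IoU."""
    predictor.set_image(image)                    # RGB uint8, HxWx3
    masks, scores, _ = predictor.predict(box=box_xyxy,
                                         multimask_output=False)
    pred = masks[0]
    inter = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return inter / max(union, 1)
```

Running the same loop over corrupted copies of each frame (e.g., blur or noise at increasing severity) yields the robustness curves discussed above.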
Institution pages aggregate content on ResearchGate related to an institution. The members listed on this page have self-identified as being affiliated with this institution. Publications listed on this page were identified by our algorithms as relating to this institution. This page was not created or approved by the institution.
33,337 members
Soo Chin Liew
  • Centre for Remote Imaging, Sensing and Processing (CRISP)
C.T. Lim
  • Department of Biomedical Engineering
Jiwei Qian
  • East Asian Institute
Nilmani Saha
  • Department of Paediatrics
Information
Address
6 Science Drive 2, 117546, Singapore, Singapore