CellSAM: a foundation model for cell segmentation. CellSAM combines SAM's mask generation and labeling capabilities with an object detection model to achieve automated inference. Input images are divided into regularly sampled patches and passed through a transformer encoder (e.g., a ViT) to generate information-rich image features. These image features are then sent to two downstream modules. The first module, CellFinder, decodes these features into bounding boxes using a transformer-based encoder-decoder pair. The second module combines these image features with prompts to generate masks using SAM's mask decoder. CellSAM integrates these two modules by using the bounding boxes generated by CellFinder as prompts for SAM. CellSAM is trained in two stages, using the pre-trained SAM model weights as a starting point. In the first stage, we train the ViT and the CellFinder model together on the object detection task. This yields an accurate CellFinder but results in a distribution shift between the ViT and SAM's mask decoder. The second stage closes this gap by freezing the ViT and SAM mask decoder weights and fine-tuning the remainder of the SAM model (i.e., the model neck) using ground truth bounding boxes and segmentation labels.
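A minimal sketch of that inference path in code may make the data flow concrete. All component handles below (vit_encoder, cellfinder, sam_neck, prompt_encoder, mask_decoder) are illustrative placeholders standing in for the stages named above, not the released CellSAM API.

```python
import numpy as np

def cellsam_inference(image, vit_encoder, cellfinder, sam_neck,
                      prompt_encoder, mask_decoder):
    # Shared backbone: patchify the image and encode it into ViT features.
    features = vit_encoder(image)
    # Branch 1: CellFinder decodes the features into per-cell bounding boxes.
    boxes = cellfinder(features)
    # Branch 2: the fine-tuned neck re-aligns the ViT features with the frozen
    # SAM mask decoder, closing the distribution shift described above.
    embeddings = sam_neck(features)
    masks = []
    for box in boxes:
        # Each predicted box is encoded as a prompt for SAM's mask decoder.
        sparse, dense = prompt_encoder(box)
        masks.append(mask_decoder(embeddings, sparse, dense))
    return np.stack(masks) if masks else np.zeros((0,) + image.shape[:2])
```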

Source publication
Preprint
Cells are the fundamental unit of biological organization, and identifying them in imaging data - cell segmentation - is a critical task for various cellular imaging experiments. While deep learning methods have led to substantial progress on this problem, models that have seen wide use are specialist models that work well for specific domains. Met...

Contexts in source publication

Context 1
... this work, we developed CellSAM, a foundation model for cell segmentation (Fig. 1). CellSAM extends the SAM methodology to perform automated cellular instance segmentation. To achieve this, we first assembled a comprehensive dataset for cell segmentation spanning five different morphological archetypes. To automate inference with SAM, we took a prompt engineering approach and explored the best ways to prompt SAM to ...
Context 2
... training is complete, we use CellFinder to prompt SAM's mask decoder. We refer to the collective method as CellSAM; Figure 1 outlines an image's full path through CellSAM during inference. We benchmark CellSAM's performance using a suite of metrics (Figure 2c and 2d and Supplemental Figure S2) and find that it outperforms Cellpose models trained on comparable datasets. ...
Context 3
... the maximum number of cells per image is generally no more than 1000, we increased the number of queries q to 3500, 3.5 times the maximum number of cells, based on Fig. 12 in DETR 68, which provides an estimate of the number of queries needed for a DETR method to detect all objects. We used only one pattern p for the anchor generation as most objects in cellular detection are usually of similar ...
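For concreteness, the detection-head choices quoted in this snippet could be captured in a configuration like the sketch below; the class and field names are hypothetical, not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class CellFinderDetectorConfig:
    # ~3.5x the ~1000-cell-per-image maximum, following the query-count
    # estimate in Fig. 12 of the DETR paper (ref. 68 above).
    num_queries: int = 3500
    # A single anchor pattern p, since detected cells are usually of
    # similar size and shape.
    num_patterns: int = 1
```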

Similar publications

Article
An effective rehabilitation program can significantly hasten the recovery of patients. It promotes the metabolism of damaged tissues and aids in the seamless integration of morphology, function, and structure. With the emergence of sophisticated computer vision technologies, rehabilitation training has become more feasible. Depth imaging devices or...

Citations

... While there is much effort in the community to improve the process of cell segmentation from multiplexed imaging 36 , fundamental physical limitations such as image resolution (>1 μm), signal spillover, and 3D tissue slicing effects mean a perfect segmentation is likely impossible. Therefore, we have introduced an algorithm that explicitly acknowledges the input segmentation likely contains errors and uses knowledge of how these imperfections likely impact the quantified expression profiles to effectively denoise the data and learn the underlying cellular phenotypes. ...
Article
Spatial protein expression technologies can map cellular content and organization by simultaneously quantifying the expression of >40 proteins at subcellular resolution within intact tissue sections and cell lines. However, the necessary image segmentation into single cells is challenging and error prone, easily confounding the interpretation of cellular phenotypes and cell clusters. To address these limitations, we present STARLING, a probabilistic machine learning model designed to quantify cell populations from spatial protein expression data while accounting for segmentation errors. To evaluate performance, we develop a comprehensive benchmarking workflow by generating highly multiplexed imaging data of cell line pellet standards with controlled cell content and marker expression, and additionally establish a score to quantify the biological plausibility of discovered cellular phenotypes on patient-derived tissue sections. Moreover, we generate spatial expression data of the human tonsil, a densely packed tissue prone to segmentation errors, and demonstrate that cellular states captured by STARLING identify known cell types not visible with other methods and enable quantification of intra- and inter-individual heterogeneity.
... While substantial progress has been made in segmenting well-defined objects such as organs in computed tomography (CT) scans [1,2,3,4], cell segmentation presents unique challenges due to the complexity and variability of cell shapes [5,6,7,8]. Existing datasets for cell segmentation [9,10,11,12] often fall short in covering the full spectrum of cell types, hindering the effective training of deep learning models for diverse research applications. ...
Preprint
Deep learning has revolutionized medical and biological imaging, particularly in segmentation tasks. However, segmenting biological cells remains challenging due to the high variability and complexity of cell shapes. Addressing this challenge requires high-quality datasets that accurately represent the diverse morphologies found in biological cells. Existing cell segmentation datasets are often limited by their focus on regular and uniform shapes. In this paper, we introduce a novel benchmark dataset of Ntera-2 (NT2) cells, a pluripotent carcinoma cell line, exhibiting diverse morphologies across multiple stages of differentiation, capturing the intricate and heterogeneous cellular structures that complicate segmentation tasks. To address these challenges, we propose an uncertainty-aware deep learning framework for complex cellular morphology segmentation (MorphoSeg) by incorporating sampling of virtual outliers from low-likelihood regions during training. Our comprehensive experimental evaluations against state-of-the-art baselines demonstrate that MorphoSeg significantly enhances segmentation accuracy, achieving up to a 7.74% increase in the Dice Similarity Coefficient (DSC) and a 28.36% reduction in the Hausdorff Distance. These findings highlight the effectiveness of our dataset and methodology in advancing cell segmentation capabilities, especially for complex and variable cell morphologies. The dataset and source code are publicly available at https://github.com/RanchoGoose/MorphoSeg.
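As an illustration of the virtual-outlier idea in this abstract, a VOS-style sampler might fit a Gaussian to in-distribution feature vectors and keep candidates from its low-likelihood tail, as sketched below under those assumptions; MorphoSeg's actual procedure may differ.

```python
import torch

def sample_virtual_outliers(feats: torch.Tensor, n: int = 100, eps: float = 1e-3):
    # Fit a Gaussian to in-distribution features (rows of `feats`, shape N x d).
    mu = feats.mean(dim=0)
    cov = torch.cov(feats.T) + eps * torch.eye(feats.shape[1])  # regularize
    dist = torch.distributions.MultivariateNormal(mu, cov)
    # Oversample candidates, then keep the lowest-likelihood ones as outliers.
    candidates = dist.sample((10 * n,))
    log_probs = dist.log_prob(candidates)
    keep = torch.argsort(log_probs)[:n]
    return candidates[keep]
```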
... Additionally, advanced deep learning methods, namely the segment anything model (SAM) (2023) (23), have recently demonstrated impressive results in segmentation tasks. Inspired by SAM, a method called "CellSAM" (2023) was proposed by Israel et al. (24). CellSAM first obtains prompts from CellFinder, a transformer-based object detector (DETR). ...
... Therefore, a direct and automatic segmentation of cells and nuclei from such images will have practical advantages, including (I) timely diagnosis due to no extra effort for fluorescent staining, (II) better results due to high resolution even for tiny subcellular structures, and (III) conformity for live-cell detection and tracking. Comparable to (24) and (25), our proposed segmentation model is also based on SAM. However, it distinguishes itself from existing methods in the literature through two main contributions: (I) first, we apply the most recent state-of-the-art object detection model, You Only Look Once version 9 (YOLOv9), for the generation of bounding box prompts for SAM. ...
... Specifically, points, boxes, and texts are termed sparse prompts, while masks are referred to as dense prompts. In the cellular structure segmentation proposed by (24), box prompts were used, while (25) used mask prompts. A prompt generator is simply an auxiliary model that finds the potential locations of cells and nuclei. ...
Article
Background Light microscopy is a widely used technique in cell biology due to its satisfactory resolution for cellular structure analysis, prevalent availability of fluorescent probes for staining, and compatibility for the dynamic analysis of live cells. However, the segmentation of cells and nuclei from microscopic images is not a straightforward process because it faces several challenges, such as high variation in morphology and shape, the presence of noise and diverse background contrast, and the clustering or overlapping nature of cells. Dealing with these challenges and facilitating more reliable analysis necessitates the implementation of computer-aided methods that leverage image processing techniques and deep learning algorithms. The major goal of this study is to propose a model for instance segmentation of cells and nuclei using cutting-edge deep learning techniques. Methods A fine-tuned You Only Look Once version 9 extended (YOLOv9-E) model is initially applied as a prompt generator to generate bounding box prompts. Using the generated prompts, a pre-trained segment anything model (SAM) is subsequently applied through zero-shot inference to produce raw segmentation masks. These segmentation masks are then refined using non-max suppression and simple image processing methods such as image addition and morphological processing. The proposed method is developed and evaluated using an open-sourced dataset called Expert Visual Cell Annotation (EVICAN), which is relatively large and contains 4,738 microscopy images extracted across organs using different protocols. Results Based on the evaluation results on three different levels of EVICAN test sets, the proposed method demonstrates strong performance, with average mAP50 [mean average precision at an intersection over union (IoU) of 0.50] scores of 96.25, 95.05, and 94.18 for cell segmentation, and 68.04, 54.66, and 38.29 for nucleus segmentation on easy, medium, and difficult test sets, respectively. Conclusions Our proposed method for instance segmentation of cells and nuclei provided favorable performance compared to existing methods in the literature, indicating its potential utility as an assistive tool for cell culture experts, facilitating prompt and reliable analysis.
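A rough sketch of the detector-prompted pipeline this abstract describes is shown below, assuming the public Ultralytics and segment-anything APIs; the checkpoint paths are placeholders, and the paper's fine-tuned YOLOv9-E weights and its non-max suppression and morphological refinement steps are not reproduced.

```python
import numpy as np
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

detector = YOLO("yolov9e.pt")  # off-the-shelf YOLOv9-E checkpoint (placeholder)
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

def segment_cells(image_rgb: np.ndarray) -> list:
    """Detect cells with YOLOv9-E, then prompt SAM zero-shot with each box."""
    boxes = detector(image_rgb)[0].boxes.xyxy.cpu().numpy()  # (N, 4) XYXY boxes
    predictor.set_image(image_rgb)
    masks = []
    for box in boxes:
        # Sparse (box) prompt; a dense prompt would pass mask_input= instead.
        m, _, _ = predictor.predict(box=box, multimask_output=False)
        masks.append(m[0].astype(bool))
    return masks  # raw masks; NMS and morphological refinement would follow
```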
... "This helps the model focus on relevant cellular or tissue structures while ignoring some of the noise," Wang says. Numerous groups are now developing foundation models; this month, for example, Van Valen and his team posted a preprint describing their CellSAM algorithm 6 . Wang is optimistic that first-generation solutions will emerge in the next few years. ...
Article
Artificial intelligence (AI) is becoming a transformative force in the life sciences, pushing the boundaries of possibility. Imagine AI automating time-consuming tasks, uncovering hidden patterns in vast datasets, designing proteins in minutes instead of years, and even predicting disease outbreaks before they occur. This review explores the latest AI tools revolutionizing scientific fields, including research and data analysis, healthcare, and tools supporting scientific writing. Beyond data processing, AI is reshaping how scientists draft and share their findings, enhancing processes ranging from literature reviews to citation management. However, with great power comes great responsibility. Are we prepared for this leap? This review delves into the forefront of AI in the life sciences, where innovation meets responsibility.
Article
Generalist methods for cellular segmentation have good out-of-the-box performance on a variety of image types; however, existing methods struggle for images that are degraded by noise, blurring or undersampling, all of which are common in microscopy. We focused the development of Cellpose3 on addressing these cases and here we demonstrate substantial out-of-the-box gains in segmentation and image quality for noisy, blurry and undersampled images. Unlike previous approaches that train models to restore pixel values, we trained Cellpose3 to output images that are well segmented by a generalist segmentation model, while maintaining perceptual similarity to the target images. Furthermore, we trained the restoration models on a large, varied collection of datasets, thus ensuring good generalization to user images. We provide these tools as ‘one-click’ buttons inside the graphical interface of Cellpose as well as in the Cellpose API.
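The training idea summarized here, supervising the restoration network through a frozen segmentation model rather than on raw pixel values, together with a perceptual-similarity term, could be sketched as below. The specific losses, network handles, and weighting alpha are assumptions for illustration, not Cellpose3's exact objective.

```python
import torch.nn.functional as F

def restoration_loss(restored, target, seg_net, perceptual_net, alpha=1.0):
    # Segmentation term: the restored image should drive the frozen generalist
    # segmentation network to the same outputs as the clean target does.
    seg_term = F.mse_loss(seg_net(restored), seg_net(target).detach())
    # Perceptual term: keep the restored image visually close to the target.
    percep_term = F.mse_loss(perceptual_net(restored),
                             perceptual_net(target).detach())
    return seg_term + alpha * percep_term
```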
Article
The integration of artificial intelligence (AI) in medical diagnostics represents a significant advancement in managing upper gastrointestinal (GI) cancer, which is a major cause of global cancer mortality. Specifically for gastric cancer (GC), chronic inflammation causes changes in the mucosa such as atrophy, intestinal metaplasia (IM), dysplasia, and ultimately cancer. Early detection through endoscopic regular surveillance is essential for better outcomes. Foundation models (FMs), which are machine or deep learning models trained on diverse data and applicable to broad use cases, offer a promising solution to enhance the accuracy of endoscopy and its subsequent pathology image analysis. This review explores the recent advancements, applications, and challenges associated with FMs in endoscopy and pathology imaging. We started by elucidating the core principles and architectures underlying these models, including their training methodologies and the pivotal role of large-scale data in developing their predictive capabilities. Moreover, this work discusses emerging trends and future research directions, emphasizing the integration of multimodal data, the development of more robust and equitable models, and the potential for real-time diagnostic support. This review aims to provide a roadmap for researchers and practitioners in navigating the complexities of incorporating FMs into clinical practice for the prevention/management of GC cases, thereby improving patient outcomes.