ABSTRACT: This paper introduces a new algorithm for recognizing surgical tasks in real-time in a video stream. The goal is to communicate information to the surgeon in due time during a video-monitored surgery. The proposed algorithm is applied to cataract surgery, which is the most common eye surgery. To compensate for eye motion and zoom level variations, cataract surgery videos are first normalized. Then, the motion content of short video subsequences is characterized with spatiotemporal polynomials: a multiscale motion characterization based on adaptive spatiotemporal polynomials is presented. The proposed solution is particularly suited to characterizing deformable moving objects with fuzzy borders, which are typically found in surgical videos. Given a target surgical task, the system is trained to identify which spatiotemporal polynomials are usually extracted from videos when and only when this task is being performed. These key spatiotemporal polynomials are then searched for in new videos to recognize the target surgical task. For improved performance, the system jointly adapts the spatiotemporal polynomial basis and identifies the key spatiotemporal polynomials using the multiple-instance learning paradigm. The proposed system runs in real-time and outperforms the previous solution from our group, both for surgical task recognition (Az = 0.851 on average, as opposed to Az = 0.794 previously) and for the joint segmentation and recognition of surgical tasks (Az = 0.856 on average, as opposed to Az = 0.832 previously).
IEEE Transactions on Medical Imaging 10/2014; 34(4). DOI:10.1109/TMI.2014.2366726 · 3.39 Impact Factor
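The abstract does not specify the form of the spatiotemporal polynomials; as a hedged illustration of the general idea only, the sketch below fits a low-degree polynomial in (x, y, t) to a scalar motion signal sampled over a short subsequence, yielding a fixed-length coefficient descriptor. The function name, the degree, and the synthetic data are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

def polynomial_descriptor(x, y, t, motion, degree=2):
    """Least-squares fit of motion ~ poly(x, y, t); the coefficient
    vector is a fixed-length motion descriptor for the subsequence."""
    terms = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            for k in range(degree + 1 - i - j):
                terms.append((x ** i) * (y ** j) * (t ** k))
    A = np.stack(terms, axis=1)                       # design matrix
    coeffs, *_ = np.linalg.lstsq(A, motion, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
x, y, t = rng.uniform(size=(3, 200))                  # synthetic samples
motion = 1.0 + 2.0 * x - 0.5 * t                      # synthetic motion signal
d = polynomial_descriptor(x, y, t, motion)
print(d.shape)                                        # fixed-length descriptor
```

Because the descriptor length depends only on the polynomial degree (here 10 coefficients for degree 2), subsequences of different durations map to comparable vectors.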
ABSTRACT: The automatic detection of exudates in color eye fundus images is an important task in applications such as diabetic retinopathy screening. The presented work has been undertaken in the framework of the TeleOphta project, whose main objective is to automatically detect normal exams in a tele-ophthalmology network, thus reducing the burden on the readers. A new clinical database, e-ophtha EX, containing precisely manually contoured exudates, is introduced. As opposed to previously available databases, e-ophtha EX is very heterogeneous: it contains images gathered within the OPHDIAT telemedicine network for diabetic retinopathy screening. Image definition and quality, as well as the patient's condition or the retinograph used for acquisition, for example, vary considerably between examinations. The proposed exudate detection method has been designed for this complex situation. We propose new preprocessing methods, which perform not only normalization and denoising tasks, but also detect reflections and artifacts in the image. A new candidate segmentation method, based on mathematical morphology, is proposed. These candidates are characterized using classical features, but also novel contextual features. Finally, a random forest algorithm is used to detect the exudates among the candidates. The method has been validated on the e-ophtha EX database, obtaining an AUC of 0.95. It has also been validated on other databases, obtaining an AUC between 0.93 and 0.95, outperforming state-of-the-art methods.
Medical Image Analysis 10/2014; 18(7). DOI:10.1016/j.media.2014.05.004 · 3.65 Impact Factor
ABSTRACT: We investigate the maximal performance that can be measured for automated binary decision systems in terms of area under the ROC curve (AUC), against a reference standard provided by human readers. The goal is to determine the required characteristics of the reference standard to assess and compare automated decision systems with a given degree of confidence, or, conversely, to determine what degree of confidence can be obtained given the characteristics of the reference standard. We modeled the expected value of the AUC that can be measured for a perfect decision system, given a reference standard provided either by a single human reader or by multiple human readers (consensus, majority vote). The proposed model was applied to diabetic retinopathy screening in a dataset of 874 eye fundus examinations graded by three readers. The expected value of the AUC for a perfect decision system was estimated at 0.956 against a single human reader, and 0.990 against a 'majority wins' vote of three human readers. The Iowa detection program has reached the maximal performance measurable by a single human reader (0.929, CI: [0.897-0.962]) and is close to the maximal performance measurable by a 'majority wins' vote (0.955, CI: [0.939-0.972]).
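The effect the abstract quantifies can be reproduced qualitatively by simulation: even a perfect decision system scores an AUC below 1.0 when evaluated against an imperfect human reference, and a majority vote of readers raises the ceiling. The sketch below is an illustrative simulation, not the paper's analytical model; the reader error rate is a made-up value.

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney AUC: P(score_pos > score_neg), counting ties as 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

rng = np.random.default_rng(1)
truth = rng.integers(0, 2, size=5000)            # true disease status
perfect_score = truth.astype(float)              # a perfect system's output

def noisy_reader(truth, error=0.05):
    """Reference standard: flips the true label with a small probability."""
    flip = rng.uniform(size=truth.size) < error
    return np.where(flip, 1 - truth, truth)

single = noisy_reader(truth)
votes = np.stack([noisy_reader(truth) for _ in range(3)])
majority = (votes.sum(axis=0) >= 2).astype(int)  # 'majority wins' vote

print(auc(perfect_score, single))    # below 1.0 against one reader
print(auc(perfect_score, majority))  # closer to 1.0 against the vote
```

With a 5% per-reader error rate, the measurable AUC against a single reader sits near 0.95, while the majority vote pushes it close to 0.99, mirroring the ordering of the 0.956 and 0.990 estimates reported above.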
ABSTRACT: Huge amounts of surgical data are recorded during video-monitored surgery. Content-based video retrieval systems intend to reuse these data for computer-aided surgery. In this paper, we focus on real-time recognition of cataract surgery steps: the goal is to retrieve from a database surgery videos that were recorded during the same surgery step. The proposed system relies on motion features for video characterization. Motion features are usually impacted by eye motion or zoom level variations, which are not necessarily relevant for surgery step recognition. These problems limit the performance of the retrieval system. We therefore propose to refine motion feature extraction by applying pre-processing steps based on a novel pupil center and scale tracking method. These pre-processing steps are evaluated for two different motion features. In this paper, a similarity measure adapted from Piciarelli's video surveillance system is evaluated for the first time on a surgery dataset. This similarity measure provides good results, and for both motion features the proposed pre-processing steps significantly improved the retrieval performance of the system.
ABSTRACT: Anterior eye segment surgeries are usually video-recorded. If we are able to efficiently analyze surgical videos in real-time, new decision support tools will emerge. The main anatomical landmarks in these videos are the pupil boundaries and the limbus, but segmenting them is challenging due to the variety of colors and textures in the pupil, the iris, the sclera and the lids. In this paper, we present a solution to reliably normalize the center and the scale in videos, without explicitly segmenting these landmarks. First, a robust solution to track the pupil center is presented: it uses the fact that the pupil boundaries, the limbus and the sclera / lid interface are concentric. Second, a solution to estimate the zoom level is presented: it relies on the illumination pattern reflected on the cornea. The proposed solution was assessed on a dataset of 186 real-life cataract surgery videos. The distance between the true and estimated pupil centers was equal to 8.0 ± 6.9% of the limbus radius. The correlation between the estimated zoom level and the true limbus size in images was high: R = 0.834.
ABSTRACT: In ophthalmology, it is now common practice to record every surgical procedure and to archive the resulting videos for documentation purposes. In this paper, we present a solution to automatically segment and categorize surgical tasks in real-time during the surgery, using the video recording. The goal would be to communicate information to the surgeon in due time, such as recommendations for less experienced surgeons. The proposed solution relies on the content-based video retrieval paradigm: it reuses previously archived videos to automatically analyze the current surgery, by analogy reasoning. Each video is segmented, in real-time, into an alternating sequence of idle phases, during which no clinically-relevant motions are visible, and action phases. As soon as an idle phase is detected, the previous action phase is categorized and the next action phase is predicted. A conditional random field is used for categorization and prediction. The proposed system was applied to the automatic segmentation and categorization of cataract surgery tasks. A dataset of 186 surgeries, performed by ten different surgeons, was manually annotated: ten possibly overlapping surgical tasks were delimited in each surgery. Using the content of action phases and the duration of idle phases as sources of evidence, an average recognition performance of Az = 0.832 ± 0.070 was achieved.
IEEE Transactions on Medical Imaging 07/2014; 33(12). DOI:10.1109/TMI.2014.2340473 · 3.39 Impact Factor
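The alternation of idle and action phases described above can be sketched with a simple run segmentation of a per-frame motion magnitude; this is only an assumed illustration of the segmentation step (the conditional random field used for categorization and prediction is not shown, and the threshold is arbitrary).

```python
def segment_phases(motion, threshold=0.2):
    """Split a per-frame motion-magnitude sequence into alternating
    (label, start, end) runs of 'idle' and 'action' phases."""
    phases = []
    start = 0
    current = motion[0] >= threshold
    for i, m in enumerate(motion[1:], start=1):
        active = m >= threshold
        if active != current:                 # phase boundary detected
            phases.append(("action" if current else "idle", start, i))
            start, current = i, active
    phases.append(("action" if current else "idle", start, len(motion)))
    return phases

seq = [0.0, 0.1, 0.5, 0.6, 0.1, 0.0, 0.7, 0.8]
print(segment_phases(seq))
# → [('idle', 0, 2), ('action', 2, 4), ('idle', 4, 6), ('action', 6, 8)]
```

In the system described above, each detected idle phase would trigger categorization of the preceding action phase, enabling the real-time behavior the abstract reports.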
ABSTRACT: Nowadays, many surgeries, including eye surgeries, are video-monitored. We present in this paper an automatic video analysis system able to recognize surgical tasks in real-time. The proposed system relies on the Content-Based Video Retrieval (CBVR) paradigm. It characterizes short subsequences in the video stream and searches for video subsequences with similar structures in a video archive. Fixed-length feature vectors are built for each subsequence: the feature vectors are unchanged by variations in duration and temporal structure among the target surgical tasks. Therefore, it is possible to perform fast nearest neighbor searches in the video archive. The retrieved video subsequences are used to recognize the current surgical task by analogy reasoning. The system can be trained to recognize any surgical task using weak annotations only. It was applied to a dataset of 23 epiretinal membrane surgeries and a dataset of 100 cataract surgeries. Three surgical tasks were annotated in the first dataset. Nine surgical tasks were annotated in the second dataset. To assess its generality, the system was also applied to a dataset of 1,707 movie clips in which 12 human actions were annotated. High task recognition scores were measured in all three datasets. Real-time task recognition will be used in future works to communicate with surgeons (trainees in particular) or with surgical devices.
Medical Image Analysis 02/2014; 18(3):579-590. DOI:10.1016/j.media.2014.02.007 · 3.65 Impact Factor
ABSTRACT: Breast mass segmentation in mammography plays a crucial role in Computer-Aided Diagnosis (CAD) systems. In this paper, a Bidimensional Empirical Mode Decomposition (BEMD) method is introduced for mass segmentation in mammography images. This method is used to decompose images into a set of functions named Bidimensional Intrinsic Mode Functions (BIMFs) and a residue. Our approach consists of three steps: 1) the regions of interest (ROIs) are identified by iterative thresholding; 2) the contour of each ROI is extracted from the first BIMF obtained by BEMD; 3) the ROI is finally refined using the extracted contour. The proposed approach was tested on the MIAS database, and the obtained results demonstrate its efficacy.
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 07/2013; 2013:5441-5444. DOI:10.1109/EMBC.2013.6610780
ABSTRACT: This paper presents TeleOphta, an automatic system for screening diabetic retinopathy in teleophthalmology networks. Its goal is to reduce the burden on ophthalmologists by automatically detecting non-referable examination records, i.e. examination records presenting no image quality problems and no pathological signs related to diabetic retinopathy or any other retinal pathology. TeleOphta is an attempt to put into practice years of algorithmic developments from our groups. It combines image quality metrics, specific lesion detectors and a generic pathological pattern miner to process the visual content of eye fundus photographs. This visual information is further combined with contextual data in order to compute an abnormality risk for each examination record. The TeleOphta system was trained and tested on a large dataset of 25,702 examination records from the OPHDIAT screening network in Paris. It was able to automatically detect 68% of the non-referable examination records while achieving the same sensitivity as a second ophthalmologist. This suggests that it could safely reduce the burden on ophthalmologists by 56%.
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 07/2013; 2013:7144-7147. DOI:10.1109/EMBC.2013.6611205
ABSTRACT: This paper concerns the validation of automatic retinal image analysis (ARIA) algorithms. For reasons of space and consistency, we concentrate on the validation of algorithms processing color fundus camera images, currently the largest section of the ARIA literature. We sketch the context (imaging instruments and target tasks) of ARIA validation, summarizing the main image analysis and validation techniques. We then present a list of recommendations focusing on the creation of large repositories of test data created by international consortia, easily accessible via moderated Web sites, including multicenter annotations by multiple experts, specific to clinical tasks, and capable of running submitted software automatically on the data stored, with clear and widely agreed-on performance criteria, to provide a fair comparison.
ABSTRACT: A complete prototype for the automatic detection of normal examinations on a teleophthalmology network for diabetic retinopathy screening is presented. The system combines pathological pattern mining methods with specific lesion detection methods to extract information from the images. This information, plus patient and other contextual data, is used by a classifier to compute an abnormality risk. Such a system should reduce the burden on readers in teleophthalmology networks.
ABSTRACT: IMPORTANCE The diagnostic accuracy of computer detection programs has been reported to be comparable to that of specialists and expert readers, but no computer detection programs have been validated in an independent cohort using an internationally recognized diabetic retinopathy (DR) standard. OBJECTIVE To determine the sensitivity and specificity of the Iowa Detection Program (IDP) to detect referable diabetic retinopathy (RDR). DESIGN AND SETTING In primary care DR clinics in France, from January 1, 2005, through December 31, 2010, patients were photographed consecutively, and retinal color images were graded for retinopathy severity according to the International Clinical Diabetic Retinopathy scale and macular edema by 3 masked independent retinal specialists and regraded with adjudication until consensus. The IDP analyzed the same images at a predetermined and fixed set point. We defined RDR as more than mild nonproliferative retinopathy and/or macular edema. PARTICIPANTS A total of 874 people with diabetes at risk for DR. MAIN OUTCOME MEASURES Sensitivity and specificity of the IDP to detect RDR, area under the receiver operating characteristic curve, sensitivity and specificity of the retinal specialists' readings, and mean interobserver difference (κ). RESULTS The RDR prevalence was 21.7% (95% CI, 19.0%-24.5%). The IDP sensitivity was 96.8% (95% CI, 94.4%-99.3%) and specificity was 59.4% (95% CI, 55.7%-63.0%), corresponding to 6 of 874 false-negative results (none met treatment criteria). The area under the receiver operating characteristic curve was 0.937 (95% CI, 0.916-0.959). Before adjudication and consensus, the sensitivity/specificity of the retinal specialists were 0.80/0.98, 0.71/1.00, and 0.91/0.95, and the mean intergrader κ was 0.822. CONCLUSIONS The IDP has high sensitivity and specificity to detect RDR.
Computer analysis of retinal photographs for DR and automated detection of RDR can be implemented safely into the DR screening pipeline, potentially improving access to screening and health care productivity and reducing visual loss through early treatment.
ABSTRACT: This paper introduces a novel retrieval framework for surgery videos. Given a query video, the goal is to retrieve videos in which similar surgical gestures appear. In this framework, the motion content of short video subsequences is modeled, in real-time, using spatiotemporal polynomials. The retrieval engine needs to be trained: key spatiotemporal polynomials, characterizing semantically-relevant surgical gestures, are identified through multiple-instance learning. Then, videos are compared in a high-level space spanned by these key spatiotemporal polynomials. The framework was applied to a dataset of 900 manually-delimited clips from 100 cataract surgery videos. High classification performance (Az=0.816±0.118) and retrieval performance (MAP=0.358) were observed.
Proceedings of the Third MICCAI international conference on Medical Content-Based Retrieval for Clinical Decision Support; 10/2012
ABSTRACT: Purpose: This work introduces ongoing research on computer-aided retinal surgery. A content-based video retrieval system is presented: given a video stream captured by a digital camera monitoring the current surgery, the system retrieves similar videos from video archives. This information could guide the surgery steps or generate surgical alerts if the current surgery shares complications with archived videos. Methods: We propose to use data compression to extract video features. First, motion vectors are derived from the MPEG-4 stream. Second, image sequence segmentation is performed by k-means clustering. Third, a Kalman filter is used to track region displacements between consecutive frames and thereby characterize region trajectories. Finally, this motion information is combined with the residual, i.e. the difference between the original input images and the predicted images. To compare videos, we adopted an extension of fast dynamic time warping. Results: The system was applied to a small dataset of 24 video-recorded retinal surgeries (621 s ± 299 s). Images have a definition of 720x576 pixels. An ophthalmic surgeon divided each video into three new videos, each corresponding to one step of the membrane peeling procedure: Injection, Coat, Vitrectomy. The effectiveness of the proposed method, measured by the ROC curve, is promising (Az ≈ 0.73). Conclusion: A novel CBVR system, allowing retrieval of medical videos, has been presented. Experiments on the dataset of retinal surgery steps validate the semantic relevance of the retrieved results in ophthalmic applications.
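Dynamic time warping, the comparison step named above, aligns sequences of different durations by stretching them non-linearly. The sketch below is a minimal one-dimensional O(n·m) version for illustration; the paper uses a fast multidimensional extension, which is not reproduced here.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences:
    cumulative cost of the best monotonic alignment of a onto b."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])      # local matching cost
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

print(dtw([0, 1, 2, 1], [0, 1, 1, 2, 1]))  # → 0.0 (same shape, stretched)
```

Because the alignment absorbs differences in duration, two recordings of the same surgical step performed at different speeds can still compare as similar.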
ABSTRACT: In recent years, many image analysis algorithms have been presented to assist Diabetic Retinopathy (DR) screening. The goal was usually to detect healthy examination records automatically, in order to reduce the number of records that should be analyzed by retinal experts. In this paper, a novel application is presented: these algorithms are used to 1) discover image characteristics that sometimes cause an expert to disagree with his/her peers and 2) warn the expert whenever these characteristics are detected in an examination record. In a DR screening program, each examination record is only analyzed by one expert, therefore analyzing disagreements among experts is challenging. A statistical framework, based on Parzen-windowing and the Patrick-Fischer distance, is presented to solve this problem. Disagreements among eleven experts from the Ophdiat screening program were analyzed, using an archive of 25,702 examination records.
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 08/2012; 2012:5959-5962. DOI:10.1109/EMBC.2012.6347351
ABSTRACT: In this paper, we address the problem of computer-aided ophthalmic surgery. In particular, a novel Content-Based Video Retrieval (CBVR) system is presented: given a video stream captured by a digital camera monitoring the current surgery, the system retrieves, within digital archives, videos that resemble the current surgery monitoring video. The search results may be used to guide surgeons' decisions, for example by letting the surgeon know what a more experienced fellow worker would do in a similar situation. With this goal, we propose to use motion information contained in the MPEG-4 AVC/H.264 video standard to extract features from videos. We propose two approaches. The first is based on a motion histogram, created for every frame of a compressed video sequence, to extract motion direction and intensity statistics. The second combines segmentation and tracking to extract region displacements between consecutive frames and thereby characterize region trajectories. To compare videos, an extension of fast dynamic time warping to multidimensional time series was adopted. The system was applied to a dataset of 69 video-recorded retinal surgery steps. Results are promising: the retrieval efficiency is higher than 69%.
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 08/2012; 2012:4962-4965. DOI:10.1109/EMBC.2012.6347106
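The per-frame motion histogram mentioned above can be sketched as follows: motion vectors (here synthetic tuples standing in for the MPEG-4 stream's block motion vectors) are binned by direction, with an extra bin counting near-static blocks. The bin layout and thresholds are assumptions for illustration, not the paper's exact feature.

```python
import math

def motion_histogram(vectors, n_bins=8, min_mag=0.5):
    """Bin (dx, dy) motion vectors by direction; the last bin
    counts blocks whose motion magnitude is below min_mag."""
    hist = [0] * (n_bins + 1)
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag < min_mag:
            hist[n_bins] += 1                 # near-static block
        else:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * n_bins) % n_bins] += 1
    return hist

vecs = [(1, 0), (0, 1), (-1, 0), (0.1, 0.0)]
print(motion_histogram(vecs))  # → [1, 0, 1, 0, 1, 0, 0, 0, 1]
```

Concatenating such histograms over frames yields a multidimensional time series, which is exactly the kind of input the fast dynamic time warping extension above is designed to compare.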
ABSTRACT: A novel multiple-instance learning framework, for automated image classification, is presented in this paper. Given reference images marked by clinicians as relevant or irrelevant, the image classifier is trained to detect patterns, of arbitrary size, that only appear in relevant images. After training, similar patterns are sought in new images in order to classify them as either relevant or irrelevant images. Therefore, no manual segmentations are required. As a consequence, large image datasets are available for training. The proposed framework was applied to diabetic retinopathy screening in 2-D retinal image datasets: Messidor (1200 images) and e-ophtha, a dataset of 25,702 examination records from the Ophdiat screening network (107,799 images). In this application, an image (or an examination record) is relevant if the patient should be referred to an ophthalmologist. Trained on one half of Messidor, the classifier achieved high performance on the other half of Messidor (Az = 0.881) and on e-ophtha (Az = 0.761). We observed, in a subset of 273 manually segmented images from e-ophtha, that all eight types of diabetic retinopathy lesions are detected.
Medical Image Analysis 07/2012; 16(6):1228-1240. DOI:10.1016/j.media.2012.06.003 · 3.65 Impact Factor
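The multiple-instance assumption underlying this framework can be stated in a few lines: an image (a "bag") is relevant iff at least one of its patterns (its "instances") is relevant, so only image-level labels are needed for training. The toy function below illustrates that decision rule only; the scores and threshold are hypothetical.

```python
def bag_label(instance_scores, threshold=0.5):
    """Multiple-instance rule: an image (bag) is relevant iff its
    best-scoring pattern (instance) exceeds the threshold."""
    return max(instance_scores) >= threshold

print(bag_label([0.1, 0.2, 0.8]))  # → True  (one relevant pattern suffices)
print(bag_label([0.1, 0.2, 0.3]))  # → False (no relevant pattern)
```

This is what removes the need for manual segmentations: the learner never needs to be told which pattern made the image relevant, only that some pattern did.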
ABSTRACT: A novel image characterization based on the wavelet transform is presented in this paper. Previous works on wavelet-based image characterization have focused on adapting a wavelet basis to an image or an image dataset. We propose in this paper to take one step further: images are characterized with all possible wavelet bases with a given support. A simple image signature, based on the standardized moments of the wavelet coefficient distributions, is proposed. This signature can be computed quickly for each possible wavelet filter, yielding an image signature map. We propose to use this signature map as an image characterization for Content-Based Image Retrieval (CBIR). High retrieval performance was achieved on a medical, a face detection and a texture dataset: a precision at five of 62.5%, 97.8% and 64.0% was obtained for these datasets, respectively.
Content-Based Multimedia Indexing (CBMI), 2012 10th International Workshop on; 06/2012
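One point of the signature map described above can be sketched with a single fixed filter: the code below runs one level of a 2-D Haar transform and takes the standardized moments (skewness, kurtosis) of each detail subband. The Haar choice, the single decomposition level, and the random test image are assumptions; the paper sweeps all filters of a given support.

```python
import numpy as np

def haar_level(img):
    """One level of a 2-D Haar transform: returns the approximation
    (ll) and the three detail subbands (lh, hl, hh)."""
    a = (img[0::2, :] + img[1::2, :]) / 2     # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2     # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, lh, hl, hh

def standardized_moments(x):
    """Skewness and kurtosis of a coefficient distribution."""
    z = (x.ravel() - x.mean()) / x.std()
    return (z ** 3).mean(), (z ** 4).mean()

rng = np.random.default_rng(2)
img = rng.uniform(size=(64, 64))              # stand-in for a real image
_, lh, hl, hh = haar_level(img)
signature = [m for band in (lh, hl, hh) for m in standardized_moments(band)]
print(len(signature))                         # a few numbers per filter
```

Repeating this for every admissible filter of the chosen support, and stacking the resulting short vectors, would produce the signature map used for retrieval.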
ABSTRACT: A weakly supervised image classification framework is presented in this paper. Given reference images marked by clinicians as relevant or irrelevant, we learn to automatically detect relevant patterns, i.e. patterns that only appear in relevant images. After training, relevant patterns are sought in unseen images in order to classify each image as relevant or irrelevant. No manual segmentations are required. Because manual segmentation of medical images is extremely time-consuming, existing classification algorithms are usually trained on limited reference datasets. With the proposed framework, much larger medical datasets are now available for training. The proposed approach has been successfully applied to diabetic retinopathy detection in the Messidor dataset (Az = 0.855). Moreover, we observed, in a new dataset of 473 manually segmented images, that all eight types of diabetic retinopathy lesions are detected.
Computer-Based Medical Systems (CBMS), 2012 25th International Symposium on; 06/2012