Conference Paper

OF-MSRN: Optical Flow-Auxiliary Multi-Task Regression Network for Direct Quantitative Measurement, Segmentation and Motion Estimation

... As a powerful alternative to traditional methods, deep learning based approaches have taken optical flow research to a new level. Current optical flow methods have made great advances in: i) developing powerful data-driven, end-to-end learning paradigms (Dosovitskiy et al. 2015; Ilg et al. 2017); ii) designing multiple refinement strategies (Sun et al. 2018; Hur and Roth 2019; Zhao et al. 2020b); iii) exploiting auxiliary information from related tasks (Zhao et al. 2020a); and iv) modeling pixel-wise relations for all pairs (Teed and Deng 2020; Jiang et al. 2021b). Although these deep learning based approaches have shown a strong capability for matching across frames, they are subject to a significant limitation: current methods largely focus on matching similarity between features and lack a holistic motion understanding of the given scene. ...
... To improve results on optical flow, many recent works introduce stronger learning paradigms that enable iterative refinement (Ranjan and Black 2017; Sun et al. 2018; Yang and Ramanan 2019; Hui, Tang, and Loy 2018), explicit pixel-wise relation modeling, and joint representation learning with other tasks (Zhao et al. 2020a). Although remarkable progress has been achieved by these developments, there is still large room for improvement over existing approaches, which largely focus on matching similarity between features without considering how to achieve a holistic motion understanding. ...
Article
Estimating per-pixel motion between video frames, known as optical flow, is a long-standing problem in video understanding and analysis. Most contemporary optical flow techniques largely focus on addressing the cross-image matching with feature similarity, with few methods considering how to explicitly reason over the given scene for achieving a holistic motion understanding. In this work, taking a fresh perspective, we introduce a novel graph-based approach, called adaptive graph reasoning for optical flow (AGFlow), to emphasize the value of scene/context information in optical flow. Our key idea is to decouple the context reasoning from the matching procedure, and exploit scene information to effectively assist motion estimation by learning to reason over the adaptive graph. The proposed AGFlow can effectively exploit the context information and incorporate it within the matching procedure, producing more robust and accurate results. On both Sintel clean and final passes, our AGFlow achieves the best accuracy with EPE of 1.43 and 2.47 pixels, outperforming state-of-the-art approaches by 11.2% and 13.6%, respectively. Code is publicly available at https://github.com/megvii-research/AGFlow.
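The abstract above reports accuracy as end-point error (EPE) in pixels, the standard optical flow metric. A minimal sketch of how EPE is conventionally computed (function name and array layout are illustrative assumptions, not taken from the AGFlow code):

```python
import numpy as np

def end_point_error(flow_pred, flow_gt):
    """Mean end-point error (EPE): the average Euclidean distance, in pixels,
    between predicted and ground-truth flow vectors over all pixels."""
    diff = flow_pred - flow_gt  # (H, W, 2): per-pixel (dx, dy) difference
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

# Toy check: a uniform 1-pixel horizontal mismatch yields an EPE of 1.0.
pred = np.zeros((4, 4, 2))
gt = np.zeros((4, 4, 2))
gt[..., 0] = 1.0
print(end_point_error(pred, gt))  # 1.0
```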
... During the past decades, several different types of approaches have been proposed for this task by locating the edges of NW, LI, and MA, including gradient-based edge detection methods (Pignoli & Longo, 1988; Liguori et al., 2001; Stein et al., 2005; Golemati et al., 2007; Faita et al., 2008), active contour-based methods (Gutierrez et al., 2002; Cheng et al., 2002; Loizou et al., 2007; Petroudi et al., 2012; Zhao et al., 2017b), and machine learning-based methods (Menchón-Lara et al., 2014; Menchón-Lara & Sancho-Gómez, 2015; Shin et al., 2016; Qian & Yang, 2018; Biswas et al., 2018; Xie et al., 2019; Zhou et al., 2019; Zhao et al., 2020; Vila et al., 2020). Although these methods have made significant advances, they still suffer from low measuring accuracy (measured by mean absolute error) and poor stability (measured by the magnitude of change in absolute error), mainly due to the following disadvantages: 1) These methods are mostly appearance-based; they do not consider carotid anatomical information and are prone to anatomically incorrect estimations, which lowers accuracy and stability. ...
... During the past decades, researchers proposed various methods for tackling the CALD or CIMT estimation tasks, falling into three different categories: gradient-based edge detection methods (Liguori et al., 2001; Stein et al., 2005; Golemati et al., 2007; Faita et al., 2008), active contour-based methods (Gutierrez et al., 2002; Cheng et al., 2002; Loizou et al., 2007; Petroudi et al., 2012), and machine learning-based methods (Menchón-Lara et al., 2014; Menchón-Lara & Sancho-Gómez, 2015; Shin et al., 2016; Qian & Yang, 2018; Biswas et al., 2018; Zhou et al., 2019; Zhao et al., 2020; Vila et al., 2020). In this section, we will briefly review these three types of methods. ...
Article
Full-text available
Carotid artery lumen diameter (CALD) and carotid artery intima-media thickness (CIMT) are essential factors for estimating the risk of many cardiovascular diseases. Their automatic measurement in ultrasound (US) images is an efficient assisting diagnostic procedure. Despite the advances, existing methods still suffer from low measuring accuracy and poor prediction stability, mainly due to the following disadvantages: 1) they ignore anatomical priors and are prone to giving anatomically inaccurate estimations; 2) they require carefully designed post-processing, which may introduce further estimation errors; 3) they rely on massive pixel-wise annotations during training; 4) they cannot estimate the uncertainty of their predictions. In this study, we propose the Anatomical Prior-guided ReInforcement Learning model (APRIL), which innovatively formulates the measurement of CALD & CIMT as an RL problem and dynamically incorporates the anatomical prior (AP) into the system through a novel reward. With the guidance of the AP, the designed keypoints in APRIL avoid various anatomically impossible mis-locations and accurately measure CALD & CIMT based on their corresponding locations. Moreover, this formulation significantly reduces human annotation effort by using only several keypoints, and helps to eliminate extra post-processing steps. Further, we introduce an uncertainty module for measuring prediction variance, which can guide us to adaptively rectify the estimation for frames with considerable uncertainty. Experiments on a challenging carotid US dataset show that APRIL can achieve MAE (in pixel/mm) of 3.02 ± 2.23 / 0.18 ± 0.13 for CALD, and 0.96 ± 0.70 / 0.06 ± 0.04 for CIMT, which significantly surpasses popular approaches that use more annotations.
... Unsupervised learning with photometric loss has also been well investigated in computer vision [19]-[23]. Without requiring ground-truth labels, optical flow estimation can be flexibly embedded into a multi-task configuration and bring mutual benefits to other tasks such as video segmentation and measurement quantification [24], [25]. Despite their success in optical flow estimation, the above-mentioned unsupervised methods only optimize the photometric loss between corresponding locations of two images and ignore the consistency of the optical flow itself across the whole sequence. ...
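The photometric loss discussed above penalizes intensity differences between one frame and the other frame warped back by the estimated flow, so no ground-truth flow is needed. A minimal sketch under simple assumptions (grayscale images, bilinear sampling via SciPy, a Charbonnier penalty; all names are illustrative, not from the cited implementations):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def photometric_loss(img1, img2, flow, eps=1e-3):
    """Charbonnier photometric loss between img1 and img2 warped back by flow.
    img1, img2: (H, W) grayscale frames; flow: (H, W, 2) with per-pixel (dx, dy)."""
    h, w = img1.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Backward warp: sample img2 at the location each img1 pixel is predicted to move to.
    warped = map_coordinates(
        img2, [ys + flow[..., 1], xs + flow[..., 0]], order=1, mode="nearest"
    )
    # Charbonnier (smoothed L1) penalty on the per-pixel intensity residual.
    return float(np.sqrt((img1 - warped) ** 2 + eps ** 2).mean())

# Demo: img2 is img1 shifted right by one pixel, so the true flow is dx = 1.
rng = np.random.default_rng(0)
img1 = rng.random((16, 16))
img2 = np.roll(img1, 1, axis=1)
flow_true = np.zeros((16, 16, 2)); flow_true[..., 0] = 1.0
flow_zero = np.zeros((16, 16, 2))
```

Minimizing this residual over the flow field is what drives the unsupervised training; as the snippet's demo suggests, the correct flow yields a much lower loss than a zero-flow baseline.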
Article
Full-text available
Quantification of left ventricular (LV) ejection fraction (EF) from echocardiography depends upon the identification of endocardium boundaries as well as the calculation of end-diastolic (ED) and end-systolic (ES) LV volumes. It is critical to segment the LV cavity for precise calculation of EF from echocardiography. Most existing echocardiography segmentation approaches either segment only the ES and ED frames without leveraging the motion information, or utilize the motion information only as an auxiliary task. To address the above drawbacks, in this work we propose a novel echocardiography segmentation method which can effectively utilize the underlying motion information by accurately predicting optical flow (OF) fields. First, we devised a feature extractor shared by the segmentation and optical flow sub-tasks for efficient information exchange. Then, we proposed a new orientation congruency constraint for the OF estimation sub-task that promotes the congruency of optical flow orientation between successive frames. Finally, we designed a motion-enhanced segmentation module for the final segmentation. Experimental results show that the proposed method achieved state-of-the-art performance for EF estimation, with a Pearson correlation coefficient of 0.893 and a mean absolute error of 5.20% when validated on echo sequences of 450 patients.
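The EF quantification the abstract describes reduces to a simple formula once the ED and ES volumes are known from the segmented LV cavity. A one-line sketch (function name and the sample volumes are illustrative):

```python
def ejection_fraction(edv, esv):
    """LV ejection fraction (%) from end-diastolic (EDV) and end-systolic (ESV) volumes."""
    return (edv - esv) / edv * 100.0

# Typical adult resting volumes in mL give an EF in the normal range.
print(ejection_fraction(120.0, 50.0))  # ≈ 58.3
```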
... In [61], the authors developed a method to jointly segment the lumen and the intima-media thickness and estimate carotid wall motion. Their network received a whole sequence of 2D images as input and provided as output the motion of the carotid through a pyramidal Siamese sub-network and the segmentation of the different structures through a multi-task regression network. ...
Thesis
Ultrasound is the most widely used imaging modality in clinical practice because it is fast, non-invasive, and less expensive than other modalities. In echocardiography, several metrics characterizing cardiac function can be extracted from these acquisitions, among which the global longitudinal strain (GLS) plays an important role in establishing a diagnosis. However, the estimation of this index suffers from a lack of reproducibility due to the specific characteristics of ultrasound. Indeed, traditional methods such as optical flow or block matching do not handle typical artifacts such as ultrasound texture decorrelation. Recently, deep learning approaches have beaten state-of-the-art methods in motion estimation, driven by applications in robotics and autonomous driving. In the first part of this thesis, we present a pilot study evaluating the ability of deep learning methods to estimate motion in ultrasound imaging despite the many underlying artifacts. To do so, we created a database composed of simulated and in-vitro ultrasound images of a rotating disk at varying speeds. In the second part of this thesis, we detail the pyramidal neural network that we developed to estimate the deformation of the myocardial muscle, which significantly improves on the performance of state-of-the-art methods. To train and evaluate our learning method, we also implemented a simulation pipeline to generate realistic echocardiographic image sequences with a dense reference field and with high anatomical and functional variability.
Chapter
Current deep learning methods for optical flow estimation often use spatial feature pyramids to extract image features. To obtain the correlation between images, they directly compute the cost volume of the extracted image features. In this process, fine object details tend to be ignored. To solve this fundamental problem, an object-scale adaptive optical flow estimation network is proposed, in which multi-scale features are selectively extracted and exploited using our developed feature selectable block (FSB). As a result, we obtain multi-scale receptive fields for objects at different scales in the image. To consolidate the image features generated at all scales, a new cost volume generation scheme called the multi-scale cost volume generation block (MCVGB) is further proposed to aggregate information across scales. Extensive experiments conducted on the Sintel and KITTI2015 datasets show that our proposed method captures fine details of objects at different scales with high accuracy and thus delivers superior performance over a number of state-of-the-art methods.
Keywords: Optical flow estimation · Deep learning · Feature pyramids · Receptive fields · Multi-scale
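The cost volume mentioned in this abstract is, in its basic form, the correlation between one frame's feature vector at each pixel and the other frame's features over a small displacement window. A minimal dense-correlation sketch (this is the generic construction, not the chapter's MCVGB; names and the circular-shift boundary handling are simplifying assumptions):

```python
import numpy as np

def cost_volume(feat1, feat2, max_disp=1):
    """Correlation cost volume: for each pixel of feat1, the dot product with
    feat2 shifted by every displacement in [-max_disp, max_disp]^2.
    feat1, feat2: (H, W, C) feature maps -> (H, W, (2*max_disp+1)**2) volume."""
    costs = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Circular shift stands in for padding; real implementations pad instead.
            shifted = np.roll(feat2, shift=(dy, dx), axis=(0, 1))
            costs.append((feat1 * shifted).sum(axis=-1))  # per-pixel correlation
    return np.stack(costs, axis=-1)

# Demo: constant unit features give a uniform cost of C in every channel.
feat = np.ones((5, 6, 3))
cv = cost_volume(feat, feat, max_disp=1)
```

Multi-scale variants such as the one described above build volumes like this at several pyramid levels and then fuse them.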
Article
Full-text available
Hepatocellular Carcinoma (HCC) detection, size grading, and quantification (i.e. the center point coordinates, max-diameter, and area) using multi-modality magnetic resonance imaging (MRI) are clinically significant tasks for HCC assessment and treatment. However, delivering the three tasks simultaneously is extremely challenging due to: 1) the lack of an effective mechanism to capture the relevance among multi-modality MRI information for multi-modality feature fusion and selection; 2) the lack of an effective mechanism and constraint strategy to achieve mutual promotion among the tasks. In this paper, we propose a task relevance driven adversarial learning framework (TrdAL) for simultaneous HCC detection, size grading, and multi-index quantification using multi-modality MRI (i.e. in-phase, out-phase, T2FS, and DWI). The TrdAL first obtains an expressive, dimension-reduced feature via a CNN-based encoder. Secondly, the proposed modality-aware Transformer is utilized for multi-modality MRI feature fusion and selection, which addresses the challenge of multi-modality information diversity by capturing the relevance among modalities. Then, the innovative task relevance driven and radiomics guided discriminator (Trd-Rg-D) is used for united adversarial learning. The Trd-Rg-D captures internal high-order relationships to refine the performance of all tasks simultaneously. Moreover, adding the radiomics feature as prior knowledge into the Trd-Rg-D enhances detailed feature extraction. Lastly, a novel task interaction loss function constrains the TrdAL, enforcing higher-order consistency among the multi-task labels to enhance mutual promotion. The TrdAL is validated on corresponding multi-modality MRI of 135 subjects. The experiments demonstrate that TrdAL achieves high accuracy: (1) HCC detection: specificity of 93.71%, sensitivity of 93.15%, accuracy of 93.33%, and IoU of 82.93%; (2) size grading: accuracies for large, medium, small, and tiny sizes and for healthy subjects of 90.38%, 87.74%, 80.68%, 77.78%, and 96.87%, respectively; (3) multi-index quantification: mean absolute errors for the center point, max-diameter, and area of 2.74 mm, 3.17 mm, and 144.51 mm². All of these results indicate that the proposed TrdAL provides an efficient, accurate, and reliable tool for HCC diagnosis in clinical practice.
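The detection IoU reported above is the standard intersection-over-union overlap score. A minimal sketch for axis-aligned boxes (a generic helper, not the TrdAL evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two unit-offset 2x2 boxes overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```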
Article
Full-text available
Fully automated comprehensive analysis of the carotid artery (localization of the region of interest (ROI), direct quantitative measurement and segmentation of lumen diameter (CALD) and intima-media thickness (CIMT), and motion estimation of the carotid wall) is a reliable auxiliary diagnostic tool for cardiovascular diseases that relieves physicians of laborious workloads. No prior work has achieved fully automated comprehensive analysis of the carotid artery, due to five intractable challenges: 1) the heavy reliance on experienced carotid physicians for the selection of the ROI limits fully automated studies; 2) the weak structural information of the intima-media thickness increases the difficulty of feature encoding; 3) the radial motion of the carotid wall results in a lack of discriminative boundary features; 4) diseased carotid arteries lose many distinguishing features; 5) the optimal weights of multi-task regression are hard to tune manually. In this paper, we propose a novel uncertainty-guided multi-task regression network aided by optical flow, named OF-UMRN, to address these challenges; its four modules and three innovations each take responsibility for part of the problem. OF-UMRN performs localization and mapping of the ROI as a pre-processing step. It achieves direct quantitative measurement and segmentation with a multi-task regression network, and we creatively model homoscedastic uncertainty to automatically tune the weights of the two tasks optimally. OF-UMRN adopts a bidirectional mechanism to encode the optical flow used to predict the carotid wall's motion fields. More importantly, we creatively propose a dual optimization module based on the co-promotion between segmentation and motion estimation to improve performance on radially moving and diseased carotid arteries. OF-UMRN therefore makes the most of the pathological relationship between multiple objects and the co-promotion between the various tasks. 
Extensive experiments on US sequences of 101 patients have demonstrated the superior performance of OF-UMRN on fully automated comprehensive analysis of the carotid artery. OF-UMRN thus has excellent potential for clinical diagnosis and assessment of carotid artery disease.
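Modeling homoscedastic uncertainty to tune multi-task weights, as this abstract describes, is commonly done with a learned log-variance per task that scales its loss and adds a regularizer (in the style of Kendall et al., 2018). A minimal sketch of that combined objective (the function names are illustrative, and the OF-UMRN formulation may differ in detail):

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses L_i using learned log-variances s_i = log(sigma_i^2):
    total = sum_i exp(-s_i) * L_i + s_i.
    Large s_i down-weights a noisy task's loss but is penalized by the +s_i term,
    so the weighting cannot collapse to ignoring a task."""
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += np.exp(-s) * loss + s
    return total

# With both log-variances at 0 (sigma = 1), the losses are simply summed.
print(uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0]))  # 3.0
```

In training, the `log_vars` would be trainable parameters optimized jointly with the network, replacing hand-tuned loss weights for the segmentation and measurement tasks.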