Figure 1 - uploaded by Gao Huang
The transformations within a layer in DenseNets (left), and CondenseNets at training time (middle) and at test time (right). The Index and Permute operations are explained in Section 3.1 and 4.1, respectively. (L-Conv: learned group convolution; G-Conv: group convolution)

Source publication
Article
Full-text available
Deep neural networks are increasingly used on mobile devices, where computational resources are limited. In this paper we develop CondenseNet, a novel network architecture with unprecedented efficiency. It combines dense connectivity between layers with a mechanism to remove unused connections. The dense connectivity facilitates feature re-use in t...

Similar publications

Conference Paper
Full-text available
We demonstrate a new convolutional neural network architecture to perform Fourier ptychographic microscopy (FPM) reconstruction, which achieves high-resolution phase recovery with considerably less data than standard FPM.
Preprint
Full-text available
We proposed a TV prior information guided deep learning method for single image super-resolution (SR). The new algorithm's up-sampling method based on the TV prior, new learning method, and neural network architecture are embraced in our TV-guided prior Convolutional Neural Network, which directly learns an end-to-end mapping between the low level to hig...

Citations

... The representative organizations of the Abrahamic religions discussed above appear motivated by their leaders, equipped by their faith, informed by climate science, and engaged in addressing human-forced climate change in all facets of human life. Though this essay is limited to four Abrahamic religions that have issued key statements on climate change, other denominations, religions and affiliated groups deserve attention [58,59]. ...
... We discuss in more detail the article by Gómez-Ríos et al. [56], since it is the most recent one, dealing with corals, and it resolves more corals than previous studies. In this study three CNNs were applied: Inception v3 [57], ResNet [58], and DenseNet [59] (Table 1). Two datasets were analyzed: EILAT and RSMAS both comprised of patches of coral images. ...
... 2) Group Convolution: Group convolution aims to reduce the number of parameters in a convolution layer by dividing the feature channels into different groups and then convolving each group independently [140,141], as shown in Fig. 10 (d). If we evenly divide the features into m groups, without changing other configurations, the computation is theoretically reduced to 1/m of what it was before. ...
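The 1/m reduction claimed above is easy to verify with a short calculation. The sketch below (plain Python; the tensor dimensions are illustrative choices of ours, not from the cited papers) counts multiply-accumulate operations for a standard convolution versus an m-group convolution.

```python
def conv_macs(h, w, k, c_in, c_out, groups=1):
    """Multiply-accumulate count of a (grouped) k x k convolution.

    Each of the h*w output positions computes c_out dot products;
    with grouping, each output channel only sees c_in/groups input channels.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return h * w * c_out * k * k * (c_in // groups)

# Illustrative dimensions: 56x56 feature map, 3x3 kernel, 64 -> 128 channels.
standard = conv_macs(56, 56, 3, 64, 128, groups=1)
grouped = conv_macs(56, 56, 3, 64, 128, groups=4)

# Dividing the channels into m = 4 groups cuts computation to 1/4.
assert standard == 4 * grouped
```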
Article
Full-text available
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Over the past two decades, we have seen a rapid technological evolution of object detection and its profound impact on the entire computer vision field. If we consider today's object detection technique as a revolution driven by deep learning, then, back in the 1990s, we would see the ingenious thinking and long-term perspective design of early computer vision. This article extensively reviews this fast-moving research field in the light of technical evolution, spanning over a quarter-century (from the 1990s to 2022). A number of topics have been covered in this article, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speedup techniques, and recent state-of-the-art detection methods.
... The overall network architecture consists of an initial convolutional layer, an average pooling layer, and three ShuffleNet units afterward. Instead of using random shuffling, CondenseNet [76] learns the grouping during training. It also integrates parameter pruning and the removal of less important features. ...
Article
Full-text available
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted in breakthroughs in many areas. However, deploying these highly accurate models for data-driven, learned, automatic, and practical machine learning (ML) solutions to end-user applications remains challenging. DL algorithms are often computationally expensive, power-hungry, and require large memory to process complex and iterative operations of millions of parameters. Hence, training and inference of DL models are typically performed on high-performance computing (HPC) clusters in the cloud. Data transmission to the cloud results in high latency, round-trip delay, security and privacy concerns, and the inability to make real-time decisions. Thus, processing on edge devices can significantly reduce cloud transmission cost. Edge devices are end devices closest to the user, such as mobile phones, cyber–physical systems (CPSs), wearables, the Internet of Things (IoT), embedded and autonomous systems, and intelligent sensors. These devices have limited memory, computing resources, and power-handling capability. Therefore, optimization techniques at both the hardware and software levels have been developed to handle the DL deployment efficiently on the edge. Understanding the existing research, challenges, and opportunities is fundamental to leveraging the next generation of edge devices with artificial intelligence (AI) capability. Mainly, four research directions have been pursued for efficient DL inference on edge devices: 1) novel DL architecture and algorithm design; 2) optimization of existing DL methods; 3) development of algorithm–hardware codesign; and 4) efficient accelerator design for DL deployment. This article focuses on surveying each of the four research directions, providing a comprehensive review of the state-of-the-art tools and techniques for efficient edge inference.
... Huang et al. proposed DenseNet, in which each layer obtains additional inputs from all preceding layers and passes on its feature maps to all subsequent layers [4]. The follow-up called CondenseNet combines dense connectivity with learned group convolution [28]. The learned group convolution can prune filters associated with superfluous feature re-use [28]. ...
... The follow-up called CondenseNet combines dense connectivity with learned group convolution [28]. The learned group convolution can prune filters associated with superfluous feature re-use [28]. ShiftNet proposes a parameter-free and FLOP-free shift operation as an alternative to the spatial convolution for aggregation of spatial information [19]. ...
... Existing methods to decompose convolutional filters involve sequential separable filters [23], [27], learned small basis filters of different shapes [24], consecutive 1D filters over all directions [25], and single intra-channel convolution (SIC) [12], [26]. Factorized convolution differs from depthwise separable convolution in that it only reduces the ranks of filters without performing grouping on the channels of the input and output tensors [28]. Most efficient convolutions only perform feature abstraction in the spatial domain while neglecting convolutions over other domains, such as the one formed by the channel and height/width dimensions. ...
Conference Paper
Full-text available
It is acknowledged that the depthwise separable convolution effectively reduces the computational complexity of a standard convolution. However, its depthwise convolution only operates on the spatial domain while neglecting other domains, such as the one formed by the channel and width/height dimensions. This paper bridges the gap by proposing the generalwise separable convolution, which generalizes the depthwise separable convolution beyond the spatial domain to recruit widthwise and heightwise convolutions. A sequential combination of pointwise group convolution, channel shuffling, channel splitting, and dimension transposing is required to implement the generalwise separable convolution. By embedding the generalwise separable convolution into a stack of inverted residuals with linear bottlenecks, we propose GSCNet as a lightweight neural backbone for various embedded vision tasks. Our empirical evidence indicates that the generalwise separable convolution is superior to the depthwise separable convolution through feature extraction from domains complementing the spatial domain. Experimental results show that GSCNet outperforms other state-of-the-art mobile CNNs over multiple vision tasks. On the ImageNet object classification benchmark, GSCNet achieves 75.5% top-1 accuracy with 216.98M multiply-adds, which is 28.1% fewer than MobileNetv2 and 29.3% fewer than HBONet. GSCNet also yields better mAP quality than MobileNetv1/v2 and MnasNet on the COCO object detection benchmark.
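The complexity reduction of depthwise separable convolution mentioned in the abstract above follows a well-known cost ratio of 1/N + 1/K² (N output channels, K×K kernel). The sketch below (plain Python, with example dimensions of our choosing) works out the multiply-accumulate counts for both forms.

```python
def standard_conv_macs(h, w, k, c_in, c_out):
    # Standard conv: every output channel convolves all input channels.
    return h * w * k * k * c_in * c_out

def depthwise_separable_macs(h, w, k, c_in, c_out):
    # Depthwise k x k conv per input channel, then a 1x1 pointwise conv.
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Illustrative dimensions: 112x112 map, 3x3 kernel, 32 -> 64 channels.
h = w = 112
k, c_in, c_out = 3, 32, 64
ratio = depthwise_separable_macs(h, w, k, c_in, c_out) / standard_conv_macs(h, w, k, c_in, c_out)

# The ratio equals 1/c_out + 1/k**2, i.e. roughly an 8x saving here.
assert abs(ratio - (1 / c_out + 1 / k**2)) < 1e-12
```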
... Dynamic architecture networks perform inference with specific architectures conditioned on each sample. Specifically, they adaptively adjust the network depth (Wang et al., 2018), width (Mullapudi et al., 2018), or route based on the input (Huang et al., 2018). ...
Preprint
Full-text available
Dynamic networks have been extensively explored as they can considerably improve the model's representation power at acceptable computational cost. The common practice in implementing dynamic networks is to convert given static layers into fully dynamic ones, where all parameters are dynamic and vary with the input. Recent studies empirically show a trend that more dynamic layers contribute to ever-increasing performance. However, such a fully dynamic setting 1) may cause redundant parameters and high deployment costs, limiting the applicability of dynamic networks to a broader range of tasks and models, and, more importantly, 2) contradicts the previous discovery in the human brain that when human brains process an attention-demanding task, only some neurons in the task-specific areas are activated by the input, while the remaining neurons stay in a baseline state. Critically, there has been no effort to understand and resolve this contradictory finding, leaving the primal question -- to make the computational parameters fully dynamic or not? -- unanswered. The main contributions of our work are challenging this basic common sense about dynamic networks, and proposing and validating the "cherry hypothesis": a fully dynamic network contains a subset of dynamic parameters such that, when the other dynamic parameters are transformed into static ones, the network can maintain or even exceed the performance of the original network. Technically, we propose a brain-inspired partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones. We further design Iterative Mode Partition to partition the dynamic and static subnets, which alleviates the redundancy in traditional fully dynamic networks. Our hypothesis and method are comprehensively supported by large-scale experiments with typical advanced dynamic methods.
... construction of ResNet were not shown in this study, and overfitting is assumed to be the reason. The computational cost of MobileNetV1 would be less than half that of the original convolution structure [22][23][24]. MobileNetV2 is based on MobileNetV1, with linear bottlenecks and inverted residuals added [25] to reduce the information loss caused by rectified linear units (ReLU). In this study, the shortest runtime was achieved by MobileNetV2. ...
Article
Full-text available
Purpose Early confirmation or ruling out of biliary atresia (BA) is essential for infants with delayed onset of jaundice. In current practice, percutaneous liver biopsy and intraoperative cholangiography (IOC) remain the gold standards for diagnosis. In Taiwan, these diagnostic methods are invasive and can only be performed in selected medical centers. However, referrals from primary physicians and local pediatricians are often delayed because of a lack of clinical suspicion. Ultrasound (US) is a common screening tool in local hospitals and clinics, but pediatric hepatobiliary US particularly requires well-trained imaging personnel. Meaningful comprehension of US is highly dependent on individual experience. For screening BA through human observation of US images, the reported sensitivity and specificity were achieved by pediatric radiologists, pediatric hepatobiliary experts, or pediatric surgeons. Therefore, this research developed a tool based on deep learning models for screening BA to assist pediatric US image reading by general physicians and pediatricians. Methods De-identified hepatobiliary US images of 180 patients from Taichung Veterans General Hospital were retrospectively collected under the approval of the Institutional Review Board. The top network models of the ImageNet Large Scale Visual Recognition Competition and other network models commonly used for US image recognition were included for further study to classify US images as BA or non-BA. The performance of different network models was expressed by the confusion matrix and receiver operating characteristic curve. Two methods were proposed to resolve disagreement among US image classifications of a single patient: the positive-dominance law and the threshold law. During the study, the US images of three successive patients suspected to have BA were classified by the trained models.
Results Among all included patients contributing US images, 41 patients were diagnosed with BA by surgical intervention and 139 patients were either healthy controls or had non-BA diagnoses. In this study, a total of 1,976 original US images were enrolled. Among them, 417 and 1,559 raw images were from patients with BA and without BA, respectively. Meanwhile, ShuffleNet achieved the highest accuracy of 90.56% using the same training parameters as compared with other network models. The sensitivity and specificity were 67.83% and 96.76%, respectively. In addition, the undesired false-negative prediction was prevented by applying positive-dominance law to interpret different images of a single patient with an acceptable false-positive rate, which was 13.64%. For the three consecutive patients with delayed obstructive jaundice with IOC confirmed diagnoses, ShuffleNet achieved accurate diagnoses in two patients. Conclusion The current study provides a screening tool for identifying possible BA by hepatobiliary US images. The method was not designed to replace liver biopsy or IOC, but to decrease human error for interpretations of US. By applying the positive-dominance law to ShuffleNet, the false-negative rate and the specificities were 0 and 86.36%, respectively. The trained deep learning models could aid physicians other than pediatric surgeons, pediatric gastroenterologists, or pediatric radiologists, to prevent misreading pediatric hepatobiliary US images. The current artificial intelligence (AI) tool is helpful for screening BA in the real world.
... Two different sized models are tested: RegNetY-400MF and RegNetY-800MF. Compared baselines include other types of efficient models, e.g., MobileNets-v2 [28], ShuffleNets-v2 [23] and CondenseNets [16]. ...
Preprint
Spatial-wise dynamic convolution has become a promising approach to improving the inference efficiency of deep networks. By allocating more computation to the most informative pixels, such an adaptive inference paradigm reduces the spatial redundancy in image features and saves a considerable amount of unnecessary computation. However, the theoretical efficiency achieved by previous methods can hardly translate into a realistic speedup, especially on the multi-core processors (e.g. GPUs). The key challenge is that the existing literature has only focused on designing algorithms with minimal computation, ignoring the fact that the practical latency can also be influenced by scheduling strategies and hardware properties. To bridge the gap between theoretical computation and practical efficiency, we propose a latency-aware spatial-wise dynamic network (LASNet), which performs coarse-grained spatially adaptive inference under the guidance of a novel latency prediction model. The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties. We use the latency predictor to guide both the algorithm design and the scheduling optimization on various hardware platforms. Experiments on image classification, object detection and instance segmentation demonstrate that the proposed framework significantly improves the practical inference efficiency of deep networks. For example, the average latency of a ResNet-101 on the ImageNet validation set could be reduced by 36% and 46% on a server GPU (Nvidia Tesla-V100) and an edge device (Nvidia Jetson TX2 GPU) respectively without sacrificing the accuracy. Code is available at https://github.com/LeapLabTHU/LASNet.
... However, their allowed maximum group amount is quite small (2 for FBNet and 4 for RCAS). DPP-Net [85] and MONAS [88] also contain a variant of GConv, Learned Group Convolution (LGConv), which is the key operator of CondenseNet [89], in their search space. ...
... ProxylessNAS [23] accepts the backbone of the residual PyramidNet [101], which has a residual skip connection every two operators, and replaces the original operators with their own tree-structured cells [102]. DPP-Net [85] selects the backbone of CondenseNet [89], which repeats an identical cell abundant times with both residual skip and chain connections, and only searches the operators in the cell. MONAS [88] also reuses the backbone of CondenseNet but it uses the same cell structure and searches the number of stages and growth rate. ...
Preprint
Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only a little storage, and incur only low computational overheads. This survey offers comprehensive coverage of studies of design automation techniques for deep learning models targeting edge computing. It offers an overview and comparison of key metrics that are commonly used to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs. The survey then proceeds to cover three categories of state-of-the-art deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.
... When the kernel size is K × K, the total number of parameters of the kernel is K × K × C × N; hence, the computational cost is expensive if C is large. Group convolution reduces this cost: the number of parameters is compressed to K × K × (C/G) × N by partitioning the input channels into G mutually exclusive groups [36]. ...
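The parameter compression in the snippet above is simple arithmetic; a minimal check (plain Python, with example sizes of our choosing) confirms that G groups shrink the weight count by a factor of G.

```python
def kernel_params(k, c, n, g=1):
    """Weight count of a conv kernel: K * K * (C/G) * N with G groups."""
    assert c % g == 0
    return k * k * (c // g) * n

# Illustrative sizes: 3x3 kernel, C = 256 input and N = 256 output channels.
full = kernel_params(3, 256, 256)          # K * K * C * N
grouped = kernel_params(3, 256, 256, g=8)  # K * K * (C/G) * N

# Parameters shrink by exactly the group factor G = 8.
assert full == 8 * grouped
```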
Preprint
In recent years, convolutional neural networks (CNNs) have shown great potential in synthetic aperture radar (SAR) target recognition. SAR images have a strong sense of granularity and contain texture features at different scales, such as speckle noise, target dominant scatterers, and target contours, which are rarely considered in traditional CNN models. This paper proposes two residual blocks, namely EMC2A blocks with multiscale receptive fields (RFs), based on a multibranch structure, and then designs an efficient isotropic-architecture deep CNN (DCNN), EMC2A-Net. EMC2A blocks utilize parallel dilated convolutions with different dilation rates, which can effectively capture multiscale context features without significantly increasing the computational burden. To further improve the efficiency of multiscale feature fusion, this paper proposes a multiscale feature cross-channel attention module, namely the EMC2A module, adopting a local multiscale feature interaction strategy without dimensionality reduction. This strategy adaptively adjusts the weights of each channel through efficient one-dimensional (1D) circular convolution and a sigmoid function to guide attention at the global channel-wise level. Comparative results on the MSTAR dataset show that EMC2A-Net outperforms existing models of the same type and has a relatively lightweight network structure. Ablation experiments show that the EMC2A module significantly improves the performance of the model using only a few parameters and appropriate cross-channel interactions.
... The learned fully connected separable convolutional network is effective for real-time applications, particularly in biomedical engineering [42]. A few important layers used for training and testing are listed as follows: ...
Article
In recent decades, intracranial hemorrhage detection from computed tomography (CT) scans has gained considerable attention among researchers in the medical community. The major problem in dealing with the Radiological Society of North America (RSNA) dataset is the three-dimensional representation of a CT scan, where labeled data are scarce and hard to obtain. To address this problem, a novel learned fully connected separable convolutional network is proposed in this research article. After collecting the CT scans, data augmentation is used to generate multiple image variations to improve the generalization capacity of the proposed model. Based on the albumentations library, transformations such as brightness adjustment, horizontal flipping, shifting, rotation, and scaling are selected for data augmentation. The intracranial hemorrhage subtype classification is accomplished using the learned fully connected separable convolutional network, which classifies six classes: any, intraparenchymal, subarachnoid, epidural, intraventricular, and subdural. In the evaluation phase, the learned fully connected separable convolutional network obtained an average accuracy of 98.63%, sensitivity of 73.32%, specificity of 99.49%, and area under the curve of 98.98%; these results compare favorably with ResNet-50, SE-ResNeXt-50, ResNeXt-101, and ResNeXt-101 with a bidirectional long short-term memory network.