Article

Abstract

Food recognition is an emerging topic in computer vision. The problem is being addressed especially in health-oriented systems, where it serves as a support for food diary applications. The goal is to improve current food diaries, in which users must manually log their daily food intake, with automatic recognition of the food type and quantity and the consequent estimation of calorie intake. In addition to the classical recognition challenges, the food recognition problem is characterized by the absence of a rigid food structure and by large intra-class variations. To tackle these challenges, a food recognition system based on committee classification is proposed. The aim is to provide a system capable of automatically choosing the optimal features for food recognition out of the existing plethora of available ones (e.g., color, texture, etc.). Following this idea, each committee member, i.e., an Extreme Learning Machine, is trained to specialize on a single feature type. A Structural Support Vector Machine is then exploited to produce the final ranking of possible matches by filtering out the irrelevant features and merging only the relevant ones. Experimental results show that the proposed system outperforms state-of-the-art works on four publicly available benchmark datasets.
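To make the committee idea above concrete, here is a minimal sketch of one possible implementation: a basic single-hidden-layer ELM per feature type, with the standard closed-form output-weight solution. The feature matrices, hidden size, and naive score-sum fusion are illustrative assumptions; the paper's actual members are kernel ELMs fused by a Structural SVM ranker.

```python
import numpy as np

class ELM:
    """Single-hidden-layer Extreme Learning Machine: random, fixed hidden
    weights; output weights solved in closed form (ridge regression)."""

    def __init__(self, n_hidden=512, reg=1e-3, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y, n_classes):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        T = -np.ones((len(y), n_classes))
        T[np.arange(len(y)), y] = 1.0          # bipolar one-hot targets
        # beta = (H^T H + reg*I)^-1 H^T T
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden),
                                    H.T @ T)
        return self

    def scores(self, X):
        return self._hidden(X) @ self.beta

def train_committee(features_by_type, labels, n_classes):
    """One committee member per feature type (e.g., color, texture)."""
    return {name: ELM(seed=i).fit(X, labels, n_classes)
            for i, (name, X) in enumerate(features_by_type.items())}

# Toy usage with two placeholder feature types.
rng = np.random.default_rng(1)
feats = {"color": rng.random((60, 16)), "texture": rng.random((60, 32))}
y = rng.integers(0, 3, size=60)
committee = train_committee(feats, y, n_classes=3)
fused = sum(m.scores(feats[k]) for k, m in committee.items())  # naive fusion
print(fused.argmax(1)[:5])
```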

... Recently, researchers have scrutinized extreme learning machines for food classification. The Supervised Extreme Learning Committee [26] uses committee members, i.e., kernel extreme learning machines, each trained to specialize on a single feature type. It has high complexity, as it uses all the features of the training images to compute the kernel matrix. ...
... d. Remove the link between the first-best and second-best nodes using Equation (24). e. Compute the kernel matrix of the input vector with the nodes in the network by using Equations (25) and (26). ...
... As the model adds a new mapping node, the "Growing Hidden Neuron Methodology" updates the kernel matrix by Equations (7)-(11). The Z_n and K_n in Equations (7)-(11) are computed by Equations (25) and (26). ...
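Equations (7)-(11) and (24)-(26) are not reproduced in these excerpts, so the sketch below only illustrates the general machinery they refer to: a kernel ELM solution plus a Schur-complement block update of the inverse kernel matrix when one node is added. The RBF kernel and the regularization constant are assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # Pairwise RBF kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelELM:
    """Kernel ELM: beta = (K + I/C)^-1 T, prediction f(x) = k(x, X)^T beta."""

    def __init__(self, C=100.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, y, n_classes):
        self.X = X
        K = rbf(X, X, self.gamma) + np.eye(len(X)) / self.C
        self.Kinv = np.linalg.inv(K)
        self.beta = self.Kinv @ np.eye(n_classes)[y]
        return self

    def predict(self, Xq):
        return (rbf(Xq, self.X, self.gamma) @ self.beta).argmax(1)

def grow_inverse(Kinv, z, k):
    """Block update of K^-1 when one node with kernel column z and
    self-similarity k is appended (Schur-complement identity)."""
    u = Kinv @ z
    s = 1.0 / (k - z @ u)              # Schur complement (scalar)
    new = np.empty((len(z) + 1, len(z) + 1))
    new[:-1, :-1] = Kinv + s * np.outer(u, u)
    new[:-1, -1] = -s * u
    new[-1, :-1] = -s * u
    new[-1, -1] = s
    return new

# Toy usage on random data.
X = np.random.rand(30, 8)
y = np.random.randint(0, 3, 30)
clf = KernelELM().fit(X, y, n_classes=3)
print((clf.predict(X) == y).mean())
```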
Article
Full-text available
Recently, food recognition has received more research attention for mHealth applications that use automated visual-based methods to assess dietary intake. The goal is to improve food diaries by addressing the challenges faced by existing methodologies. In addition to the classical challenges of the absence of rigid food structure and intra-class variations, food diaries employing deep networks trained with pristine images are susceptible to quality variations in real-world conditions of image acquisition and transmission. Similarly, existing progressive classifiers that use visual features via a convolutional neural network (CNN) classify food categories but cannot detect food ingredients. We aim to provide a system that selects the optimal subset of features from quality-resilient CNNs and subsequently incorporates a parallel type of classification to tackle such challenges. The first progressive classifier recognizes food categories, and its multilabel extension detects food ingredients. Following this idea, features are extracted from the quality-resilient category and ingredient CNN models after fine-tuning them on synthetic images generated using the novel online data augmentation method, random iterative mixup. Our feature selection strategy uses the Shapley additive explanation (SHAP) values from the gradient explainer to select the best features. Then, a novel progressive kernel extreme learning machine (PKELM) is exploited to cater to domain variations due to quality distortions, intra-class variations, and so forth, by remodeling the network structure based on the activity value of the nodes. The PKELM extension for multilabel classification detects ingredients by employing a bipolar step function to process the test output and then selecting the column labels of the resulting matrix with a value of one. Moreover, during online learning, the PKELM novelty detection mechanism can label unlabeled instances and detect noisy samples. Experimental results showed superior performance on an integrated set of measures for seven publicly available food datasets.
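As a small illustration of the multilabel post-processing step described above (bipolar step function, then selecting columns equal to one), one might write the following; the zero threshold and the ingredient names are assumptions, not values from the paper.

```python
import numpy as np

def bipolar_step(scores):
    # Map raw multilabel outputs to {-1, +1}; +1 marks a predicted label.
    return np.where(scores >= 0.0, 1, -1)

def predict_ingredients(raw_scores, ingredient_names):
    """raw_scores: (n_images, n_ingredients) real-valued network outputs."""
    binary = bipolar_step(raw_scores)
    return [[name for name, v in zip(ingredient_names, row) if v == 1]
            for row in binary]

# Example: two images scored against three hypothetical ingredients.
scores = np.array([[0.7, -0.2, 1.3], [-0.5, 0.9, -0.1]])
print(predict_ingredients(scores, ["rice", "chicken", "curry"]))
# [['rice', 'curry'], ['chicken']]
```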
... In recent years, most research in food image classification has focused on hand-crafted features that consist of a color histogram [10,21], local binary pattern (LBP) [10,15], scale invariant feature transform (SIFT) [10], histogram of oriented gradients (HOG) [10,21], and speeded up robust feature (SURF) [2]. These handcrafted methods are combined with machine learning algorithms to classify food images. ...
Conference Paper
Full-text available
The real-world food image is a challenging problem for food image classification, because food images can be captured from different perspectives and patterns. Also, many objects can appear in the image, not just foods. To recognize food images, in this paper we propose a modified MobileNet architecture that applies global average pooling layers to avoid overfitting on the food images, batch normalization, rectified linear units, and dropout layers, with softmax as the last layer. The state-of-the-art and the proposed MobileNet architectures are trained according to the fine-tuned model. The experimental results show that the proposed version of the MobileNet architecture achieves significantly higher accuracies than the original MobileNet architecture. The proposed MobileNet architecture significantly outperforms other architectures when the data augmentation techniques are combined.
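A rough PyTorch sketch of the modified head this abstract describes (global average pooling, batch normalization, ReLU, dropout, softmax), using torchvision's MobileNetV2 backbone as a stand-in for the original MobileNet; the dropout rate is an assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_modified_mobilenet(n_classes: int):
    """Pretrained backbone plus the modified classification head."""
    backbone = models.mobilenet_v2(weights="DEFAULT").features
    head = nn.Sequential(
        nn.AdaptiveAvgPool2d(1),     # global average pooling, no flattened FC maps
        nn.Flatten(),
        nn.BatchNorm1d(1280),        # MobileNetV2 ends with 1280 channels
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.5),           # assumed rate
        nn.Linear(1280, n_classes),  # softmax applied at inference / in the loss
    )
    return nn.Sequential(backbone, head)

model = build_modified_mobilenet(n_classes=100)
logits = model(torch.randn(2, 3, 224, 224))
probs = logits.softmax(dim=1)
```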
... Manual recognition sometimes becomes tedious, as it requires prior information regarding the categories of the food and their appearances [6]. Moreover, food recognition and quantity estimation algorithms are employed, and, more often, mobile devices and systems are used for dietary monitoring purposes [15][16][17][18]. A few of the dietary management systems include FoodLog [19], FoodCam [20], Menu-Match [21], and DietCam [22]. ...
... • The main challenge is the appearance of the food: different foods appear to be the same, and the same foods vary in their appearance. The factors responsible for the appearance of the food include the cooking method, the recipe, and the chef's personal preference [18]. • Even though the ingredients in the food can be recognized sharply, the food types cannot be recognized properly because of their unstructured and random ingredient distribution [6]. ...
... The accuracy and usability still have to be improved. Martinel et al. [18] address food recognition through committee classification and a Structural Support Vector Machine over robust features. A detailed look at segmentation is given in this section. ...
Article
Full-text available
The vulnerabilities of health issues have resulted in alternatives to manage the situation, ensuring a better life. Dietary assessment stands as an effective solution for most health vulnerabilities, and automatic assessment removes the manual procedure of assessing food intake. This paper introduces an automatic method of dietary assessment by proposing the Imperialist Competitive Algorithm (IpCA)-based Deep Belief Network (IpCA-DBN) for food category recognition and calorie estimation of the food. Initially, the food image is pre-processed and subjected to the segmentation process, which is done by Bayesian Fuzzy Clustering. Then, features such as shape, color histogram, wavelet, and scattering transform features are generated from the optimal segments. Finally, these features are fed to the IpCA-DBN for recognizing the food category and estimating the calories of the food. The experimentation, performed using the UNIMIB2016 dataset, enables an effective analysis of the proposed method in terms of metrics such as Macro Average Accuracy (MAA), Standard Accuracy (SA), and Mean Square Error (MSE). The analysis proves that the proposed method outperforms the existing methods, attaining 0.9643 for MAA, 0.9877 for SA, and 1816.9 for MSE.
... In this case, classic feature descriptors like speeded up robust features (SURF), histogram of oriented gradients (HOG), spatial pyramid pooling, bag of scale invariant feature transform (SIFT), color correlogram, etc., can only succeed when used for laboratory-generated small datasets. Typically, an SVM is trained using these extracted features or a combination of them [9]. To improve accuracy for this task, researchers have worked towards estimating the region of the image in which the food item is present. ...
... For comparison, fine-tuned deep models from the popular AlexNet [14], GoogLeNet [25], and residual network (ResNet) [26] are used; for all the methods, the procedures are those reported in [16]. We also compared the new method (SSGAN) with the recently proposed Lukas et al. [23], Kawano et al. [27], Martinel et al. [9], and the ensemble of networks (Ensemble Net) in [16]. ...
... The idea is to increase the robustness of any classifier trained on the dataset. For our experiments, we follow the same training and testing protocols as discussed in [9], [23]. Fig. 2 (a) shows the accuracy vs. rank plots up to the top 10 ranks, where the rank r ∈ {1, 2, ..., 10} denotes the probability of retrieving at least one correct image among the top r retrieved images. ...
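The rank-r accuracy curve referred to in this excerpt can be computed as in the following sketch, assuming a matrix of per-class scores for each query.

```python
import numpy as np

def rank_accuracies(scores, labels, max_rank=10):
    """scores: (n_queries, n_classes); labels: true class per query.
    Returns, for each r, the fraction of queries whose true class
    appears among the top-r scored classes."""
    order = np.argsort(-scores, axis=1)              # classes sorted by score
    hit_rank = (order == labels[:, None]).argmax(1)  # position of true class
    return [(hit_rank < r).mean() for r in range(1, max_rank + 1)]

scores = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
print(rank_accuracies(scores, np.array([1, 2]), max_rank=3))
# [0.5, 0.5, 1.0]
```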
Article
Full-text available
Traditional machine learning algorithms using hand-crafted feature extraction techniques (such as the local binary pattern) have limited accuracy because of the high variation among images of the same class (intraclass variation) in food recognition tasks. In recent works, convolutional neural networks (CNNs) have been applied to this task with better results than all previously reported methods. However, they perform best when trained with a large amount of annotated (labeled) food images, which are expensive, laborious, and impractical to obtain in large volume. This article aims at developing an efficient deep CNN learning-based method for food recognition that alleviates these limitations by using partially labeled training data on generative adversarial networks (GANs). We make new enhancements to the unsupervised training architecture introduced by Goodfellow et al., which was originally aimed at generating new data by sampling a dataset. In this article, we make modifications to deep convolutional GANs to make them robust and efficient for classifying food images. Experimental results on benchmark datasets show the superiority of our proposed method, as compared to the current state-of-the-art methodologies, even when trained with partially labeled training data.
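A compact sketch of one common way to train a GAN discriminator as a semi-supervised classifier, in the style of Salimans et al. (2016): the K class logits double as a realness score via logsumexp. The toy convolutional stack is illustrative only and is not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """K-class discriminator whose logits also score 'realness', so both
    unlabeled real images and generated images contribute to training."""

    def __init__(self, n_classes=101):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def d_loss(D, x_lab, y_lab, x_unl, x_fake):
    # Supervised term: ordinary cross-entropy on the labeled minibatch.
    sup = F.cross_entropy(D(x_lab), y_lab)
    # Unsupervised term: real images should score "real", fakes "fake",
    # with realness z = logsumexp over the class logits.
    z_real = torch.logsumexp(D(x_unl), dim=1)
    z_fake = torch.logsumexp(D(x_fake), dim=1)
    unsup = F.softplus(-z_real).mean() + F.softplus(z_fake).mean()
    return sup + unsup
```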
... A recent report by the World Health Organization (WHO) suggests that the tremendous increase in various diseases such as heart problems, lung infections, cancer, and diabetes is due to wrong food intake and obesity. These diseases occur due to insufficient or excessive intake of food items in daily life [1]. Obesity has tremendously increased due to the wrong intake of food items. ...
... Once the features are obtained, the food items are classified by competent classifiers, such as the Support Vector Machine (SVM) [7], the Extreme Learning Machine (ELM) [1], etc. The nutritional value of a food item is not known to most people [20]. ...
... Consequently, the extracted features were classified by the KNN and SVM classifiers, which can help measure the calorie value efficiently. Then, a food recognition system based on committee classification was demonstrated in [1]. Niki Martinel et al. [1] provided a system capable of automatically choosing the optimal features for food recognition using Extreme Learning Machines (ELMs). ...
Article
Full-text available
The calorie value of the food items taken by a person in everyday life needs to be monitored to reduce the risk of obesity, heart problems, diabetes, etc. The calorie estimator in existing models has reduced accuracy, since the calorie value of each food varies with mass. This paper introduces a dietary assessment system based on the proposed Cauchy, Generalized T-Student, and Wavelet kernel based Wu-and-Li Index Fuzzy Clustering (CSW-WLIFC) segmentation and the proposed Whale Levenberg Marquardt Neural Network (WLM-NN) classifier. The proposed CSW-WLIFC segmentation segments the image based on the existing WLI-FC algorithm, with a novel CSW-based kernel function utilized in the segmentation process. Feature vectors such as color, shape, and texture are extracted from the segmented image. The neural network is trained with the Whale-Levenberg Marquardt (WLM) model to recognize each food item in the tray image. The proposed calorie estimator calculates the calorie value of each food item. From the simulation results, it is evident that the proposed model improves on the existing models, with values of 0.999, 0.9643, 0.9627, and 0.0184 for segmentation accuracy, macro average accuracy, standard accuracy, and mean square error, respectively.
... The DCNN architecture is fine-tuned using the Inception model in [9]. The Extreme Learning Machine (ELM) and the Support Vector Machine (SVM) are used for food image recognition in [24]. In [37], an attention mechanism is used to enhance the performance of a CNN model for food image recognition. ...
... Martinel et al. [24]: 84.3 ...
Article
Full-text available
Food consumption has a direct effect on the health of an individual. Eating food without awareness of its ingredients may result in eating-style-based diseases such as hypertension, diabetes, and several others. As per a recent WHO survey, the number of persons with hypertension is very large. There is essentially a need for a novel technique that can provide food recommendations to hypertensive persons out of the multiple food items in their meals. In this research work, Indian multi-food items of a meal are recognized using a fine-tuned deep convolutional neural network model. Further, in existing research works, only single food images are recognized, which is not relevant to real-life food consumption. In our proposed approach, a contour-based image segmentation technique is used for multi-food meals. In existing research works, no dataset is available on Indian food items for hypertensive persons. The key contribution of this research work is the preparation of an Indian food dataset of 30 classes for hypertensive patients: 15 recommended food classes for the hypertensive person and 15 not-recommended classes to maintain the class balance (as calibrated through a professional dietitian: Dr. Shuchi Upadhyay, Dietitian and Nutrition Expert, UPES, Dehradun). The novel contribution is to present the 'IndianFood30' dataset of hypertensive patients for research purposes. Further, a novel IndianFoodNet model is presented, which is trained on these 30 Indian food classes. Several pre-trained models are available for research purposes, but there is no pre-trained model on Indian food for hypertensive persons. Food ingredients exhibit high intra-class variance, and these complex features are extracted using our proposed approach. The accuracy of the proposed approach is compared with state-of-the-art models such as VGGNet, Inception V3, GoogleNet, and ResNet. Our proposed approach is also compared with some recent techniques on existing datasets such as UEC Food-100, UEC Food-256, and Food-101 to show the performance and effectiveness of the proposed model. Experimental analysis validates that our proposed approach significantly outperforms existing approaches.
... The results are shown in Table 5. A linear SVM with a fast χ2 kernel was proposed to provide real-time food recognition on mobile phones, with an accuracy of 53.50% [36]. Martinel et al. suggested the Extreme Learning Machine to slightly improve on the performance of the SVM [37]. A deep CNN was first tested by Bossard et al. on this dataset, achieving an accuracy rate of 56.40% [38]. ...
... Proposed model (LNAS-NET): 75.90%. Kawano et al. [36], linear SVM with a fast χ2 kernel: 53.50%. Martinel et al. [37], Extreme Learning Machine: 55.89%. Bossard et al. [38], deep CNN: 56.40%. Yanai et al. [7], pre-trained DCNN: 70.41%. Pandey et al. [39], ensemble CNN: 72.12%. ...
Article
Full-text available
Healthy eating is an essential element in preventing obesity, which can lead to chronic diseases. Despite numerous efforts to promote awareness of healthy food consumption, the obesity rate has increased in the past few years. An automated food recognition system is needed to serve as a fundamental source of information for promoting a balanced diet and assisting users in understanding their meal consumption. In this paper, we propose a novel Lightweight Neural Architecture Search (LNAS) model to self-generate a thin Convolutional Neural Network (CNN) that can be executed on mobile devices with limited processing power. LNAS has a sophisticated search space and a modern search strategy to design a child model with reinforcement learning. Extensive experiments have been conducted to evaluate the model generated by LNAS, namely LNAS-NET. The experimental results show that the proposed LNAS-NET outperformed state-of-the-art lightweight models in terms of training speed and accuracy. These experiments indicate the effectiveness of LNAS without sacrificing model performance. It provides a good direction to move toward the era of AutoML and mobile-friendly neural model design.
... The particular features used in each previous study were more successful on some categories than others and no single type of feature was able to recognise all categories equally well. This situation is also noted in [29]. Evaluation must be performed to investigate the features suitable to provide a holistic feature representation that can cope with the pattern diversity for food recognition. ...
... Hence, it is less practical for large-scale applications that involve many object categories [67]. The Fisher Vector has also been used by [69], and the Improved Fisher Vector has been employed by [21], [22], [29]. Aside from Fisher encoding, sparse encoding techniques have also been adopted to encode SIFT descriptors in order to overcome the limitations of hard assignment [70]. ...
... In earlier years, various hand-crafted features, such as color, texture, and SIFT, were utilized for food recognition [8], [41], [42]. In the deep learning era, because of their powerful capacity for feature representation, more and more works resort to different deep networks for food recognition [9], [10], [11], [36]. ...
... Top-1 / Top-5 accuracy: AlexNet-CNN [8]: 56.40 / –; SELC [42]: 55.89 / –; ResNet-152+SVM-RBF [69]: 64.98 / –; DCNN-FOOD (AlexNet) [70]: 70.41 / –; LMBM (GoogLeNet) [71]: 72.11 / –; EnsembleNet [72]: 72.12 / 91.61; GoogLeNet [73]: 78.11 / –; DeepFOOD (GoogLeNet) [74]: 77 / –. Compared with these methods, such as MSMVFA [12], our method also obtains the highest Top-1 classification accuracy of 90.74%. When we use the trained backbone model from Food2K, namely PRENet (SENet154+Pretrained), there is a further performance improvement. ...
Preprint
Food recognition plays an important role in food choice and intake, which is essential to the health and well-being of humans. It is thus of importance to the computer vision community, and can further support many food-oriented vision and multimodal tasks. Unfortunately, while we have witnessed remarkable advancements in generic visual recognition thanks to released large-scale datasets, progress largely lags in the food domain. In this paper, we introduce Food2K, the largest food recognition dataset, with 2,000 categories and over 1 million images. Compared with existing food recognition datasets, Food2K surpasses them in both categories and images by one order of magnitude, and thus establishes a new challenging benchmark for developing advanced models for food visual representation learning. Furthermore, we propose a deep progressive region enhancement network for food recognition, which mainly consists of two components, namely progressive local feature learning and region feature enhancement. The former adopts improved progressive training to learn diverse and complementary local features, while the latter utilizes self-attention to incorporate richer context at multiple scales into local features for further local feature enhancement. Extensive experiments on Food2K demonstrate the effectiveness of our proposed method. More importantly, we have verified the better generalization ability of models trained on Food2K in various tasks, including food recognition, food image retrieval, cross-modal recipe retrieval, and food detection and segmentation. Food2K can be further explored to benefit more food-relevant tasks, including emerging and more complex ones (e.g., nutritional understanding of food), and models trained on Food2K can be expected to serve as backbones to improve the performance of more food-relevant tasks. We also hope Food2K can serve as a large-scale fine-grained visual recognition benchmark.
... Recent studies have explored the extreme learning machine for food classification due to its fast training time and good performance [44]. The Supervised Extreme Learning Committee uses multiple kernel extreme learning machines as committee members. ...
... The best accuracy for PFID was reported by SELC, and it has been improved upon by 9.1%. The proposed approach has lower training and testing time, as it takes only 10 percent of the initially available class data, compared to SELC, which uses all data for computing the kernel matrix [44]. Most importantly, the proposed method adds new classes in an incremental fashion. Fig. 18 shows the comparison on Food101 against the state of the art. ...
Article
Full-text available
State-of-the-art deep learning models for food recognition do not allow data-incremental learning and often suffer from catastrophic interference problems during class-incremental learning. This is an important issue in food recognition, since real-world food datasets are open-ended and dynamic, involving a continuous increase in food samples and food classes. Model retraining is often carried out to cope with the dynamic nature of the data, but this demands high-end computational resources and significant time. This paper proposes a new open-ended continual learning framework by employing transfer learning on deep models for feature extraction, ReliefF for feature selection, and a novel adaptive reduced class-incremental kernel extreme learning machine (ARCIKELM) for classification. Transfer learning is beneficial due to the high generalization ability of deep learning features. ReliefF reduces computational complexity by ranking and selecting the extracted features. The novel ARCIKELM classifier dynamically adjusts the network architecture to reduce catastrophic forgetting. It addresses domain adaptation problems when new samples of an existing class arrive. To conduct comprehensive experiments, we evaluated the model against four standard food benchmarks and a recently collected Pakistani food dataset. Experimental results show that the proposed framework learns new classes incrementally with less catastrophic interference and adapts to domain changes while having competitive classification performance.
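The pipeline above (deep features, ReliefF ranking, then a kernel ELM) could be approximated as in this sketch; the ranking below is a tiny Relief-style weight update for illustration, not the full ReliefF algorithm or the paper's implementation.

```python
import numpy as np

def relief_scores(X, y):
    """Relief-style feature ranking: features that separate a sample from
    its nearest miss (other class) more than from its nearest hit (same
    class) receive higher weights."""
    n, d = X.shape
    w = np.zeros(d)
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(dist, np.inf)
    for i in range(n):
        same = y == y[i]
        same[i] = False
        hit = X[np.argmin(np.where(same, dist[i], np.inf))]
        miss = X[np.argmin(np.where(~same, dist[i], np.inf))]
        w += np.abs(X[i] - miss) - np.abs(X[i] - hit)
    return w / n

def select_top_k(X, scores, k=256):
    # Keep the k highest-ranked deep features, then feed them to a
    # kernel ELM classifier (see the KernelELM sketch earlier).
    return X[:, np.argsort(-scores)[:k]]

X = np.random.rand(80, 512)                 # stand-in deep features
y = np.random.randint(0, 4, 80)
X_sel = select_top_k(X, relief_scores(X, y))
print(X_sel.shape)                          # (80, 256)
```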
... Once the features are obtained, the food items are classified by proficient classifiers, such as the Support Vector Machine (SVM) [1], the Extreme Learning Machine (ELM) [2], etc. Moreover, people are unaware of how to evaluate or control their daily calorie intake due to limited nutritional knowledge, irregular eating patterns, or lack of self-control in front of food [15]. ...
... Therefore, the extracted features of every food item are classified using the KNN and SVM classifiers, which efficiently measure the calorie value of each input food image. Then, a food detection system based on committee classification was established in [2]. ...
... In these methods, SVM was seen most often and was frequently used collaboratively with DL methods. Different networks were seen among the DL methods, such as GoogLeNet [73,74,78,80,82,[85][86][87][88], MobileNetV2 [69,79], AlexNet [78,85,86], Inception-V3 [82,[88][89][90][91], NutriNet [78,86,92], K-foodNet and very deep convolutional neural networks [85], DenseNet161 [82], fully convolutional networks (FCN) [86,[92][93][94], YOLO [93,95,96], the extreme learning machine (ELM) [97][98][99], neural trees [97], graph convolutional networks (GCN) [100], the deep learning PDE model (DPM) [83], SibNet [101], VGG16 or VGG365 [82,85,94,102], ResNet, ResNet50 and ResNet152 [78,80,82,85,86,[88][89][90]103,104], EfficientNet [105], EfficientDet [106], and Faster R-CNN [46,90,104]. GoogLeNet and ResNet, with their variants, were the most popular. ...
Article
Full-text available
Food and fluid intake monitoring are essential for reducing the risk of dehydration, malnutrition, and obesity. The existing research has been preponderantly focused on dietary monitoring, while fluid intake monitoring, on the other hand, is often neglected. Food and fluid intake monitoring can be based on wearable sensors, environmental sensors, smart containers, and the collaborative use of multiple sensors. Vision-based intake monitoring methods have been widely exploited with the development of visual devices and computer vision algorithms. Vision-based methods provide non-intrusive solutions for monitoring. They have shown promising performance in food/beverage recognition and segmentation, human intake action detection and classification, and food volume/fluid amount estimation. However, occlusion, privacy, computational efficiency, and practicality pose significant challenges. This paper reviews the existing work (253 articles) on vision-based intake (food and fluid) monitoring methods to assess the size and scope of the available literature and identify the current challenges and research gaps. This paper uses tables and graphs to depict the patterns of device selection, viewing angle, tasks, algorithms, experimental settings, and performance of the existing monitoring systems.
... However, identification of food involves challenges due to varying recipes and presentation styles used to prepare food all around the globe, resulting in different feature sets [84]. For instance, the shape and texture of a salad containing vegetables differ from the shape and texture of a salad containing fruits. ...
Article
Full-text available
Dietary studies showed that dietary problems such as obesity are associated with other chronic diseases, including hypertension, irregular blood sugar levels, and increased risk of heart attacks. The primary cause of these problems is poor lifestyle choices and unhealthy dietary habits, which are manageable using interactive mHealth apps. However, traditional dietary monitoring systems using manual food logging suffer from imprecision, underreporting, time consumption, and low adherence. Recent dietary monitoring systems tackle these challenges by automatic assessment of dietary intake through machine learning methods. This survey discusses the best-performing methodologies that have been developed so far for automatic food recognition and volume estimation. Firstly, the paper presented the rationale of visual-based methods for food recognition. Then, the core of the study is the presentation, discussion, and evaluation of these methods based on popular food image databases. In this context, this study discusses the mobile applications that are implementing these methods for automatic food logging. Our findings indicate that around 66.7% of surveyed studies use visual features from deep neural networks for food recognition. Similarly, all surveyed studies employed a variant of convolutional neural networks (CNN) for ingredient recognition due to recent research interest. Finally, this survey ends with a discussion of potential applications of food image analysis, existing research gaps, and open issues of this research area. Learning from unlabeled image datasets in an unsupervised manner, catastrophic forgetting during continual learning, and improving model transparency using explainable AI are potential areas of interest for future studies.
... The revolution of big data and social media analytics technologies provides valuable encouragement that useful knowledge and information can be discovered from the massive volume of food images in social media, including trends of food consumption, eating habits and behaviour, and preferences for foods and restaurants (De Choudhury et al., 2016; Fried et al., 2015; Rich et al., 2016). In previous research, dense sampling and Difference of Gaussian (DoG) are the two common interest point sampling techniques used in earlier food recognition studies (Kawano & Yanai, 2015; Martinel et al., 2016; Sasano et al., 2016). Inevitably, features will be extracted from irrelevant interest points (i.e., from the background, especially if it is complex) (Altintakan & Yazici, 2015) and will generate less informative descriptions regardless of the sampling technique being used. ...
Article
Full-text available
The visual analysis of foods on social media using food recognition algorithms provides valuable insight from health, cultural, and marketing perspectives. Food recognition offers a means to automatically recognise foods, as well as useful information such as calorie and nutritional estimates, by using image processing and machine learning techniques. The interest points in a food image can be detected effectively by using Maximally Stable Extremal Regions (MSER). As MSER uses global segmentation, and many food images have a complex background, numerous irrelevant interest points are detected. These interest points are considered noise that leads to a computational burden in the overall recognition process. Therefore, this research proposes an Extremal Region Selection (ERS) algorithm to improve MSER detection by reducing the number of irrelevant extremal regions through unsupervised learning based on the k-means algorithm. The performance of the ERS algorithm is evaluated based on classification performance metrics, namely classification rate (CR), error rate (ERT), precision (Prec.), and recall (Rec.), as well as the number of extremal regions produced by ERS. UECFOOD-100 and UNICT-FD1200 are the two food datasets used to benchmark the proposed algorithm. The results of this research show that the ERS algorithm, using optimum parameters and thresholds, is able to reduce the number of extremal regions with sustained classification performance.
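A sketch of the ERS idea under stated assumptions: MSER regions are described by simple geometric statistics, clustered with k-means, and pruned. The four-dimensional descriptor and the centre-based keep rule are guesses, since the abstract does not specify the clustering features or the selection criterion.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def ers_filter(gray, k=3):
    """Detect MSER regions, cluster them, and keep one cluster."""
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    if len(regions) < k:
        return regions
    h, w = gray.shape
    feats = []
    for pts in regions:
        x, y, bw, bh = cv2.boundingRect(pts)
        # Per region: relative area, normalized centroid, extent.
        feats.append([len(pts) / (h * w), (x + bw / 2) / w,
                      (y + bh / 2) / h, len(pts) / max(bw * bh, 1)])
    feats = np.array(feats)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    # Assumed keep rule: the cluster whose regions sit closest to the
    # image centre, presuming the food is centred and background is not.
    centre_err = [np.mean([abs(f[1] - .5) + abs(f[2] - .5)
                           for f, l in zip(feats, labels) if l == c])
                  for c in range(k)]
    keep = int(np.argmin(centre_err))
    return [r for r, l in zip(regions, labels) if l == keep]
```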
... In 2016, Martinel et al. [14] tackled the problem of food recognition by considering a combination of several features such as colour, shape, and other characteristics strictly related to this specific topic. The initial top-1 performance of 84.3% on the Food 100 database was further improved in 2018, when Martinel et al. proposed a Wide-Slice Residual Network (WISeR), which aims to capture the food structure by concatenating two deep CNNs [15]. ...
... The Supervised Extreme Learning Committee (SELC) takes as many features as possible but exploits just the features best suited for the classification of the food. Each ELM handles a particular feature type [13]. A classification rate of 55.8% is reached by the approach of recognizing multiple food items by detecting candidate regions and classifying them with various features [12]. ...
Article
Full-text available
Food recognition is an essential topic in the area of computer vision. One of its target applications is to avoid the need for a cashier at the dining place. In this paper, we investigate the application of deep transfer learning for food recognition. We fine-tune three well-known deep learning models, namely AlexNet, GoogleNet, and Vgg16. The fine-tuning procedure consists of removing the last three layers of each model and adding another five new layers. The training and validation of each model is conducted on a food dataset collected from our university's canteen. The dataset contains 39 food types, with 20 images for each type. The fine-tuned models show similar training and validation performance and achieve 100% accuracy over this small-scale dataset.
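A minimal PyTorch sketch of this fine-tuning recipe (drop the final layers of a pretrained model and attach a fresh head) for the AlexNet case; the abstract does not detail the five replacement layers, so the head below is an assumption.

```python
import torch.nn as nn
from torchvision import models

def finetune_alexnet(n_classes: int = 39):
    """Pretrained AlexNet with its classifier replaced by a new head."""
    model = models.alexnet(weights="DEFAULT")
    in_features = model.classifier[1].in_features   # 9216 for AlexNet
    model.classifier = nn.Sequential(               # assumed five new layers
        nn.Dropout(0.5),
        nn.Linear(in_features, 512),
        nn.ReLU(inplace=True),
        nn.Dropout(0.5),
        nn.Linear(512, n_classes),                  # e.g., 39 canteen food types
    )
    # Optionally freeze the convolutional features and train only the head.
    for p in model.features.parameters():
        p.requires_grad = False
    return model
```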
... Food recognition is an emerging topic in the field of computer vision, which is growing rapidly. Martinel et al. [Martinel, Piciarelli and Micheloni (2016)] provided a system called the Supervised Extreme Learning Committee (SELC), which extracted as many different features as possible but exploited only a subset of those for food classification. He et al. [He, Kong and Tan (2016)] proposed an automatic food classification approach, DietCam, using a texture verification model combined with a deformable part-based model for detecting food ingredients. ...
... Such features include shape-based features [15][16][17] and color and texture features [18,19]. Given the variety of foods and their ingredients, food recognition based on local features is limited and usually time-consuming due to the complexity of the algorithms. ...
Article
Full-text available
Like many countries worldwide, Thailand will shortly become a complete aging society. The challenge of an aging society in the digital era is to enhance the quality of life for seniors through the employment of advanced and modern technology. This study proposes a smart care environment with a food recognition module for personal healthcare purposes. More specifically, it is a mobile application for promoting personalized support for seniors. With a context-aware perspective, the proposed environment employs clinical data and personal data for user modeling. It is designed to have a user-friendly interface providing convenient use for seniors. Additionally, a food recognition module is integrated for gathering real-time energy consumption with less distraction to the seniors. It is trained with a set of Thai food images using a convolutional neural network. A case study was conducted with 50 Thai seniors in Chiang Rai, Thailand. Overall, the seniors strongly agree on both the provided functional and personalized support. Also, they strongly agree that the food recognition module can engage them to use this developed care environment.
... Recent reports from the World Health Organization show a quick increase in life-threatening diseases like heart problems, cancer, and diabetes, which occurs mainly due to wrong food intake and a lack of exercise in daily life [1]. Along with this, obesity problems in humans have rapidly increased due to unplanned diets, so there is a need to build a dietary management system [2] to calculate the calories consumed with every food item by a person. ...
... In [15], a novel framework was proposed by combining a CNN and a kernel ELM for facial age estimation. Martinel et al. proposed an ensemble method for food image recognition named the ELM committee [42]. In existing approaches for classification, the pre-trained CNN is treated as the feature extractor, and an ELM is adopted as the classifier based on the extracted features [11]. ...
Article
Full-text available
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image classification and related tasks. However, the fully-connected layers in CNN are not robust enough to serve as a classifier to discriminate deep convolutional features, due to the local minima problem of back-propagation. Kernel Extreme Learning Machines (KELMs), known as an outstanding classifier, can not only converge extremely fast but also ensure an outstanding generalization performance. In this paper, we propose a novel image classification framework, in which CNN and KELM are well integrated. In our work, Densely connected network (DenseNet) is employed as the feature extractor, while a radial basis function kernel ELM instead of linear fully connected layer is adopted as a classifier to discriminate categories of extracted features to promote the image classification performance. Experiments conducted on four publicly available datasets demonstrate the promising performance of the proposed framework against the state-of-the-art methods.
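The feature-extraction half of this framework might look like the following sketch, with the pooled DenseNet descriptors then handed to an RBF kernel ELM (such as the KernelELM sketch earlier on this page) instead of a fully connected layer; DenseNet-121 is used here for brevity.

```python
import torch
from torchvision import models

def densenet_features(images: torch.Tensor) -> torch.Tensor:
    """Pooled DenseNet descriptors to be classified by a kernel ELM."""
    net = models.densenet121(weights="DEFAULT")
    net.eval()
    with torch.no_grad():
        f = torch.relu(net.features(images))            # (N, 1024, H', W')
        f = torch.nn.functional.adaptive_avg_pool2d(f, 1).flatten(1)
    return f                                            # (N, 1024)

feats = densenet_features(torch.randn(4, 3, 224, 224))
print(feats.shape)  # torch.Size([4, 1024])
```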
... Compared with [Kitamura et al. 2009], [...] improved the food image retrieval system by supporting both image-based and text-based queries. [Barlacchi et al. 2016] introduced a search engine for restaurant retrieval based on dishes a user would like to taste rather than the names of food facilities or their general categories. Features and tasks of representative works: [Anthimopoulos et al. 2014] SIFT, color - food recognition; [Oliveira et al. 2014] color, texture - mobile food recognition; [Kawano and Yanai 2014c] HoG, color - mobile food recognition; [Farinella et al. 2015a] SIFT, texture, color - food recognition; [Martinel et al. 2015] color, shape, texture - food recognition; [Bettadapura et al. 2015] SIFT, color, plus location & menu - restaurant-specific food recognition; [Farinella et al. 2015b] SIFT, SPIN - food recognition; SIFT, color, HoG - mobile food recognition; [Ravl et al. 2015] HoG, texture, color - mobile food recognition; [Martinel et al. 2016] SIFT, color, shape, texture - food recognition; [He et al. 2017] texture - food recognition; SIFT, color - food recognition. Besides food/recipe retrieval, recently there have been some works on cross-modal recipe-image retrieval. ...
Preprint
Full-text available
Food is essential for human life and fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding human behavior, improving human health, and understanding culinary culture. With the fast development of social networks, mobile networks, and the Internet of Things (IoT), people commonly upload, share, and record food images, recipes, cooking videos, and food diaries, leading to large-scale food data. Large-scale food data offers rich knowledge about food and can help tackle many central issues of human society. Therefore, it is time to group several disparate issues related to food. Food computing acquires and analyzes heterogeneous food data from disparate sources for the perception, recognition, retrieval, recommendation, and monitoring of food. In food computing, computational approaches are applied to address food-related issues in medicine, biology, gastronomy, and agronomy. Both large-scale food data and recent breakthroughs in computer science are transforming the way we analyze food data. Therefore, a vast amount of research work has been done in the food area, targeting different food-oriented tasks and applications. However, there are very few systematic reviews that shape this area well and provide a comprehensive and in-depth summary of current efforts or detail the open problems in this area. In this paper, we formalize food computing and present a comprehensive overview of various emerging concepts, methods, and tasks. We summarize key challenges and future directions for food computing. This is the first comprehensive survey targeting the study of computing technology for the food area, and it also offers a collection of research studies and technologies to benefit researchers and practitioners working in different food-related fields.
... Since these models employ low-level local features with IFV encoding to represent mid-level FPs, their recognition power is limited. Recently, a supervised extreme learning committee (SELC) has been developed, which trains a series of ELMs over different image features to automatically choose the optimal features for food recognition [24]. Benefiting from the ability to learn and represent powerful feature representations with labelled data, recently developed deep learning approaches achieve state-of-the-art performances in several food image recognition problems. ...
Article
Full-text available
There has been a growing interest in food image recognition for a wide range of applications. Among existing methods, mid-level image part-based approaches show promising performances due to their suitability for modelling deformable food parts (FPs). However, the achievable accuracy is limited by the FP representations based on low-level features. Benefiting from the capacity to learn powerful features with labelled data, deep learning approaches achieved state-of-the-art performances in several food image recognition problems. Both mid-level-based approaches and deep convolutional neural networks (DCNNs) approaches clearly have their respective advantages, but perhaps most importantly these two approaches can be considered complementary. As such, the authors propose a novel framework to better utilise DCNN features for food images by jointly exploring the advantages of both the mid-level-based approaches and the DCNN approaches. Furthermore, they tackle the challenge of training a DCNN model with the unlabelled mid-level parts data. They accomplish this by designing a clustering-based FP label mining scheme to generate part-level labels from unlabelled data. They test on three benchmark food image datasets, and the numerical results demonstrate that the proposed approach achieves competitive performance when compared with existing food image recognition approaches.
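The clustering-based part-label mining scheme can be sketched as below, assuming mid-level part descriptors have already been extracted; the cluster count is an arbitrary placeholder, and k-means is one plausible choice of clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def mine_part_labels(part_features, n_part_classes=50):
    """Cluster unlabeled food-part descriptors and reuse the cluster
    index as a pseudo part-level label for training a DCNN."""
    km = KMeans(n_clusters=n_part_classes, n_init=10).fit(part_features)
    return km.labels_, km.cluster_centers_   # one pseudo label per region

parts = np.random.rand(1000, 128)            # stand-in part descriptors
labels, centers = mine_part_labels(parts)
print(labels[:10])
```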
... Analysis and understanding of food images is a challenging computer vision task which has gathered much interest from the research community due to its potential impact on the quality of life of modern society [1]. In this context, the main problems considered by the community are related to the discrimination of food images vs. other images [11,16,17], the detection/localization of food in images [18,24], the recognition and classification of the food depicted in an image [20][21][22], the segmentation of food images to distinguish the different parts and ingredients [23,24,26], and the estimation of the volume and nutrients contained in a food plate detected in an image [25,27,28]. A big issue in this application domain is the availability of public datasets, as well as the lack of common procedures for the testing and evaluation of the different tasks. ...
Article
Full-text available
Deep learning-based image classification networks heavily rely on the extracted features. However, as the model becomes deeper, important features may be lost, resulting in decreased accuracy. To tackle this issue, this paper proposes an image classification method that enhances low-level features and incorporates an attention mechanism. The proposed method employs EfficientNet as the backbone network for feature extraction. Firstly, the Feature Enhancement Module quantifies and statistically processes low-level features from shallow layers, thereby enhancing the feature information. Secondly, the Convolutional Block Attention Module enhances the high-level features to improve the extraction of global features. Finally, the enhanced low-level features and global features are fused to supplement low-resolution global features with high-resolution details, further improving the model’s image classification ability. Experimental results illustrate that the proposed method achieves a Top-1 classification accuracy of 86.49% and a Top-5 classification accuracy of 96.90% on the ETH-Food101 dataset, 86.99% and 97.24% on the VireoFood-172 dataset, and 70.99% and 92.73% on the UEC-256 dataset. These results demonstrate that the proposed method outperforms existing methods in terms of classification performance.
Conference Paper
This study is one of many that investigate the relationship between determining the nutritional ingredients in food and calculating the calories, using data analysis with machine learning techniques. The Indian food recipes database is used for the research due to the availability of multi-food photos, which must be cropped before processing. The study uses a large dataset of various food photos to train state-of-the-art deep convolutional neural networks (CNNs) to recognize and categorize distinct food items with a remarkable 99.89% accuracy. The study's applicability spans several sectors in addition to food recognition, including calorie measurement, meal planning services, and nutritional monitoring systems. The solution is widely accessible to a broad range of users thanks to a user-friendly web interface. The system's 99.89% accuracy in food detection and calorie measurement demonstrates its dependability and distinguishes it from competing options. Its ability to improve individual health, fight obesity, and encourage healthy eating habits makes it a vital tool in today's health-conscious culture.
Article
Food is one of the most important requirements of every living being on earth. Human beings require their food to be fresh, pure, and of standard quality. The standards imposed and the automation carried out in the food processing industry take care of food quality. Nowadays, people across the globe are becoming more sensitive to their diet. This food recognition and calorie measurement project describes the relationship between identifying nutritional ingredients in food and inspecting calories through machine learning models used to perform the data analysis; experiments on a real-life dataset show that our method improves performance with efficient accuracy. Also, our system recommends food for several different age groups. This work is able to identify the nutritional deficiencies that may affect us due to a lack of certain nutritional ingredients in our body, and it recommends food that can benefit the rehabilitation of those age groups. To achieve high accuracy and low time complexity, the proposed system is implemented using CNN machine learning models.
Preprint
Full-text available
One of the most urgent necessities of all individuals is food. Nowadays, people mainly focus on diet because of diabetes and obesity problems. A healthy diet leads to a healthy life span. So, this paper gives a solution for this type of common problem. In this paper, we calculate the nutritional and calorie content utilizing image processing, displayed through the food segmentation and classification process. This is very useful for dieticians, and ordinary people can benefit from it as well by adjusting their routine food patterns.
Article
Recognizing the category and its ingredient composition from food images facilitates automatic nutrition estimation, which is crucial to various health relevant applications, such as nutrition intake management and healthy diet recommendation. Since food is composed of ingredients, discovering ingredient-relevant visual regions can help identify its corresponding category and ingredients. Furthermore, various ingredient relationships like co-occurrence and exclusion are also critical for this task. For that, we propose an ingredient-oriented multi-task food category-ingredient joint learning framework for simultaneous food recognition and ingredient prediction. This framework mainly involves learning an ingredient dictionary for ingredient-relevant visual region discovery and building an ingredient-based semantic-visual graph for ingredient relationship modeling. To obtain ingredient-relevant visual regions, we build an ingredient dictionary to capture multiple ingredient regions and obtain the corresponding assignment map, and then pool the region features belonging to the same ingredient to identify the ingredients more accurately and meanwhile improve the classification performance. For ingredient-relationship modeling, we utilize the visual ingredient representations as nodes and the semantic similarity between ingredient embeddings as edges to construct an ingredient graph, and then learn their relationships via the graph convolutional network to make label embeddings and visual features interact with each other to improve the performance. Finally, fused features from both ingredient-oriented region features and ingredient-relationship features are used in the following multi-task category-ingredient joint learning. Extensive evaluation on three popular benchmark datasets (ETH Food-101, Vireo Food-172 and ISIA Food-200) demonstrates the effectiveness of our method. Further visualization of ingredient assignment maps and attention maps also shows the superiority of our method.
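One standard way to realize the ingredient-relationship modeling described above is a GCN propagation step over an ingredient graph whose edges encode semantic similarity between ingredient embeddings, as in this sketch; the cosine-similarity adjacency and the dimensions are assumptions.

```python
import torch
import torch.nn as nn

class IngredientGCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(A_norm @ H @ W), the
    standard GCN propagation rule over the ingredient graph."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, H, A):
        # Symmetric normalization: D^-1/2 (A + I) D^-1/2.
        A_hat = A + torch.eye(A.size(0), device=A.device)
        d = A_hat.sum(1).clamp(min=1e-6).pow(-0.5)
        A_norm = d[:, None] * A_hat * d[None, :]
        return torch.relu(A_norm @ self.W(H))

# Example: 172 ingredients (as in Vireo Food-172), cosine-similarity edges.
emb = torch.randn(172, 300)                  # word embeddings per ingredient
A = torch.cosine_similarity(emb[:, None], emb[None], dim=-1).clamp(min=0)
H1 = IngredientGCNLayer(300, 256)(emb, A)
print(H1.shape)  # torch.Size([172, 256])
```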
Article
In the last years, several works on automatic image-based food recognition have been proposed, often based on texture feature extraction and classification. However, there is still a lack of proper comparisons to evaluate which approaches are better suited for this specific task. In this work, we adopt a Random Forest classifier to measure the performances of different texture filter banks and feature encoding techniques on three different food image datasets. Comparative results are given to show the performance of each considered approach, as well as to compare the proposed Random Forest classifiers with other feature-based state-of-the-art solutions.
Article
Food recognition systems have recently garnered much research attention in the field due to their ability to obtain objective measurements of dietary intake, a feature that contributes to the management of various chronic conditions. Challenges such as inter- and intra-class variations, alongside the practical applications of smart glasses, wearable cameras, and mobile devices, require resource-efficient food recognition models with high classification performance. Furthermore, explainable AI is also crucial in health-related domains as it characterizes model performance, enhancing its transparency and objectivity. Our proposed architecture attempts to address these challenges by drawing on the strengths of the transfer learning technique, initializing MobileNetV3 with weights pre-trained on ImageNet. MobileNetV3 achieves superior performance using the squeeze-and-excitation strategy, which assigns unequal weights to different input channels, in contrast to the equal weights of other variants. Despite being fast and efficient, it can become stuck in local optima like other deep neural networks, reducing the desired classification performance of the model. We overcome this issue by applying the snapshot ensemble approach, which yields an ensemble of M models in a single training process without any increase in the required training time; each snapshot in the ensemble visits different local minima before converging to the final solution, which enhances recognition performance. On the challenge of explainability, we argue that explanations cannot be monolithic, since each stakeholder perceives the results and their explanations according to different objectives and aims. We therefore propose a user-centered explainable artificial intelligence (AI) framework to increase the trust of the involved parties by inferring and rationalizing the results according to the needs and profile of the user. Our framework is comprehensive in terms of a dietary assessment app, as it detects Food/Non-Food, food categories, and ingredients. Experimental results on standard food benchmarks and a newly contributed Malaysian food dataset for ingredient detection demonstrate superior performance on an integrated set of measures over other methodologies.
Article
The development of an automatic food recognition system has several interesting applications, ranging from food waste management to advertisement, calorie estimation, and daily diet monitoring. Despite the importance of this subject, the number of related studies is still limited. Moreover, comparisons in the literature are currently made over best-shot performance, without considering the more common practice of averaging over several trials. This paper surveys the most common deep learning methods used for food classification, presents the publicly available food databases, releases benchmark results for the food classification experiment averaged over 5 trials, and beats the current best-shot performance, reaching a state-of-the-art accuracy of 90.02% on the UEC Food-100 database. The best results have been achieved by the ensemble method averaging the predictions of ResNeXt and DenseNet models. All the experiments are run on the UEC Food-100 database, because it is one of the most used databases and it is challenging due to the presence of multi-food images, which need to be cropped before processing.
Article
Background There is a large group of deaf-mutes in the world, and sign language is their major communication tool. It is therefore necessary for deaf-mutes to communicate with hearing-speech people, and hearing-speech people also need to understand sign language, which produces a great demand for sign language teaching. Even though a large number of books on sign language already exist, it is inefficient to learn sign language from books, or even from teaching videos. To solve this problem, we develop a smartphone-based interactive Chinese sign language teaching system for sign language learning. Methods The system provides the learner with several learning modes and captures the learner's actions with the front camera of the smartphone. The system currently provides a vocabulary set of 1000 frequently used words, and the learner can evaluate his/her sign action by subjective or objective comparison. In word recognition mode, the user can play any word within the vocabulary and the system returns the top three retrieved candidates, reminding the learner what the sign is. Results The system provides interactive learning that lets a user learn sign language highly efficiently. The system adopts an algorithm based on point cloud recognition to evaluate a user's sign and takes about 700 ms of inference time per sample, which meets real-time requirements. Conclusion This interactive learning system decreases the communication barriers between deaf-mutes and hearing-speech people.
Article
With the development of industry and technology, the development of the environment and cities has drawn much attention. Time series prediction plays a vital role in protecting the environment and improving the level of intelligence and technology in cities, for example in the prediction of air pollution, water levels, palm oil prices, financial data, and grid security. We describe a new algorithm, the "Error-output Recurrent Two-layer Extreme Learning Machine" or ERT-ELM: it applies a new recurrent technique that not only removes the restriction of the prediction horizon, but also uses the mean squared error of the current step to update the output weights for the next step. This technique avoids the error accumulation of the original recurrent algorithm in multi-step time series prediction. Moreover, the new two-layer network structure improves forecasting compared to conventional single-layer or two-layer ELM models. Quantum-behaved Particle Swarm Optimization was used to find suitable ERT-ELM parameters. The model was assessed on ten data sets (two artificial and eight real-world) and performed significantly better than the baselines. In particular, for the synthetic data sets, over 1–18 prediction periods, our model achieved mean square errors of 2.64 × 10⁻³ on the Mackey-Glass data set and 1.49 × 10⁻⁴ on the Lorenz data set.
Preprint
Full-text available
Food recognition has received more and more attention in the multimedia community for its various real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed for developing advanced large-scale food recognition algorithms, as well as for providing a benchmark dataset for such algorithms. To encourage further progress in food recognition, we introduce the dataset ISIA Food-500, with 500 categories from the list in Wikipedia and 399,726 images, a more comprehensive food dataset that surpasses existing popular benchmark datasets in category coverage and data volume. Furthermore, we propose a stacked global-local attention network, which consists of two sub-networks for food recognition. One sub-network first utilizes hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale discriminative features from multiple layers into a global-level representation (e.g., texture and shape information about food). The other generates attentional regions (e.g., ingredient-relevant regions) from different regions via cascaded spatial transformers, and further aggregates these multi-scale regional features from different layers into a local-level representation. These two types of features are finally fused as a comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and two other popular benchmark datasets demonstrate the effectiveness of our proposed method, which can thus be considered a strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html.
Article
Visual food recognition on mobile devices has attracted increasing attention in recent years due to its role in individual diet monitoring and social health management and analysis. Existing visual food recognition approaches usually use large server-based networks to achieve high accuracy. However, these networks are not compact enough to be deployed on mobile devices. Even though some compact architectures have been proposed, most of them are unable to match the performance of full-size networks. In view of this, this paper proposes a Joint-learning Distilled Network (JDNet) that aims to achieve high food recognition accuracy with a compact student network by learning from a large teacher network, while retaining a compact network size. Compared to conventional one-directional knowledge distillation methods, the proposed JDNet has a novel joint-learning framework where the large teacher network and the small student network are trained simultaneously, by leveraging different intermediate-layer features in both networks. JDNet introduces a new Multi-Stage Knowledge Distillation (MSKD) for simultaneous student-teacher training at different levels of abstraction. A new Instance Activation Learning (IAL) is also proposed to jointly train student and teacher on the instance-level activation map of each training sample. Experimental results show that the trained student model achieves state-of-the-art Top-1 recognition accuracy on the benchmark UECFood-256 and Food-101 datasets at 84.0% and 91.2%, respectively, while retaining a 4x smaller network size for mobile deployment.
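As a point of reference for the distillation component, the following is a minimal sketch (in PyTorch) of the standard temperature-scaled knowledge distillation loss that one-directional methods build on; JDNet's multi-stage and instance-activation terms are not reproduced here, and the temperature and weighting values are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard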
Article
Food is essential for human life and it is fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding human behavior, improving human health, and understanding the culinary culture. With the rapid development of social networks, mobile networks, and Internet of Things (IoT), people commonly upload, share, and record food images, recipes, cooking videos, and food diaries, leading to large-scale food data. Large-scale food data offers rich knowledge about food and can help tackle many central issues of human society. Therefore, it is time to group several disparate issues related to food. Food computing acquires and analyzes heterogeneous food data from different sources for perception, recognition, retrieval, recommendation, and monitoring of food. In food computing, computational approaches are applied to address food-related issues in medicine, biology, gastronomy, and agronomy. Both large-scale food data and recent breakthroughs in computer science are transforming the way we analyze food data. Therefore, a series of works has been conducted in the food area, targeting different food-oriented tasks and applications. However, there are very few systematic reviews that shape this area well and provide a comprehensive and in-depth summary of current efforts or detail open problems in this area. In this article, we formalize food computing and present such a comprehensive overview of various emerging concepts, methods, and tasks. We summarize key challenges and future directions ahead for food computing. This is the first comprehensive survey that targets the study of computing technology for the food area and also offers a collection of research studies and technologies to benefit researchers and practitioners working in different food-related fields.
Research
Full-text available
The use of images is increasingly common nowadays, especially in the medical and computer vision areas, and image segmentation is a methodology widely employed to make good use of these images. Conceptually, image segmentation is part of almost all computational schemes for pattern recognition in images, as a pre-processing stage to extract useful information about objects. Consequently, there is much research on image segmentation algorithms, but at present a universal technique still does not exist. Based on the analysis of the literature presented in the report, these methodologies can be broadly classified as classical approaches, Soft Computing or Computational Intelligence approaches, or other approaches. Along these lines, this report describes the academic experience of the intern. The account includes insights for future publications and projects; this experience can be divided into theoretical studies and participation in scientific events. The main objective assumed by the intern in the reported postdoc was to absorb knowledge in the area of image segmentation with applications to the medical field, with the aim of broadening the intern's field of work for possible future work in mathematical modeling. As a result, during this research period it was possible to learn about a new research area that has been explored by some researchers in mathematical modeling applied to biological systems. In some studies, as highlighted in the report, researchers employ image segmentation and processing to feed mathematical models, estimating the parameters of those models. Finally, it is concluded, through a study of the literature, that combining mathematical modeling with image segmentation techniques can create good opportunities for research and publications.
Conference Paper
The problem of smoke detection through visual analytics is an open and challenging one. The existing literature has addressed it mainly by working on the best feature representation and by exploiting supervised solutions that treat smoke detection as a binary classification problem. Differently from such works, we consider the possibility that in some contexts sensing smoke is a common situation, and we want to detect when there are significant fluctuations within this normal situation. In light of this consideration, we propose an unsupervised solution that leverages the concept of anomaly detection. Different visual representations are used together with a temporal smoothing function to reduce the effects of noisy measurements. The temporally smoothed representations are then exploited to learn a robust "normality" model by means of a One-Class Support Vector Machine. A real prototype has been developed and used to collect a new dataset on which the proposed solution is evaluated.
Conference Paper
Detection of anomalies in continuous fluids is an open and interesting problem in computer vision and pattern recognition. The problem has many challenges, which mainly derive from the highly deformable shape of the liquid over time. To address these challenges, the existing literature has mainly exploited infrared sensors. However, such solutions are not only expensive because of the required hardware, but are often limited to detecting anomalies in the fluid temperature. In this work, we tackle the problem by considering sensors working in the visible range. We leverage different visual feature representations, which are smoothed over time to compensate for temporal changes that might be due not to the presence of anomalies but to noisy measurements, etc. The temporally smoothed representations are then exploited to learn a robust "normality" model by means of a One-Class Support Vector Machine. A real-world scenario dataset has been collected to evaluate the proposed solution.
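A minimal sketch of the normality-modeling idea shared by the two works above, assuming scikit-learn and simple moving-average smoothing; the random arrays stand in for real per-frame visual descriptors, and the hyperparameter values are illustrative, not the authors':

import numpy as np
from sklearn.svm import OneClassSVM

def temporal_smooth(features, window=5):
    # Moving-average smoothing along the time axis of a (time, dim) sequence,
    # damping frame-level measurement noise before learning the model.
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, features)

# Stand-ins for real per-frame visual descriptors (time x dim).
normal_seq = np.random.rand(500, 64)   # frames observed under normal conditions
test_seq = np.random.rand(100, 64)     # new frames to screen

model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(temporal_smooth(normal_seq))

# -1 marks frames that deviate from the learned "normality" model.
is_anomalous = model.predict(temporal_smooth(test_seq)) == -1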
Conference Paper
Full-text available
In this paper, we apply a convolutional neural network (CNN) to the tasks of detecting and recognizing food images. Because of the wide diversity of types of food, image recognition of food items is generally very difficult. However, deep learning has been shown recently to be a very powerful image recognition technique, and CNN is a state-of-the-art approach to deep learning. We applied CNN to the tasks of food detection and recognition through parameter optimization. We constructed a dataset of the most frequent food items in a publicly available food-logging system, and used it to evaluate recognition performance. CNN showed significantly higher accuracy than did traditional support-vector-machine-based methods with handcrafted features. In addition, we found that the convolution kernels show that color dominates the feature extraction process. For food image detection, CNN also showed significantly higher accuracy than a conventional method did.
Conference Paper
Full-text available
Assessment of food intake has a wide range of applications in public health and lifestyle-related chronic disease management. In this paper, we propose a real-time food recognition platform combined with daily activity and energy expenditure estimation. In the proposed method, food recognition is based on hierarchical classification using multiple visual cues, supported by an efficient software implementation suitable for real-time execution on mobile devices. A Fisher Vector representation together with a set of linear classifiers is used to categorize food intake. Daily energy expenditure estimation is achieved by using the built-in inertial motion sensors of the mobile device. The performance of the vision-based food recognition algorithm is compared to the current state-of-the-art, showing improved accuracy and high computational efficiency suitable for real-time feedback. Detailed user studies have also been performed to demonstrate the practical value of the software environment.
Article
Full-text available
Person re-identification in a non-overlapping multicamera scenario is an open and interesting challenge. While the task can hardly be completed by machines, we, as humans, are inherently able to sample those relevant details of a person's appearance that allow us to correctly solve the problem in a fraction of a second. Thus, knowing where a human might fixate to recognize a person is of paramount interest for re-identification. Inspired by human gazing capabilities, we want to identify the salient regions of a person's appearance to tackle the problem. Towards this objective, we introduce the following main contributions. A kernelized graph-based approach is used to detect the salient regions of a person's appearance, later used as a weighting tool in the feature extraction process. The proposed person representation combines visual features either considering or not considering the saliency. These are then exploited in a pairwise-based multiple metric learning framework. Finally, the non-Euclidean metrics that have been separately learned for each feature are fused to re-identify a person. The proposed KErnelized saliency-based Person reidentification through multiple metric LEaRning (KEPLER) has been evaluated on four publicly available benchmark datasets to show its superior performance over state-of-the-art approaches (e.g., it achieves a rank-1 correct recognition rate of 42.41% on the VIPeR dataset).
Conference Paper
Full-text available
It is well known that people love food. However, an unhealthy diet can cause problems for the general health of people. Since health is strictly linked to diet, advanced computer vision tools to recognize food images (e.g., acquired with mobile/wearable cameras), as well as their properties (e.g., calories), can help diet monitoring by providing useful information to experts (e.g., nutritionists) to assess the food intake of patients (e.g., to combat obesity). Food recognition is a challenging task since food is intrinsically deformable and presents high variability in appearance. Image representation plays a fundamental role. To properly study the peculiarities of image representation in the food application context, a benchmark dataset is needed. These facts motivate the work presented in this paper. In this work we introduce the UNICT-FD889 dataset. It is the first food image dataset composed of over 800 distinct plates of food, which can be used as a benchmark to design and compare representation models of food images. We exploit the UNICT-FD889 dataset for Near Duplicate Image Retrieval (NDIR) purposes by comparing three standard state-of-the-art image descriptors: Bag of Textons, PRICoLBP and SIFT. Results confirm that both textures and colors are fundamental properties in food representation. Moreover, the experiments point out that the Bag of Textons representation computed in the color domain is more accurate than the other two approaches for NDIR.
Conference Paper
Full-text available
The classification of food images is an interesting and challenging problem, given the high variability of the image content, which makes the task difficult for current state-of-the-art classification methods. The image representation employed in the classification engine plays an important role. We believe that texture features have not been properly considered in this application domain. This paper points out, through a set of experiments, that textures are fundamental to properly recognize different food items. For this purpose the bag of visual words model (BoW) is employed. Images are processed with a bank of rotation- and scale-invariant filters and then a small codebook of Textons is built for each food class. The learned class-based Textons are then collected in a single visual dictionary. The food images are represented as visual word distributions (Bag of Textons) and a Support Vector Machine is used for the classification stage. The experiments demonstrate that the image representation based on Bag of Textons is more accurate than existing (and more complex) approaches in classifying the 61 classes of the Pittsburgh Fast-Food Image Dataset.
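A minimal sketch of a Bag-of-Textons pipeline, assuming scikit-learn and an illustrative (non-invariant) Gaussian/Laplacian filter bank in place of the rotation- and scale-invariant bank used in the paper; random arrays stand in for real grayscale food images:

import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def filter_responses(img, sigmas=(1, 2, 4)):
    # Per-pixel responses to a small illustrative filter bank:
    # Gaussians and Laplacians of Gaussian at several scales.
    maps = [gaussian_filter(img, s) for s in sigmas]
    maps += [gaussian_laplace(img, s) for s in sigmas]
    return np.stack([m.ravel() for m in maps], axis=1)  # (pixels, n_filters)

def bag_of_textons(img, kmeans):
    # Assign each pixel's response vector to its nearest texton, then histogram.
    words = kmeans.predict(filter_responses(img))
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

# Stand-ins for grayscale training images and their class labels.
train_imgs = [np.random.rand(64, 64) for _ in range(20)]
labels = np.random.randint(0, 2, size=20)

# Learn a texton dictionary (the paper clusters per class and concatenates;
# a single global dictionary is used here for brevity).
all_resp = np.vstack([filter_responses(im) for im in train_imgs])
kmeans = KMeans(n_clusters=32, n_init=4, random_state=0).fit(all_resp)

X = np.vstack([bag_of_textons(im, kmeans) for im in train_imgs])
clf = LinearSVC().fit(X, labels)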
Conference Paper
Full-text available
Computer-aided food identification and quantity estimation have attracted more attention in recent years because of the growing concern for our health. The identification problem is usually defined as an image categorization or classification problem, and several approaches have been proposed. In this paper, we address the issue of feature descriptors in the food identification problem and introduce a preliminary approach for quantity estimation using depth information. Sparse coding is utilized in the SIFT and Local Binary Pattern feature descriptors, and these features, combined with Gabor and color features, are used to represent food items. A multi-label SVM classifier is trained for each feature, and these classifiers are combined with the multi-class AdaBoost algorithm. For evaluation, 50 categories of worldwide food are used, and each category contains 100 photographs from different sources, such as manually taken photos or Internet web albums. An overall accuracy of 68.3% is achieved, and success at top-N candidates achieved 80.6%, 84.8%, and 90.9% accuracy, respectively, when N equals 2, 3, and 5, thus making mobile applications practical. The experimental results show that the proposed methods greatly improve the performance of the original SIFT and LBP feature descriptors. On the other hand, for quantity estimation using depth information, a straightforward method is proposed for certain foods, while transparent food ingredients such as pure water and cooked rice are temporarily excluded.
Conference Paper
Full-text available
Recent results indicate that the generic descriptors extracted from the convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Remarkably we report better or competitive results compared to the state-of-the-art in all the tasks on various datasets. The results are achieved using a linear SVM classifier applied to a feature representation of size 4096 extracted from a layer in the net. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual classification tasks.
Article
Full-text available
A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual-words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call the Fisher vector, has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
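A minimal sketch of the first-order (mean-deviation) Fisher vector component for a diagonal-covariance GMM, assuming scikit-learn; the second-order and weight terms, as well as the power and L2 normalization used in practice, are omitted, and the random arrays stand in for real local descriptors:

import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    # First-order Fisher vector: per Gaussian, the posterior-weighted average
    # of normalized deviations of the descriptors from the Gaussian's mean.
    X = np.atleast_2d(descriptors)              # (N, D) local descriptors
    N = X.shape[0]
    q = gmm.predict_proba(X)                    # (N, K) soft assignments
    mu = gmm.means_                             # (K, D)
    sigma = np.sqrt(gmm.covariances_)           # (K, D) diagonal std devs
    w = gmm.weights_                            # (K,)
    dev = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]         # (N, K, D)
    G = (q[:, :, None] * dev).sum(axis=0) / (N * np.sqrt(w)[:, None])  # (K, D)
    return G.ravel()

# Fit a small "universal" GMM on pooled local descriptors (random stand-ins).
pool = np.random.rand(5000, 64)
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(pool)
fv = fisher_vector(np.random.rand(300, 64), gmm)   # one image signature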
Article
Full-text available
There is growing interest in the use of information communication technologies to treat obesity. An intervention delivered by smartphone could be a convenient, potentially cost-effective, and wide-reaching weight management strategy. Although there have been studies of texting-based interventions and smartphone applications (apps) used as adjuncts to other treatments, there are currently no randomized controlled trials (RCT) of a stand-alone smartphone application for weight loss that focuses primarily on self-monitoring of diet and physical activity. The aim of this pilot study was to collect acceptability and feasibility outcomes of a self-monitoring weight management intervention delivered by a smartphone app, compared to a website and paper diary. A sample of 128 overweight volunteers were randomized to receive a weight management intervention delivered by smartphone app, website, or paper diary. The smartphone app intervention, My Meal Mate (MMM), was developed by the research team using an evidence-based behavioral approach. The app incorporates goal setting, self-monitoring of diet and activity, and feedback via weekly text message. The website group used an existing commercially available slimming website from a company called Weight Loss Resources who also provided the paper diaries. The comparator groups delivered a similar self-monitoring intervention to the app, but by different modes of delivery. Participants were recruited by email, intranet, newsletters, and posters from large local employers. Trial duration was 6 months. The intervention and comparator groups were self-directed with no ongoing human input from the research team. The only face-to-face components were at baseline enrollment and brief follow-up sessions at 6 weeks and 6 months to take anthropometric measures and administer questionnaires. Trial retention was 40/43 (93%) in the smartphone group, 19/42 (55%) in the website group, and 20/43 (53%) in the diary group at 6 months. Adherence was statistically significantly higher in the smartphone group with a mean of 92 days (SD 67) of dietary recording compared with 35 days (SD 44) in the website group and 29 days (SD 39) in the diary group (P<.001). Self-monitoring declined over time in all groups. In an intention-to-treat analysis using baseline observation carried forward for missing data, mean weight change at 6 months was -4.6 kg (95% CI -6.2 to -3.0) in the smartphone app group, -2.9 kg (95% CI -4.7 to -1.1) in the diary group, and -1.3 kg (95% CI -2.7 to 0.1) in the website group. BMI change at 6 months was -1.6 kg/m(2) (95% CI -2.2 to -1.1) in the smartphone group, -1.0 kg/m(2) (95% CI -1.6 to -0.4) in the diary group, and -0.5 kg/m(2) (95% CI -0.9 to 0.0) in the website group. Change in body fat was -1.3% (95% CI -1.7 to -0.8) in the smartphone group, -0.9% (95% CI -1.5 to -0.4) in the diary group, and -0.5% (95% CI -0.9 to 0.0) in the website group. The MMM app is an acceptable and feasible weight loss intervention and a full RCT of this approach is warranted. ClinicalTrials.gov NCT01744535; http://clinicaltrials.gov/ct2/show/NCT01744535 (Archived by WebCite at http://www.webcitation.org/6FEtc3PVB).
Article
Full-text available
It is proved that if Y ⊂ X are metric spaces with Y having n ≥ 2 points, then any map f from Y into a Banach space Z can be extended to a map f̂ from X into Z so that ‖f̂‖_Lip ≤ c log n ‖f‖_Lip, where c is an absolute constant. A related result is obtained for the case where X is assumed to be a finite-dimensional normed space and Y is an arbitrary subset of X.
Article
Full-text available
Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used for feature extraction at salient points. To increase illumination invariance and discriminative power, color descriptors have been proposed. Because many different descriptors exist, a structured overview is required of color invariant descriptors in the context of image category recognition. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors (software to compute the color descriptors from this paper is available from http://www.colordescriptors.com) in a structured way. The analytical invariance properties of color descriptors are explored, using a taxonomy based on invariance properties with respect to photometric transformations, and tested experimentally using a data set with known illumination conditions. In addition, the distinctiveness of color descriptors is assessed experimentally using two benchmarks, one from the image domain and one from the video domain. From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition. The results further reveal that, for light intensity shifts, the usefulness of invariance is category-specific. Overall, when choosing a single descriptor and no prior knowledge about the data set and object and scene categories is available, the OpponentSIFT is recommended. Furthermore, a combined set of color descriptors outperforms intensity-based SIFT and improves category recognition by 8 percent on the PASCAL VOC 2007 and by 7 percent on the Mediamill Challenge.
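For concreteness, the opponent color transform underlying OpponentSIFT-style descriptors is commonly defined as below; this is a sketch under that standard formulation, and SIFT itself is not reproduced:

import numpy as np

def opponent_channels(rgb):
    # Standard opponent color transform: O1 and O2 are invariant to light
    # intensity shifts, while O3 carries the intensity information.
    R = rgb[..., 0].astype(float)
    G = rgb[..., 1].astype(float)
    B = rgb[..., 2].astype(float)
    O1 = (R - G) / np.sqrt(2.0)
    O2 = (R + G - 2.0 * B) / np.sqrt(6.0)
    O3 = (R + G + B) / np.sqrt(3.0)
    return np.stack([O1, O2, O3], axis=-1)

# SIFT descriptors computed on each opponent channel and concatenated
# give an OpponentSIFT-style representation.
img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
opp = opponent_channels(img)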
Conference Paper
Full-text available
State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric nearest-neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless. We claim that the effectiveness of non-parametric NN-based image classification has been considerably undervalued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) quantization of local image descriptors (used to generate "bags-of-words", codebooks); (ii) computation of 'image-to-image' distance, instead of 'image-to-class' distance. We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN-distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct 'image-to-class' distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN. Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the top leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256 and Graz-01).
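A minimal sketch of the NBNN decision rule, assuming SciPy KD-trees over per-class descriptor pools; the random arrays stand in for real local descriptors:

import numpy as np
from scipy.spatial import cKDTree

def nbnn_classify(query_descriptors, class_trees):
    # Naive-Bayes Nearest-Neighbor: sum, over the query image's local
    # descriptors, the squared distance to the nearest descriptor of each
    # class, and pick the class with the smallest total. This is the
    # 'image-to-class' distance, with no descriptor quantization.
    best, best_dist = None, np.inf
    for label, tree in class_trees.items():
        d, _ = tree.query(query_descriptors, k=1)
        total = np.sum(d ** 2)
        if total < best_dist:
            best, best_dist = label, total
    return best

# One KD-tree per class over the pooled training descriptors (stand-ins).
class_trees = {c: cKDTree(np.random.rand(1000, 128)) for c in ("pizza", "salad")}
label = nbnn_classify(np.random.rand(200, 128), class_trees)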
Conference Paper
Full-text available
Food recognition is difficult because food items are deformable objects that exhibit significant variations in appearance. We believe the key to recognizing food is to exploit the spatial relationships between different ingredients (such as meat and bread in a sandwich). We propose a new representation for food items that calculates pairwise statistics between local features computed over a soft pixel-level segmentation of the image into eight ingredient types. We accumulate these statistics in a multi-dimensional histogram, which is then used as a feature vector for a discriminative classifier. Our experiments show that the proposed representation is significantly more accurate at identifying food than existing methods.
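A minimal sketch of the pairwise co-occurrence idea, assuming a hard pixel-level label map rather than the soft segmentation and richer local-feature statistics used in the paper:

import numpy as np

def pairwise_ingredient_stats(labels, n_types=8, offsets=((0, 1), (1, 0), (1, 1))):
    # Count, for a few spatial offsets, how often ingredient type a occurs
    # next to type b in the label map, and flatten the counts into a
    # normalized feature vector for a discriminative classifier.
    hist = np.zeros((len(offsets), n_types, n_types))
    H, W = labels.shape
    for k, (dy, dx) in enumerate(offsets):
        a = labels[:H - dy, :W - dx].ravel()
        b = labels[dy:, dx:].ravel()
        np.add.at(hist[k], (a, b), 1)
    v = hist.ravel()
    return v / v.sum()

# Toy 8-ingredient label map (in practice, from a pixel-level segmentation).
feat = pairwise_ingredient_stats(np.random.randint(0, 8, (32, 32)))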
Conference Paper
Full-text available
The Fisher kernel (FK) is a generic framework which combines the benefits of generative and discriminative approaches. In the context of image classification the FK was shown to extend the popular bag-of-visual-words (BOV) by going beyond count statistics. However, in practice, this enriched representation has not yet shown its superiority over the BOV. In the first part we show that with several well-motivated modifications over the original framework we can boost the accuracy of the FK. On PASCAL VOC 2007 we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on CalTech 256. A major advantage is that these results are obtained using only SIFT descriptors and costless linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images to learn classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images we show that classifiers learned on Flickr groups perform surprisingly well (although they were not intended for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
Conference Paper
Full-text available
Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant improvements in MAP scores.
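For reference, the quantity being optimized, average precision for a single query, can be computed as follows; this is a minimal sketch, and the structural SVM relaxation itself is not reproduced:

import numpy as np

def average_precision(scores, relevant):
    # MAP's per-query term: the mean of precision@k taken at the rank of
    # each relevant document, with documents sorted by decreasing score.
    order = np.argsort(-np.asarray(scores))
    rel = np.asarray(relevant)[order].astype(bool)
    hits = np.cumsum(rel)
    ranks = np.arange(1, len(rel) + 1)
    precision_at_hits = hits[rel] / ranks[rel]
    return precision_at_hits.mean() if rel.any() else 0.0

# Example: relevant documents ranked 1st, 3rd and 4th give AP of about 0.81.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1])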
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden-layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results, based on a few artificial and real benchmark function approximation and classification problems including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
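A minimal sketch of the ELM training rule described above, with illustrative sizes and activation: random, untuned hidden weights and a closed-form least-squares solution for the output weights.

import numpy as np

def elm_train(X, T, n_hidden=100, seed=0):
    # ELM for a single-hidden-layer network: input weights and biases are
    # drawn at random and never tuned; only the output weights are obtained,
    # in closed form, via the Moore-Penrose pseudo-inverse.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)           # random hidden-layer feature map
    beta = np.linalg.pinv(H) @ T     # analytic least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: one-hot targets for a 3-class problem on random stand-in data.
X = np.random.rand(200, 10)
T = np.eye(3)[np.random.randint(0, 3, 200)]
W, b, beta = elm_train(X, T)
predicted_class = elm_predict(X, W, b, beta).argmax(axis=1)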
Chapter
This chapter describes some of the most important architectures and algorithms for committee machines. We discuss three reasons for using committee machines. The first is that a committee can achieve a test set performance unobtainable by a single committee member. As typical representative approaches, we describe simple averaging, bagging, and boosting. Second, with committee machines, one obtains modular solutions, which is advantageous in many applications. The prime example given here is the mixture of experts (ME) approach, the goal of which is to autonomously break up a complex prediction task into subtasks which are modeled by the individual committee members. The third reason for using committee machines is a reduction in computational complexity. In the presented Bayesian committee machine, the training data set is partitioned into several smaller data sets, and the different committee members are trained on the different sets. Their predictions are then combined using a covariance-based weighting scheme. The computational complexity of the Bayesian committee machine approach grows only linearly with the size of the training data set, independent of the learning systems used as committee members.
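A minimal sketch of the first committee flavor discussed above, bagging with simple averaging, assuming scikit-learn decision trees as committee members and random stand-in data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Random stand-ins for a training set with binary labels.
X, y = np.random.rand(300, 8), np.random.randint(0, 2, 300)
rng = np.random.default_rng(0)

members = []
for _ in range(25):
    # Bagging: each committee member is trained on a bootstrap resample.
    idx = rng.integers(0, len(X), len(X))
    members.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Simple averaging: the committee's prediction is the mean of the members'
# class-probability outputs, which can outperform any single member.
avg_proba = np.mean([m.predict_proba(X) for m in members], axis=0)
committee_pred = avg_proba.argmax(axis=1)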
Conference Paper
Nowadays obesity has become one of the most common diseases in many countries. To face it, obese people should constantly monitor their daily meals, both for self-limitation and to provide useful statistics for their dietitians. This has led to the recent rise in popularity of food diary applications on mobile devices, where users can manually annotate their food intake. To overcome the tediousness of such a process, several works on automatic image-based food recognition have been proposed, typically based on texture feature extraction and classification. In this work, we analyze different texture filter banks to evaluate their performance and propose a method to automatically aggregate the best features for food classification purposes. Particular emphasis is put on the computational burden of the system, to match the limited capabilities of mobile devices.
Conference Paper
In this paper we address the problem of automatically recognizing pictured dishes. To this end, we introduce a novel method to mine discriminative parts using Random Forests (RF), which allows us to mine parts simultaneously for all classes and to share knowledge among them. To improve the efficiency of mining and classification, we only consider patches that are aligned with image superpixels, which we call components. To measure the performance of our RF component mining for food recognition, we introduce a novel and challenging dataset of 101 food categories, with 101,000 images. With an average accuracy of 50.76%, our model outperforms alternative classification methods except for CNN, including SVM classification on Improved Fisher Vectors and existing discriminative part-mining algorithms, by 11.88% and 8.13%, respectively. On the challenging MIT-Indoor dataset, our method compares favorably to other state-of-the-art component-based classification methods.
Conference Paper
In this demo, we present a mobile food recognition system with Fisher Vector and linear one-vs-rest SVMs, which enables us to record our food habits easily. In experiments with 100 kinds of food categories, we achieved a 79.2% classification rate for the top 5 category candidates when the ground-truth bounding boxes are given. The prototype system is open to the public as an Android-based smartphone application.
Article
The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat. In this paper, we study how taking pictures of what we eat in restaurants can be used for the purpose of automating food journaling. We propose to leverage the context of where the picture was taken, with additional information about the restaurant, available online, coupled with state-of-the-art computer vision techniques to recognize the food being consumed. To this end, we demonstrate image-based recognition of foods eaten in restaurants by training a classifier with images from restaurant's online menu databases. We evaluate the performance of our system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican and Thai).
Article
Person re-identification in a non-overlapping multicamera scenario is an open challenge in computer vision because of the large changes in appearances caused by variations in viewing angle, lighting, background clutter, and occlusion over multiple cameras. As a result of these variations, features describing the same person get transformed between cameras. To model the transformation of features, the feature space is nonlinearly warped to get the “warp functions”. The warp functions between two instances of the same target form the set of feasible warp functions while those between instances of different targets form the set of infeasible warp functions. In this work, we build upon the observation that feature transformations between cameras lie in a nonlinear function space of all possible feature transformations. The space consisting of all the feasible and infeasible warp functions is the warp function space (WFS). We propose to learn a discriminating surface separating these two sets of warp functions in the WFS and to re-identify persons by classifying a test warp function as feasible or infeasible. Towards this objective, a Random Forest (RF) classifier is employed which effectively chooses the warp function components according to their importance in separating the feasible and the infeasible warp functions in the WFS. Extensive experiments on five datasets are carried out to show the superior performance of the proposed approach over state-of-the-art person re-identification methods. We show that our approach outperforms all other methods when large illumination variations are considered. At the same time it has been shown that our method reaches the best average performance over multiple combinations of the datasets, thus, showing that our method is not designed only to address a specific challenge posed by a particular dataset.
Article
Extreme learning machines (ELMs) basically give answers to two fundamental learning problems: (1) Can the fundamentals of learning (i.e., feature learning, clustering, regression and classification) be carried out without tuning hidden neurons (including biological neurons), even when the output shapes and function modeling of these neurons are unknown? (2) Does there exist a unified framework for feedforward neural networks and feature space methods? ELMs, which have built some tangible links between machine learning techniques and biological learning mechanisms, have recently attracted increasing attention from researchers in widespread research areas. This paper provides an insight into ELMs in three aspects, viz. random neurons, random features and kernels. This paper also shows that, in theory, ELMs (with the same kernels) tend to outperform support vector machines and their variants in both regression and classification applications, with much easier implementation.
Article
FoodLog is a multimedia food-recording tool that offers a novel method for recording daily food intake primarily for healthcare purposes. Its novel use of image-processing techniques presents significant potential for the development of new healthcare monitoring apps.
Article
The task of re-identifying a person that moves across camera fields-of-view is a challenge to the community, known as the person re-identification problem. State-of-the-art approaches are either based on direct modeling and matching of the human appearance or on machine learning-based techniques. In this work we introduce a novel approach that studies densely localized image dissimilarities in a low-dimensional space and uses those to re-identify persons in a supervised classification framework. To achieve this goal: i) we compute the localized image dissimilarity between a pair of images; ii) we learn the lower-dimensional space of such localized image dissimilarities, known as the "local eigen-dissimilarities" (LEDs) space; iii) we train a binary classifier to discriminate between LEDs computed for a positive pair (images of the same person) and those computed for a negative pair (images of different persons). We show the competitive performance of our approach on two publicly available benchmark datasets.
Article
Designing effective features is a fundamental problem in computer vision. However, it is usually difficult to achieve a good tradeoff between discriminative power and robustness. Previous works have shown that spatial co-occurrence can boost the discriminative power of features. However, existing co-occurrence features take little account of robustness and hence suffer from sensitivity to geometric and photometric variations. In this work, we study the Transform Invariance (TI) of co-occurrence features. Concretely, we formally introduce a Pairwise Transform Invariance (PTI) principle, then propose a novel Pairwise Rotation Invariant Co-occurrence Local Binary Pattern (PRICoLBP) feature, and further extend it to incorporate multi-scale, multi-orientation, and multi-channel information. Different from other LBP variants, PRICoLBP can not only capture spatial-context co-occurrence information effectively, but also possesses rotation invariance. We evaluate PRICoLBP comprehensively on nine benchmark data sets from five different perspectives, e.g., encoding strategy, rotation invariance, the number of templates, speed, and discriminative power compared to other LBP variants. Furthermore, we apply PRICoLBP to six different but related applications (texture, material, flower, leaf, food, and scene classification), and demonstrate that PRICoLBP is efficient, effective, and offers a well-balanced tradeoff between discriminative power and robustness.
Conference Paper
In this paper, we present a simple yet efficient and effective multi-resolution approach to gray-scale and rotation invariant texture classification. Given a texture image, we first convolve it with J Gabor filters sharing the same parameters except for the orientation parameter. Then, by binarizing the obtained responses, we get J bits at each location, so each location can be assigned a unique integer, namely the "rotation invariant binary Gabor pattern (BGP_ri)", formed from the J bits associated with it using some rule. The classification is based on the image's histogram of its BGP_ri codes at multiple scales. Using BGP_ri, there is no need for a pre-training step to learn a texton dictionary, as required in clustering-based methods such as MR8. Extensive experiments conducted on the CUReT database demonstrate the overall superiority of BGP_ri over the other state-of-the-art texture representation methods evaluated. The Matlab source code is publicly available at http://sse.tongji.edu.cn/linzhang/IQA/BGP/BGP.htm.
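A minimal sketch of the binary-Gabor-pattern idea, assuming scikit-image Gabor kernels and taking the minimum over circular bit rotations of each code as the rotation-invariant representative; this is an illustrative formulation, not the authors' exact implementation:

import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel

def bgp_ri_histogram(img, J=8, frequency=0.25):
    # Convolve with J Gabor filters that differ only in orientation, binarize
    # the responses, and pack the J bits into one integer code per pixel.
    codes = np.zeros(img.shape, dtype=np.int64)
    for j in range(J):
        k = np.real(gabor_kernel(frequency, theta=np.pi * j / J))
        resp = fftconvolve(img, k, mode="same")
        codes |= (resp > 0).astype(np.int64) << j
    # Rotation invariance: map each J-bit code to the minimum over its
    # circular bit rotations, then histogram the resulting codes.
    mask = (1 << J) - 1
    def ri(c):
        return min(((c >> r) | (c << (J - r))) & mask for r in range(J))
    lut = np.array([ri(c) for c in range(1 << J)])
    hist = np.bincount(lut[codes].ravel(), minlength=1 << J).astype(float)
    return hist / hist.sum()

descriptor = bgp_ri_histogram(np.random.rand(64, 64))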
Conference Paper
We propose a mobile food recognition system, the purposes of which are estimating the calories and nutrition of foods and recording a user's eating habits. Since all image recognition processing is performed on the smartphone, the system does not need to send images to a server and runs on an ordinary smartphone in real time. To recognize food items, a user first draws bounding boxes by touching the screen, and then the system starts food item recognition within the indicated bounding boxes. To recognize them more accurately, we segment each food item region by GrabCut, extract a color histogram and SURF-based bag-of-features, and finally classify it into one of fifty food categories with a linear SVM and a fast χ² kernel. In addition, the system estimates the direction of food regions where a higher SVM output score is expected to be obtained and shows it as an arrow on the screen, in order to ask the user to move the smartphone camera. This recognition process is repeated about once a second. We implemented the system as an Android smartphone application so as to use multiple CPU cores effectively for real-time recognition. In the experiments, we achieved an 81.55% classification rate for the top 5 category candidates when the ground-truth bounding boxes are given. In addition, we obtained positive evaluations in a user study, compared to a food recording system without object recognition.
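The χ² kernel mentioned above is commonly defined as below; this sketches the exact exponential form, while the paper's "fast" variant corresponds to an explicit feature-map approximation (e.g., scikit-learn's AdditiveChi2Sampler):

import numpy as np
from sklearn.svm import SVC

def chi2_kernel(X, Y, gamma=1.0, eps=1e-10):
    # Exponential chi-squared kernel for histogram features:
    # K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)).
    d = (((X[:, None, :] - Y[None, :, :]) ** 2)
         / (X[:, None, :] + Y[None, :, :] + eps)).sum(-1)
    return np.exp(-gamma * d)

# Usage with a precomputed-kernel SVM (random stand-ins for histograms).
Xtr = np.random.rand(50, 100)
ytr = np.random.randint(0, 5, 50)
svm = SVC(kernel="precomputed").fit(chi2_kernel(Xtr, Xtr), ytr)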
Conference Paper
In this paper, we propose a method to recognize food images which include multiple food items, considering co-occurrence statistics of food items. The proposed method employs a manifold ranking method which has been applied successfully to image retrieval in the literature. In the experiments, we prepared co-occurrence matrices of 100 food items using various kinds of data sources, including Web texts, Web food blogs and our own food database, and evaluated the final results obtained by applying manifold ranking. The results show that co-occurrence statistics obtained from a food photo database are very helpful in improving the classification rate within the top ten candidates.
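A minimal sketch of the manifold ranking iteration underlying the method, with an illustrative random affinity matrix; in the paper, the seed scores would come from the per-item classifier outputs and co-occurrence statistics:

import numpy as np

def manifold_ranking(W, y, alpha=0.9, iters=100):
    # Iterative manifold ranking: f <- alpha * S f + (1 - alpha) * y, where
    # S is the symmetrically normalized affinity matrix and y encodes the
    # initial scores of the candidate items.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    f = y.astype(float).copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y
    return f

# Toy affinity over 5 items, seeding item 0; scores diffuse along the graph.
W = np.random.rand(5, 5)
W = (W + W.T) / 2
np.fill_diagonal(W, 0)
scores = manifold_ranking(W, np.array([1, 0, 0, 0, 0]))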
Conference Paper
In this paper, we propose a two-step method to recognize multiple-food images by detecting candidate regions with several methods and classifying them with various kinds of features. In the first step, we detect several candidate regions by fusing the outputs of several region detectors, including Felzenszwalb's deformable part model (DPM) [1], a circle detector and the JSEG region segmentation. In the second step, we apply a feature-fusion-based food recognition method to the bounding boxes of the candidate regions with various kinds of visual features, including bag-of-features of SIFT and CSIFT with spatial pyramid (SP-BoF), histogram of oriented gradient (HoG), and Gabor texture features. In the experiments, we estimated ten food candidates for multiple-food images in descending order of confidence scores. As a result, we achieved a 55.8% classification rate on a multiple-food image data set, which improved the baseline result (using only DPM) by 14.3 points. This demonstrates that the proposed two-step method is effective for the recognition of multiple-food images.
Article
With the increasing amount of data generated in geoscience research, it becomes critical to describe data sets in meaningful ways. A large number of data sets are described using XML metadata, which has proved a useful means of expressing data characteristics. An ontological representation is another way of representing data sets, with the benefit of providing rich semantics, convenient linkage to other data sets, and good interoperability with other data. This study represents geoscience data sets as an ontology based on an existing metadata description and on the nature of the data set. It takes the case of Vortex2 data, a regional weather forecast data set collected in Summer 2010, to showcase how forecast data can be represented in an ontology by using the existing metadata information. It supplies another type of representation of the data set, with added semantics and potential functionalities compared to the previous metadata representation.
Article
Extreme learning machine (ELM) has been an important research topic over the last decade due to its high efficiency, easy implementation, unification of classification and regression, and unification of binary and multi-class learning tasks. Though integrating these advantages, existing ELM algorithms pay little attention to optimizing the choice of kernels, which is indeed crucial to the performance of ELM in applications. More importantly, there is a lack of a general framework for ELM to integrate multiple heterogeneous data sources for classification. In this paper, we propose a general learning framework, termed multiple kernel extreme learning machines (MK-ELM), to address the above two issues. In the proposed MK-ELM, the optimal kernel combination weights and the structural parameters of ELM are jointly optimized. Following recent research on support vector machine (SVM) based MKL algorithms, we first design a sparse MK-ELM algorithm by imposing an ℓ1-norm constraint on the kernel combination weights, and then extend it to a non-sparse scenario by substituting the ℓ1-norm constraint with an ℓp-norm (p>1) constraint. After that, a radius-incorporated MK-ELM algorithm which incorporates the radius of the minimum enclosing ball (MEB) is introduced. Three efficient optimization algorithms are proposed to solve the corresponding kernel learning problems. Comprehensive experiments have been conducted on Protein, Oxford Flower17, Caltech101 and Alzheimer's disease data sets to evaluate the performance of the proposed algorithms in terms of classification accuracy and computational efficiency. As the experimental results indicate, our proposed algorithms can achieve comparable or even better classification performance than state-of-the-art MKL algorithms, while incurring much less computational cost.
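A minimal sketch of kernel ELM with a fixed convex combination of base kernels; MK-ELM's contribution, learning the combination weights under ℓ1/ℓp or radius-incorporated constraints, is not reproduced, and all hyperparameters below are illustrative:

import numpy as np

def rbf(X, Y, gamma):
    # Gaussian RBF kernel between two sets of samples.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mk_elm_train(X, T, gammas, weights, C=10.0):
    # Kernel ELM with a weighted sum of base kernels; the closed-form
    # solution beta = (K + I/C)^(-1) T follows the standard kernel ELM.
    K = sum(w * rbf(X, X, g) for w, g in zip(weights, gammas))
    beta = np.linalg.solve(K + np.eye(len(X)) / C, T)
    return beta

def mk_elm_predict(Xte, Xtr, beta, gammas, weights):
    K = sum(w * rbf(Xte, Xtr, g) for w, g in zip(weights, gammas))
    return K @ beta

# Toy usage with two base kernels on random stand-in data.
X = np.random.rand(100, 6)
T = np.eye(3)[np.random.randint(0, 3, 100)]
beta = mk_elm_train(X, T, gammas=(0.5, 2.0), weights=(0.7, 0.3))
pred = mk_elm_predict(X, X, beta, (0.5, 2.0), (0.7, 0.3)).argmax(axis=1)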
Article
One of the principal causes for image quality degradation is blur. This frequent phenomenon is usually a result of misfocused optics or camera motion, and it is very difficult to undo. Beyond the impaired visual quality, blurring causes problems to computer vision algorithms. In this paper, we present a simple yet powerful image descriptor, which is robust against the most common image blurs. The proposed method is based on quantizing the phase information of the local Fourier transform and it can be used to characterize the underlying image texture. We show how to construct several variants of our descriptor by varying the technique for local phase estimation and utilizing the proposed data decorrelation scheme. The descriptors are assessed in texture and face recognition experiments, and the results are compared with several state-of-the-art methods. The difference to the baseline is considerable in the case of blurred images, but also with sharp images our method gives a highly competitive performance.
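A minimal sketch of the local-phase-quantization idea described above, estimating the local Fourier transform at four low frequencies via a windowed (STFT-style) convolution and quantizing the signs of the real and imaginary parts into an 8-bit code per pixel; the proposed decorrelation scheme is omitted:

import numpy as np
from scipy.signal import fftconvolve

def lpq_histogram(img, winsize=7):
    r = winsize // 2
    x = np.arange(-r, r + 1)
    f = 1.0 / winsize
    w0 = np.ones_like(x, dtype=complex)          # DC component
    w1 = np.exp(-2j * np.pi * f * x)             # lowest non-zero frequency
    # Separable filters for frequency points (f,0), (0,f), (f,f), (f,-f).
    pairs = [(w1, w0), (w0, w1), (w1, w1), (w1, np.conj(w1))]
    codes = np.zeros(img.shape, dtype=np.int64)
    bit = 0
    for wy, wx in pairs:
        resp = fftconvolve(img.astype(float), np.outer(wy, wx), mode="same")
        # Quantize the signs of the real and imaginary parts into two bits.
        codes |= (np.real(resp) > 0).astype(np.int64) << bit
        codes |= (np.imag(resp) > 0).astype(np.int64) << (bit + 1)
        bit += 2
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

descriptor = lpq_histogram(np.random.rand(64, 64))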
Article
Methods for conducting dietary assessment in the United States date back to the early twentieth century. Methods of assessment encompassed dietary records, written and spoken dietary recalls, FFQ using pencil and paper and more recently computer and internet applications. Emerging innovations involve camera and mobile telephone technology to capture food and meal images. This paper describes six projects sponsored by the United States National Institutes of Health that use digital methods to improve food records and two mobile phone applications using crowdsourcing. The techniques under development show promise for improving accuracy of food records.