Instance-Aware Semantic Segmentation for Food
Calorie Estimation using Mask R-CNN
Reza Dea Yogaswara
Dept. of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
reza.yoga@gmail.com

Eko Mulyanto Yuniarno
Dept. of Computer Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
ekomulyanto@ee.its.ac.id

Adhi Dharma Wibawa
Dept. of Computer Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
adhiosa@te.its.ac.id
Abstract—Knowing the calorie content of the food we consume can help in maintaining body health. Meeting basic calorie needs produces many positive effects on the body, including maintaining an ideal body weight and providing an adequate source of energy for physical activity. Conversely, people who do not pay attention to their calorie needs face various health problems, including obesity and the worsening of degenerative diseases such as diabetes or high blood pressure. Calculating the actual number of calories in food digitally requires the area, volume, and mass of the food as parameters. Previous studies in the field of computer vision have assigned a constant number of calories based on food type rather than on actual food volume measurements. In this research, a system is developed using a computer vision approach that calculates the number of food calories automatically based on food volume, using the deep learning Mask Region-based Convolutional Neural Network (R-CNN) algorithm. The segmentation technique uses the instance-aware semantic segmentation approach, which identifies each pixel of every object instance found in a food image. This work uses instance-aware data labeling, which distinguishes individual instances within the same class, so that the model can recognize each distinct food object within one class and the number of calories of each food object can be obtained precisely. The expected benefit of this research is to help a person obtain information about food calorie content according to the body's calorie needs; the mean Average Precision (mAP) obtained is 89.4% and the accuracy of the calculated calories is 97.48%.
Keywords—Computer Vision, Deep Learning, Food Calories, Instance Segmentation, Mask R-CNN
I. INTRODUCTION
Everyone has different caloric needs, influenced not only by age but also by several other factors such as gender, weight, and level of physical activity. Meeting calorie needs well has many positive effects on the body, among them maintaining an ideal body weight and providing an adequate source of energy for activity. Conversely, people who do not pay attention to their calorie needs face various health problems. Therefore, it is important to build the habit of calculating the number of calories consumed through food every day.
Until now, there has been no tool that can measure food calories automatically with high precision based on the volume of the food object. To get the right number of food calories, the food volume must be known first, and measuring it manually with an instrument is difficult and ineffective. Previous research in the field of computer vision has used object classification or object detection only, assigning constant calorie content based on food type without measuring actual food volume, so the actual calorie content was not obtained. In this research, a system with a computer vision approach is developed that calculates food calorie content based on food volume using the Mask Region-based Convolutional Neural Network (R-CNN) method.
The image is obtained by photographing food directly with a smartphone camera; it is then segmented based on the labeled food type or category (the ground truth). The segmentation results are then used to calculate the calorie content with reference to a food calorie table.
This research is expected to help a person obtain the number of food calories based on the actual volume of the food to be consumed, in accordance with the body's calorie needs, so that a healthy body and ideal body weight can be achieved.
II. RELATED WORKS
Some food recognition techniques create features through a manual process based on domain knowledge, often called handcrafted feature engineering. This type of feature engineering has several drawbacks: it is complicated to use for recognizing objects because it requires expert knowledge, it is very time-consuming, and it can be a tedious process. Methods have since been developed in pattern recognition and computer vision using deep learning, so the manual feature engineering approach can be replaced by automated feature learning, which is fast and recognizes the patterns of a domain better and more measurably.
Another study, on the recognition of traditional Thai food, used transfer learning and a Convolutional Neural Network (CNN), with 40 food categories and 2,500 food images [1]. The authors used the Inception V3 model to obtain pre-trained weights for combination with the model to be trained. That research performed only classification and recognition of food images, so the predicted caloric value was static or constant, determined by the predicted object class, and achieved an accuracy of 75.2%.
Research into food recognition using deep learning has also been done. In 2017, [2] used a food object detection approach with Faster Region-based Convolutional Neural Network (R-CNN); this technique is well suited to object detection problems because the system can locate the food object in the image, and the research used multi-label food in one dish. The mean Average Precision (mAP) was 90.7%. However, the number of calories produced for each food object was constant, because there were no parameters with which the authors could measure the mass of each detected food object.
Research related to the recognition of traditional Malaysian food has also been carried out [3]: a calorie calculation application using a two-layer neural network with a sigmoid activation function and a dataset of 250 food images. The authors combined a calorie constant with the classification result, so the caloric value obtained was still static or constant, and obtained an overall correctness percentage of 80%.
In 2016, research was conducted on food recognition using morphological operations with watershed segmentation [4]. The authors used morphological dilation and erosion techniques, where the method still requires manual adjustment of the dilation and erosion thresholds to obtain the right segmentation of food images; manual edge detection techniques were also used to get the right segmentation results.
From the studies above, further research is needed on counting food calories from food images using broader segmentation parameters, able to distinguish between detected object instances and to solve multi-label classification problems, since one food image may contain more than one object, including more than one similar object, and the computer must be able to distinguish them.
III. METHODS
In this research, the authors develop a system using a computer vision approach that calculates food calorie content based on food volume using the Mask Region-based Convolutional Neural Network (R-CNN) algorithm.
Mask R-CNN is a state-of-the-art method developed by Kaiming He [5], a computer vision researcher at Facebook AI Research (FAIR); it received the Best Paper Award at the IEEE International Conference on Computer Vision (ICCV) in 2017. The method is used to solve instance segmentation problems [5] in computer vision and has approached human capability in visual perception tasks.
Instance-aware semantic segmentation, or instance segmentation, separates the different instances of objects in an image or video. For example, if one food image contains two objects of white bread, the objects will be segmented with two masks and two bounding boxes with different identities.
This technique has the task of identifying objects at the pixel level and associating each pixel with a physical instance of an object [6]. Compared to other tasks in computer vision, this is the most challenging, because there are more jobs and stages in detecting and segmenting objects: first, the system carries out classification together with object detection; second, it recognizes the image pixel by pixel and determines the type of object each pixel belongs to.
Mask R-CNN outperformed the two previous leading algorithms on instance segmentation tasks: the Multi-task Network Cascade (MNC) [7], winner of the Microsoft Common Objects in Context (COCO) competition in 2015, and Fully Convolutional Instance-aware Semantic Segmentation (FCIS) [8], winner of the 2016 Microsoft COCO competition.
This research uses a flow diagram as shown in Fig. 1
below:
Fig. 1. Research flow diagram
A. Determine the Object Classes
This research begins by determining the classes of objects whose calories are to be calculated per serving. Image segmentation is obtained from dense per-pixel prediction on food images according to each object class defined in the ground truth. Segmentation is used to calculate the surface area of each predicted object class. The height of each object class is obtained from measurements stored as multiplier constants; multiplying the estimated area from segmentation by this height yields the volume of the object.
The density of each object class has also been calculated, so that the mass of each object class can be obtained and the calorie calculation then carried out against the calorie reference table. The object classes used in this research were plate, which serves as an area calibrator for the food classes, white bread, braised spiced tempeh, fried tempeh, braised spiced tofu, and fried tofu.
B. Data Acquisition
Data acquisition is done by capturing digital food images with combinations of dishes from the defined object classes, along with measurements of the height, area, and mass of the objects used to validate the segmentation results. The stages of data acquisition in this research are:
a) Image data acquisition: Food image data is taken with various combinations of object classes, captured at approximately 90 degrees to the surface of the food using an iPhone 6s smartphone camera with a twelve-megapixel resolution and dimensions of 3024 × 4032 px. After the image is obtained, it is downscaled to 768 × 768 px to reduce the computational resources needed for each training batch.
Fig. 2. A food image taken with the iPhone 6s smartphone camera and then reduced to 768 × 768 px
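As an illustration, the downscaling step could be done with Pillow; this is a minimal sketch, and the file names are hypothetical:

```python
from PIL import Image

# Load a 3024x4032 px photo taken with the smartphone camera
# ("food_photo.jpg" is a hypothetical file name).
img = Image.open("food_photo.jpg")

# Downscale to the 768x768 px training resolution used in this research.
img_small = img.resize((768, 768), Image.BILINEAR)
img_small.save("food_photo_768.jpg")
```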
b) Measuring the height of food objects: This stage obtains the height of each food object, used as a multiplier constant on the area calculated from the segmentation results, so that the volume of each food object is obtained.
c) Measuring the area and mass of food objects: The actual area of a food object is used to validate the area estimated from the segmentation of food objects, so that the accuracy of object segmentation can be measured. The actual plate area is then used as a comparison constant to calculate the actual area of a food object from the segmentation results.
Fig. 3. Measuring the mass of white bread and fried tempeh objects using a Camry EK3650 digital scale
TABLE I. MEASURED AREA, HEIGHT, MASS, AND DENSITY OF EACH OBJECT CLASS

| Food Name | Area (cm²) | Height (cm) | Mass (g) | Density (g/cm³) |
|---|---|---|---|---|
| Fried Tofu | 22.5 | 1.4 | 24 | 0.735 |
| Fried Tempeh | 16 | 1 | 14 | 0.872 |
| Braised Spiced Tofu | 18 | 1.5 | 33 | 1.076 |
| Braised Spiced Tempeh | 26.5 | 1.3 | 37 | 0.947 |
| White Bread | 133 | 1.3 | 38 | 0.237 |
Table I above shows the measured area, height, mass, and density of the food objects whose caloric content will be predicted from the segmentation area obtained in testing. The measured area and mass are used to validate the segmentation results, while the height and density are used as multipliers to obtain the volume and mass of the food objects.
C. Class Object Labeling
Object class labeling is used to obtain images with pixel labels that represent each object class.
Fig. 4. Labeling uses the pixel-dense labeling technique
After labeling the food image data, the segmentation is visualized to check how precisely the segmentation masks in the ground truth match the objects, in order to produce a segmentation model that can recognize the objects in each food dish.
Fig. 5. Observation of the segmentation or mask of food object classes on
ground truth
D. Splitting the Dataset
After object class labeling, the dataset is divided into three parts: a training set, a development set (hold-out cross-validation), and a test set, to obtain the best-performing segmentation model from training.
Fig. 6. Splitting dataset distribution into a training set, development or
validation set, and test set
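A minimal sketch of such a split; the paper does not state its split ratios, so the 80/10/10 fractions below are assumptions:

```python
import random

def split_dataset(image_ids, train_frac=0.8, dev_frac=0.1, seed=42):
    """Split labeled image IDs into training, development, and test sets.

    The 80/10/10 fractions are illustrative; the paper does not report
    the ratios it actually used.
    """
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_dev = int(len(ids) * dev_frac)
    return ids[:n_train], ids[n_train:n_train + n_dev], ids[n_train + n_dev:]
```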
E. Segmentation Model Training and Validation
Mask R-CNN has the architecture and learning process flow shown in Fig. 7.
Fig. 7. Mask R-CNN Architecture in the training and inference process
The Mask R-CNN architecture consists of two parts. The first part is the Region Proposal Network (RPN), which proposes regions where food objects may be located in the input image, using a ResNet 50 or ResNet 101 [9] backbone whose convolutional features are shared. The second part predicts the object class against the ground truth, refines the bounding box for object localization, and then generates a pixel-level segmentation mask based on the region proposals produced in the first part.
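For readers who want to experiment with this general setup, torchvision ships a comparable off-the-shelf Mask R-CNN (with a ResNet-50 FPN backbone rather than the ResNet 101 used in this paper). This is an illustrative inference sketch, not the authors' implementation:

```python
import torch
import torchvision

# Off-the-shelf Mask R-CNN with a ResNet-50 FPN backbone; an illustrative
# stand-in, since the paper itself trains with a ResNet 101 backbone.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# A dummy 768x768 RGB image scaled to [0, 1]; replace with a real food photo.
image = torch.rand(3, 768, 768)

with torch.no_grad():
    output = model([image])[0]

# Each detection has a box, class label, confidence score, and a soft
# per-pixel mask that can be thresholded (e.g., at 0.5) to a binary mask.
boxes, labels = output["boxes"], output["labels"]
scores, masks = output["scores"], output["masks"]  # masks: [N, 1, H, W]
```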
The mask segmentation branch uses a Fully Convolutional Network (FCN) [10] applied to every Region of Interest (RoI), predicting the segmentation mask pixel-to-pixel. The FCN uses a per-pixel softmax loss function. Formally, the loss is calculated by summing the loss predictions over every pixel, which can be expressed in the following equation:
$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log \hat{y}_i \qquad (1)$$

where $N$ is the number of pixels, $y_i$ is the ground-truth label of pixel $i$, and $\hat{y}_i$ is the predicted class probability for that pixel.
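A minimal NumPy sketch of this per-pixel cross-entropy, assuming the softmax probabilities have already been computed; the function name and shapes are illustrative:

```python
import numpy as np

def per_pixel_ce(probs, labels, eps=1e-9):
    """Average per-pixel cross-entropy loss.

    probs:  softmax output, shape (H, W, C)
    labels: integer ground-truth class per pixel, shape (H, W)
    """
    h, w, _ = probs.shape
    # Pick the predicted probability of the true class at every pixel.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.mean(np.log(p_true + eps))
```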
The implementation of training and validation of the segmentation model consists of the following stages:

1) Prepare the machine for training, validation, and testing, with the following specifications:
   - Dual-processor Intel(R) Xeon(R) CPU @ 2.30 GHz
   - 13 GB of memory
   - Tesla K80 GPU
2) Evaluate accuracy and error (loss), and improve the Mask R-CNN hyperparameter configuration.
Training is conducted several times to get the model with the lowest loss. When the training loss is too high, this can be caused by high bias; the remedy is to add layers to the Mask R-CNN or increase the number of training epochs. When the validation loss is too high, this can be addressed by adding data to the validation set to minimize overfitting.
Formally, during training, Mask R-CNN defines a multi-task loss on each sampled RoI as:

$$L = L_{cls} + L_{box} + L_{mask} \qquad (2)$$
The loss measurements from training the segmentation model are shown in Table II below:
TABLE II. LOSS MEASUREMENTS ACROSS THE TRAINING EXPERIMENTS FOR THE SEGMENTATION MODEL

| Metric | Training #1 | Training #2 | Training #3 |
|---|---|---|---|
| Epochs | 40 | 30 | 40 |
| Loss | 0.04132 | 0.06896 | 0.04447 |
| BBox loss | 0.003998 | 0.007919 | 0.003894 |
| Class loss | 0.008325 | 0.01043 | 0.004594 |
| Mask loss | 0.02632 | 0.04133 | 0.03183 |
| RPN bbox loss | 0.002266 | 0.008789 | 0.003773 |
| RPN class loss | 0.000402 | 0.0004826 | 0.0003732 |
The first training run used 40 epochs to reach the lowest loss and validation loss values; the lowest loss obtained across the three training runs was 0.04132. The model from this first run is therefore used for calorie prediction testing (inference).
Measuring the loss function is needed in this research because it quantifies the distance between the predicted output and the ground-truth label, guiding training toward a more accurate segmentation model.
F. Model Testing
The food segmentation model is tested after it has been obtained from the training results.
Fig. 8. Testing food object segmentation detection using the segmentation model, with the confidence scores obtained
In the second test, segmentation detection was evaluated by plotting a confusion matrix to determine the Intersection over Union (IoU) performance between the predictions and the ground truth for the segmentation results in Fig. 8.
Fig. 9. Definition of IoU for object detection tasks

IoU is calculated with the equation below:

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} \qquad (3)$$
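For binary masks, this ratio can be computed directly; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """IoU between a predicted and a ground-truth binary mask of shape (H, W)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    overlap = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return overlap / union if union > 0 else 0.0
```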
(Example per-object output accompanying Fig. 11 — Braised Spiced Tofu: area 8495 px² (21 cm²), height 1.5 cm, volume 31 cm³, density 1.07176564 g/cm³, mass 33 g, 49 cal. Fried Tofu: area 9877 px² (24 cm²), height 1.4 cm, volume 33 cm³, density 0.71828812 g/cm³, mass 25 g, 27 cal.)
Fig. 10. Confusion matrix plotted from the segmentation results in Fig. 8
Next, Average Precision (AP) is measured over the predicted bounding boxes and the detected object classes. The measurement averages the precision over recall values in the range 0 to 1, using the detection evaluation metrics of the Microsoft COCO dataset. In this study, the authors used three evaluation metrics:
1. AP: the percentage AP averaged over IoU = .50:.05:.95 (primary metric)
2. AP50: the percentage AP at IoU = .50 (the metric used by PASCAL VOC)
3. AP75: the percentage AP at IoU = .75 (strict metric)
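These metrics can be reproduced with the pycocotools evaluation API. A sketch under the assumption that the ground truth and the detections have been exported in COCO JSON format; the file names are hypothetical:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Hypothetical file names; ground truth and detections in COCO JSON format.
coco_gt = COCO("food_ground_truth.json")
coco_dt = coco_gt.loadRes("food_detections.json")

# "segm" evaluates segmentation masks; "bbox" would evaluate boxes instead.
evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP@[.50:.95], AP50, AP75, etc.
```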
TABLE III. MEASUREMENT OF AVERAGE PRECISION (AP) OF THE SEGMENTATION MODEL

| Backbone | AP | AP50 | AP75 |
|---|---|---|---|
| ResNet 101 | 0.8941 | 0.9678 | 0.9638 |
G. Volume Calculation of the Detected Object
After obtaining the food object segmentation, the next step is to measure the area of each detected (segmented) food object and compare it with the area of the detected plate.
To get the object mass, the area of the detected object is compared to the area of the detected plate, whose actual area is known, to obtain the actual object area. The actual object area is then multiplied by the height of the food object to obtain its volume, and the object mass is obtained by multiplying the object volume by the object's density constant.
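A minimal sketch of this chain, assuming the plate's actual area is known and the per-class height and density constants come from Table I; the function, variable names, and pixel areas below are illustrative:

```python
def object_mass(obj_area_px, plate_area_px, plate_area_cm2, height_cm, density):
    """Estimate the mass of a segmented food object.

    obj_area_px / plate_area_px: pixel areas of the object and plate masks.
    plate_area_cm2: actual plate area, used as the calibration constant.
    height_cm, density: per-class constants as measured in Table I.
    """
    # Convert the pixel area to cm^2 via the plate calibrator.
    area_cm2 = obj_area_px / plate_area_px * plate_area_cm2
    volume_cm3 = area_cm2 * height_cm   # area x height constant
    return volume_cm3 * density         # mass = volume x density

# Example with the fried tofu constants from Table I (height 1.4 cm,
# density 0.735 g/cm^3); the pixel and plate areas are made up.
mass_g = object_mass(9877, 160000, 380.0, 1.4, 0.735)
```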
Fig. 11. Detecting the food object classes and obtaining the calorie content of each object
After obtaining the mass of the food objects from the segmentation results, the estimated calories contained in the food are calculated based on the reference calorie table. The calorie table is taken from the Indonesian Ministry of Health's "Healthy Lifestyle" promotion brochure, which covers the five defined object classes.
TABLE IV. CALORIE TABLE

| Food Name | Mass (g) | Calories | Unit |
|---|---|---|---|
| Fried Tofu | 100 | 111 | 1.5 |
| Fried Tempeh | 50 | 118 | 1.5 |
| Braised Spiced Tofu | 100 | 147 | 1.75 |
| Braised Spiced Tempeh | 50 | 157 | 2 |
| White Bread | 50 | 128 | 1.5 |
Source: Health Ministry of the Republic of Indonesia
The calorie content of each segmented food object class is obtained by dividing the object's mass by the mass listed in the calorie table and multiplying by the number of calories in the calorie table, which can be expressed in the equation below:
$$\text{Calories} = \frac{\text{Object Mass}}{\text{Mass in Calorie Table}} \times \text{Calories in Calorie Table} \qquad (4)$$
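Equation (4) amounts to a proportional lookup against Table IV; a minimal sketch with the table values hard-coded (the dictionary keys and function name are illustrative):

```python
# Reference values from Table IV (reference mass in g, calories per that mass).
CALORIE_TABLE = {
    "fried_tofu": (100, 111),
    "fried_tempeh": (50, 118),
    "braised_spiced_tofu": (100, 147),
    "braised_spiced_tempeh": (50, 157),
    "white_bread": (50, 128),
}

def calories(food_class, mass_g):
    """Scale the table calories by the ratio of estimated mass to table mass."""
    table_mass, table_cal = CALORIE_TABLE[food_class]
    return mass_g / table_mass * table_cal

# E.g., 33 g of braised spiced tofu -> 33/100 * 147 = 48.5 cal,
# close to the 49 cal reported in the Fig. 11 example output.
print(round(calories("braised_spiced_tofu", 33), 1))
```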
H. Showing Object Segmentation Results and Calorie Calculation
The segmentation model in this study is tested through a web application and an Android application, making it easy to calculate food calorie content with the segmentation model obtained from training.
Fig. 12. Segmentation detection results and food calorie calculation in the web and Android applications
IV. RESULTS AND DISCUSSION
From the segmentation testing shown in Fig. 12, Table V below compares the area, mass, and calories measured from segmentation against the ground truth, together with the percentage accuracy of the calorie calculation.
The accuracy of the calorie calculation in Table V is computed with the equations below:
$$\text{Error} = \frac{\lvert \text{Calories}_{GT} - \text{Calories}_{Pred} \rvert}{\text{Calories}_{GT}} \times 100 \qquad (5)$$

$$\text{Accuracy} = 100 - \text{Error} \qquad (6)$$
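Expressed as code, equations (5) and (6) become (the function name is illustrative):

```python
def calorie_accuracy(cal_gt, cal_pred):
    """Percentage accuracy of a calorie prediction against the ground truth."""
    error = abs(cal_gt - cal_pred) / cal_gt * 100  # equation (5)
    return 100 - error                             # equation (6)

# Braised spiced tempeh row of Table V: GT 116.18 cal vs. predicted 115.9 cal.
print(round(calorie_accuracy(116.18, 115.9), 2))  # -> 99.76, as in Table V
```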
TABLE V. RESULT TABLE

| Class | Area of GT* (cm²) | Mass of GT (g) | Calories of GT (cal) | Area of Pred.* (cm²) | Mass of Pred. (g) | Total Calories (cal) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Fried Tofu | 22.5 | 24 | 26.64 | 24.07 | 24.77 | 27.5 | 92.98 |
| Fried Tempeh | 16 | 14 | 33.04 | 15.47 | 13.49 | 31.8 | 96.25 |
| Braised Spiced Tofu | 18 | 33 | 48.51 | 20.72 | 33.45 | 49.2 | 98.58 |
| Braised Spiced Tempeh | 26.5 | 37 | 116.18 | 30.0 | 36.92 | 115.9 | 99.76 |
| White Bread | 133 | 38 | 97.28 | 123.1 | 37.93 | 97.1 | 99.81 |

GT*: Ground Truth, Pred.*: Prediction
As seen in Table V, the accuracy of the calorie calculation was 92.98% for fried tofu, 96.25% for fried tempeh, 98.58% for braised spiced tofu, 99.76% for braised spiced tempeh, and 99.81% for white bread, giving an average calorie calculation accuracy of 97.48%.
In this research, the actual area of the food was calculated in order to obtain the food's volume and mass, so that the actual number of calories could be obtained. This technique outperforms several previous studies on food calorie calculation that did not measure calories from the actual volume and mass of the food objects and only carried out image classification or object detection (localization).
V. CONCLUSION
In this research, we built a system with a computer vision approach that calculates food calorie content based on food volume and mass using the Mask Region-based Convolutional Neural Network (R-CNN). We conclude that Mask R-CNN can be applied to food calorie calculation because it computes a pixel-wise mask for every object in the image, using ResNet 101 as the backbone model together with the RPN, RoI classification and bounding-box regression, and the segmentation mask branch. The algorithm was able to distinguish between instances of the same object class. The segmentation model can be used in web applications and Android mobile applications, making it easier for users to calculate the number of food calories automatically. For future work, further experiments are needed to evaluate foods with convex or concave structures, which pose a new challenge for calculating food calories.
REFERENCES
[1] P. Temdee and S. Uttama, "Food recognition on smartphone using transfer learning of convolution neural network," Cape Town, 2017.
[2] T. Ege and K. Yanai, "Estimating Food Calories for Multiple-dish Food Photos," 2017.
[3] N. A. A. Nor Muhammad, C. P. Lee, K. M. Lim and S. F. Abdul Razak, "Malaysian Food Recognition and Calorie Counter Application," 2017.
[4] S. V. Chavan and S. S. Sambare, "Segmentation of Food Images using Morphological Operations with Watershed Segmentation Technique," International Journal of Computer Applications, vol. 151, no. 1, 2016.
[5] K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," arXiv:1703.06870, 2017.
[6] M. Bai and R. Urtasun, "Deep Watershed Transform for Instance Segmentation," Honolulu, HI, USA, 2017.
[7] J. Dai, K. He and J. Sun, "Instance-aware Semantic Segmentation via Multi-task Network Cascades," arXiv:1512.04412, 2015.
[8] Y. Li, H. Qi, J. Dai, X. Ji and Y. Wei, "Fully Convolutional Instance-aware Semantic Segmentation," arXiv:1611.07709, 2016.
[9] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385, 2015.
[10] J. Long, E. Shelhamer and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," arXiv:1411.4038, 2014.