FoodIE: A Rule-based Named-entity Recognition Method for Food
Information Extraction
Gorjan Popovski1, Stefan Kochev1, Barbara Koroušić Seljak2 and Tome Eftimov2
1Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University,
Rugjer Boshkovikj 16, 1000 Skopje, Macedonia
2Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
{gorjan.popovski, stefan.kochev}@gmail.com, {barbara.korousic, tome.eftimov}@ijs.si
Keywords: Information Extraction, Rule-based Named-entity Recognition, Food-entity Recognition.
Abstract: The application of Natural Language Processing (NLP) methods and resources to biomedical textual data
has received growing attention over the past years. Previously organized biomedical NLP-shared tasks (such
as, for example, BioNLP Shared Tasks) are related to extracting different biomedical entities (like genes,
phenotypes, drugs, diseases, chemical entities) and finding relations between them. However, to the best of
our knowledge there are limited NLP methods that can be used for information extraction of entities related to
food concepts. For this reason, to extract food entities from unstructured textual data, we propose a rule-based
named-entity recognition method for food information extraction, called FoodIE. It is comprised of a small
number of rules based on computational linguistics and semantic information that describe the food entities.
Experimental results from the evaluation performed using two different datasets showed that very promising
results can be achieved. The proposed method achieved 97% precision, 94% recall, and 96% F1 score.
1 INTRODUCTION
Nowadays, a large amount of textual information is
available in digital form and published in public web
repositories (e.g., online news, scientific publications,
social media). This textual information is unstructured,
meaning that it has no predefined data model. Working
with textual data is challenging because of its variability:
the same concepts can be mentioned in different ways,
depending on how people express themselves and on their
writing styles.
Information Extraction (IE) is a task of automat-
ically extracting information from unstructured data
and, in most cases, is concerned with the processing
of human language text by means of natural language
processing (NLP) (Aggarwal and Zhai, 2012). The
idea behind IE is to provide a structured representa-
tion of extracted information obtained from analyzed
text. The information to be extracted is defined by
users, and consists of predefined concepts of interest
and related entities, as well as relationships between
entities and events.
One of the classic IE tasks is named-entity recog-
nition (NER), which addresses the problem of iden-
tification and classification of predefined concepts
(Nadeau and Sekine, 2007). It aims to identify words or
phrases in text and to assign them to predefined labels
(classes) that describe concepts of interest in a given
domain. Various NER methods exist: terminology-driven,
rule-based, corpus-based, methods based on active
learning (AL), and methods based on deep neural
networks (DNNs).
In this paper, we focus on IE of food entities. To
the best of our knowledge, little research has focused
on food entities. However, knowledge about extracted
food entities and their relations with other biomedical
entities (like genes, drugs, diseases, etc.) is important
for improving public health.
The main contributions of this paper are:
• A rule-based NER method for IE of food entities.
• Evaluation of the proposed method, which provides
  promising results on unstructured data, without a need
  for an annotated corpus.
In the remainder of the paper, we first present an
overview of the related work. Then, we present the
proposed rule-based NER method for IE of food en-
tities. Next, the data used for evaluation is explained,
followed by the results and discussion. Finally, the
conclusions of the paper and a discussion for future
work are presented.
2 RELATED WORK
IE from biomedical literature is a very important task
with the goal of improving public health. Because
NER methods which have the best performances are
usually corpus-based NER methods, there is a need
for an annotated corpus from biomedical literature
that includes the entities of interest. For this purpose,
different annotated corpora are produced by shared
tasks, where the main aim is to challenge and encour-
age research teams on NLP problems.
In comparison with the extensive work done for
biomedical tasks, in the food science domain the sit-
uation is different. Several studies have been con-
ducted, but with different goals. For example, in (Xia
et al., 2013) authors presented an approach to iden-
tify rice protein resistant to Xanthomonas oryzae pv.
oryzae, which is an approach to enhance gene prior-
itization by combining text mining technologies with
a sequence-based approach. Co-occurrence methods
were also used to identify ingredients mentioned in
food labels and to extract food-chemical and
food-disease relationships (do Nascimento et al., 2013;
Jensen et al., 2014).
An ML approach to Japanese recipe text processing
was proposed in (Mori et al., 2012), where one of the
evaluated tasks was food-named entity recognition.
This approach used the r-FG corpus, which is composed
solely of Japanese food recipes. A similar approach for
generating graph structures from food recipes was
proposed in (Chen, 2017), where the authors manually
annotated a recipe corpus that was then used for
training an ML model.
The UCREL Semantic Analysis System (USAS)
is a framework for automatic semantic analysis of
text, which distinguishes between 21 major categories,
one of which is “food and farming” (Rayson
et al., 2004). It is heavily utilized in our rule-based
system, FoodIE. The USAS can provide additional
information about a food entity, but its limitation is
that it works on a token level. For example, if two
words (i.e., tokens) in the text, like “grilled chicken”,
denote one food entity that needs to be extracted and
analyzed, the semantic tagger parses the words
“grilled” and “chicken” as separate entities and
assigns them separate semantic tags.
In (Eftimov et al., 2017), a rule-based NER used
for IE from evidence-based dietary recommendation,
called drNER, is presented, where among other enti-
ties, food entities were also of interest.
Figure 1: The flowchart of the FoodIE methodology. Recipe description (text) → Food-related pre-processing → Text POS-tagging and post-processing of the tag dataset → Semantic tagging of food tokens in the text → Food named-entity recognition → Food entities.
3 FOODIE: A RULE-BASED FOOD-NAMED ENTITY RECOGNITION
To enable food-named entity recognition, in this pa-
per, we propose a rule-based approach, called FoodIE.
It works with unstructured data (more specifically,
with a recipe that includes textual data in form of in-
structions on how to prepare the dish) and consists of
four steps:
• Food-related text pre-processing
• Text POS-tagging and post-processing of the tag dataset
• Semantic tagging of food tokens in the text
• Food-named entity recognition
The flowchart of the methodology is presented in Figure 1.
In the following, we explain each step in more detail.
3.1 Food-related Text Pre-processing
The pre-processing step takes into account the dis-
crepancies that exist between the outputs of the tag-
gers we are utilizing, coreNLP tagger from the R pro-
gramming language (Arnold and Tilton, 2016) and the
UCREL Semantic Analysis System (USAS) (Rayson
et al., 2004). It is also used to remove any characters
that are unknown to the taggers.
Firstly, quotation marks are removed from the raw
text, because the two NLP libraries treat them
differently, which causes a discrepancy.
Secondly, every white space sequence (including
tabulation, newlines, etc.) is converted into a single
white space to provide a consistent structure to the
text.
Additionally, ASCII transliteration is performed,
which means that characters with an ASCII equivalent
are transliterated. Examples of such characters are
[è, ö, à], which are transliterated to [e, o, a],
respectively.
Finally, fractions are converted into real numbers.
In food-related texts (e.g., recipes), fractions are
commonly used to express quantities. However, they are
usually written in plain ASCII format and in a manner
that confuses NLP taggers. For example, “2.5” is usually
written as “2 1/2” in such texts, which is not handled
well by coreNLP and the USAS semantic tagger. Thus, in
the pre-processing step, all fractions are converted
into the standard decimal notation for real numbers.
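The four pre-processing operations described above can be sketched in a few lines of Python. This is a minimal illustration and not the FoodIE implementation: the function names (preprocess, to_ascii, convert_fractions), the set of quotation characters that is stripped, and the rounding to three decimals are all assumptions made for the example.

```python
import re
import unicodedata
from fractions import Fraction

# Assumption: only double-quote style characters are stripped; apostrophes stay.
QUOTES = '"“”„«»'
MIXED = re.compile(r"\b(\d+)\s+(\d+)/(\d+)\b")  # mixed numbers such as "2 1/2"
SIMPLE = re.compile(r"\b(\d+)/(\d+)\b")         # plain fractions such as "1/4"


def to_ascii(text: str) -> str:
    """Drop combining marks after NFKD decomposition (è -> e, ö -> o)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def _decimal(value: Fraction) -> str:
    """Format with at most three decimals (an arbitrary choice for this sketch)."""
    return f"{float(value):.3f}".rstrip("0").rstrip(".")


def convert_fractions(text: str) -> str:
    def mixed(m):
        whole, num, den = map(int, m.groups())
        return m.group(0) if den == 0 else _decimal(whole + Fraction(num, den))

    def simple(m):
        num, den = map(int, m.groups())
        return m.group(0) if den == 0 else _decimal(Fraction(num, den))

    # Mixed numbers first, so that "2 1/2" is not read as "2" followed by "1/2".
    return SIMPLE.sub(simple, MIXED.sub(mixed, text))


def preprocess(text: str) -> str:
    text = text.translate({ord(q): None for q in QUOTES})  # remove quotation marks
    text = re.sub(r"\s+", " ", text).strip()               # collapse white space
    text = to_ascii(text)                                  # ASCII transliteration
    return convert_fractions(text)                         # fractions -> decimals


print(preprocess('Add 2 1/2 cups  of "crème"\n\tfraîche and 1/4 tsp salt.'))
# Add 2.5 cups of creme fraiche and 0.25 tsp salt.
```

Handling mixed numbers before plain fractions is a design choice of this sketch; characters without an ASCII decomposition are left unchanged rather than removed.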
3.2 Text POS-tagging and
Post-processing of the Tag Set
To obtain the morphological information from the
textual data, we use the UCREL Semantic Analysis
System (USAS) and coreNLP.
The USAS semantic tagger provides word tokens
associated with their POS tags, lemmas, and seman-
tic tags. The semantic tags show semantic fields that
group together word senses that are related at some
level of generality with the same contextual concept.
The groups include not only synonyms and antonyms
but also hypernyms and hyponyms. More details
about semantic tags can be found in (Rayson et al.,
2004; Alexander and Anderson, 2012).
Furthermore, the same is done using the coreNLP
library, which includes all of the above except seman-
tic tags.
For example, the sentence “Heat the beef soup un-
til it boils” is processed by both libraries. The results
from the coreNLP library for the above mentioned ex-
ample sentence are presented in Table 1, while the re-
sults from USAS are presented in Table 2. Observing
the results presented in the tables, it is obvious that
there is a discrepancy between the POS tags for the
token “Heat”.
Table 1: Tags obtained from coreNLP for one recipe sentence.
Token ID  Token  Lemma  POS tag
1         Heat   heat   NN
2         the    the    DT
3         beef   beef   NN
4         soup   soup   NN
5         until  until  IN
6         it     it     PRP
7         boils  boil   VBZ
8         .      .      .
As is evident, neither the USAS semantic tagger nor
the coreNLP library provides perfect tags (e.g.,
sometimes verbs are misclassified as nouns, as is the
case with the first token in the example given in Table
1). For this reason, the tags returned by both taggers
are post-processed and modified using the following
linguistic rules:
• If at least one of the taggers classifies a token as a
  verb, mark it as a verb.
• If there exists a discrepancy between the tags for
  a specific token, prioritize the tag given by the
  USAS semantic tagger.
• If a past participle form or a past simple form of
  a verb precedes and is adjacent to a noun, and it
  is classified as a verb, change the tag from verb to
  adjective.
Finally, we keep two versions of the modified tag
set, one in each format. These modified tags in the
coreNLP format and USAS format are presented in
Table 3 and Table 4, respectively.
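The three post-processing rules can be illustrated with a short Python sketch. It is not the FoodIE implementation: it works on coarse word classes instead of keeping the two original tag formats, and the first-letter mapping in coarse as well as the PAST_FORMS set are assumptions introduced for this example.

```python
# Assumption: these tags mark past-participle / past-simple verb forms
# (Penn Treebank VBN/VBD from coreNLP, CLAWS VVN/VVD from USAS).
PAST_FORMS = {"VBN", "VBD", "VVN", "VVD"}


def coarse(tag: str) -> str:
    """Map a Penn Treebank or CLAWS tag to a coarse class by its first letter."""
    if tag.startswith("V"):
        return "VERB"
    if tag.startswith("N"):
        return "NOUN"
    if tag.startswith("J"):
        return "ADJ"
    return "OTHER"


def postprocess(core_tags, usas_tags):
    """Merge two parallel tag sequences into one coarse tag sequence."""
    merged = []
    for core, usas in zip(core_tags, usas_tags):
        if "VERB" in (coarse(core), coarse(usas)):
            merged.append("VERB")        # rule 1: one verb vote is enough
        else:
            merged.append(coarse(usas))  # rule 2: otherwise USAS has priority
    for i in range(len(merged) - 1):
        is_past = core_tags[i] in PAST_FORMS or usas_tags[i] in PAST_FORMS
        if merged[i] == "VERB" and is_past and merged[i + 1] == "NOUN":
            merged[i] = "ADJ"            # rule 3: e.g. "grilled chicken"
    return merged


# POS tags of "Heat the beef soup until it boils ." from Tables 1 and 2.
core = ["NN", "DT", "NN", "NN", "IN", "PRP", "VBZ", "."]
usas = ["VV0", "AT", "NN1", "NN1", "CS", "PPH1", "VVZ", "YSTP"]
print(postprocess(core, usas))
# ['VERB', 'OTHER', 'NOUN', 'NOUN', 'OTHER', 'OTHER', 'VERB', 'OTHER']
```

With these inputs the first token, which coreNLP tagged as a noun, ends up as a verb, which matches the change shown for “Heat” in Table 3.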
3.3 Semantic Tagging of Food Tokens in
Text
To define phrases in the text related to food enti-
ties, we first need to find tokens that are related
to food entities. For this purpose, the USAS se-
mantic tagger is utilized. Using it, a specific rule
is defined to determine the food tokens in the text.
Food tokens are predominantly nouns or adjectives,
so, to reduce the number of false positives, a token
can be categorized as a food token only if it is either
a noun or an adjective. The decision rule combines three
conditions using the following Boolean expression:
(($Condition_1$ OR $Condition_2$) AND $Condition_3$). If
the expression is true, then the token is classified as
a food token. For clarity, let us assume that $t$ is a
token and $s_t$ is the semantic tag that is assigned to
it by the USAS semantic tagger. Each condition is
constructed using the following rules:
$Condition_1$:
• Food tag F(1|2|3|4), or
• Living tag L(2|3), or
• Substance tag (liquid and solid) O1.(1|2).
$Condition_2$:
• Body part tag B1, and
• Not Linear order tag N4, and
• Not Location and direction tag M6, and
• Not Texture tag O4.5.
$Condition_3$:
• Not General Object tag O2, and
• Not Quantities tag N5, and
• Not Clothing tag B5, and
• Not Equipment for food preparation tag AG.01.t.08, and
• Not Container for food, place for storing food tag AG.01.u, and
• Not Clothing tag AH.02.
More formally, using Boolean algebra, we can
write these rules as:
$Condition_1$: $s_t \in \{\text{F1}, \text{F2}, \text{F3}, \text{F4}\} \lor s_t \in \{\text{L2}, \text{L3}\} \lor s_t \in \{\text{O1.1}, \text{O1.2}\}$
$Condition_2$: $s_t = \text{B1} \land s_t \neq \text{N4} \land s_t \neq \text{M6} \land s_t \neq \text{O4.5}$
$Condition_3$: $s_t \neq \text{O2} \land s_t \neq \text{N5} \land s_t \neq \text{B5} \land s_t \neq \text{AG.01.t.08} \land s_t \neq \text{AG.01.u} \land s_t \neq \text{AH.02}$
$Rule_1$: $(Condition_1 \lor Condition_2) \land Condition_3$
Table 2: Tags obtained from USAS for one recipe sentence.
Token ID  Token  Lemma  POS tag  Semantic tag 1  Semantic tag 2
1         Heat   heat   VV0      O4.6+           AJ.03.c.02 [Heat]; AJ.03.c.02 [Heat]; AJ.03.c.02.a [Heating/making hot/warm];
2         the    the    AT       Z5              ZC [Grammatical Item];
3         beef   beef   NN1      F1              AG.01.d.03 [Beef]; AE.14.m.03 [Subfamily Bovinae (bovines)]; AE.14.m.03 [Subfamily Bovinae (bovines)];
4         soup   soup   NN1      F1              AG.01.n.02 [Soup/pottage]; AA.04.g.04 [Wave]; AA.11.h [Cloud];
5         until  until  CS       Z5              ZC [Grammatical Item];
6         it     it     PPH1     Z8              ZF [Pronoun];
7         boils  boil   VVZ      O4.6+ E3-       AJ.03.c.02.b [Action of boiling]; AJ.03.c.02.b [Action of boiling]; AJ.03.c.02.b [Action of boiling];
8         .      PUNC   YSTP     PUNC            NULL
Table 3: Modified tags from coreNLP for one recipe sentence.
Token ID  Token  Lemma  POS tag
1         Heat   heat   VB
2         the    the    DT
3         beef   beef   NN
4         soup   soup   NN
5         until  until  IN
6         it     it     PRP
7         boils  boil   VBZ
8         .      .      .
Additionally, we define one rule to determine object
tokens. Determining the object tokens further helps
us in the definition of food entities, mainly
to avoid false positives. The rule consists of
• General Object tag O2, or
• Clothing tag B5, and
• Not Body Part tag B1, and
• Not Living tag L(2|3), and
• Not a food token as defined by the aforementioned
  first rule.
Using Boolean algebra, this rule is represented as
$Rule_2$: $(s_t = \text{O2} \lor s_t = \text{B5}) \land s_t \neq \text{B1} \land s_t \neq \text{L2} \land s_t \neq \text{L3} \land \neg Rule_1$
If this condition is met, the token is tagged as a
general object.
The single rule for defining a color noun consists of
• Color tag O4.3.
The rule for defining a color noun is then formally
defined as
$Rule_3$: $s_t = \text{O4.3}$
These tags are useful when food entities ending
in a color, such as “egg whites” or “hash browns”,
appear in the text, which indeed are to be treated as
food entities.
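The rules for food tokens, object tokens and color nouns can be read as set operations over the semantic tags of a token, as in the following Python sketch. It is an illustration only: it assumes that each token carries a collection of tag codes, that “$s_t = X$” means “X is among those codes”, that trailing +/- markers (as in “O4.6+”) can be stripped, and that the POS tag has already been reduced to a coarse class such as "NOUN" or "ADJ". The function names are hypothetical.

```python
FOOD = {"F1", "F2", "F3", "F4"}
LIVING = {"L2", "L3"}
SUBSTANCE = {"O1.1", "O1.2"}
EXCLUDED_WITH_BODY_PART = {"N4", "M6", "O4.5"}
NON_FOOD = {"O2", "N5", "B5", "AG.01.t.08", "AG.01.u", "AH.02"}


def normalize(tags):
    """Strip trailing +/- markers (assumption: 'O4.6+' is treated as 'O4.6')."""
    return {tag.rstrip("+-") for tag in tags}


def is_food_token(tags, pos):
    """Rule 1: (Condition 1 OR Condition 2) AND Condition 3, nouns/adjectives only."""
    t = normalize(tags)
    if pos not in ("NOUN", "ADJ"):
        return False
    condition1 = bool(t & (FOOD | LIVING | SUBSTANCE))
    condition2 = "B1" in t and not (t & EXCLUDED_WITH_BODY_PART)
    condition3 = not (t & NON_FOOD)
    return (condition1 or condition2) and condition3


def is_object_token(tags, pos):
    """Rule 2: general object or clothing, not body part/living, not a food token."""
    t = normalize(tags)
    return (bool(t & {"O2", "B5"})
            and not (t & {"B1", "L2", "L3"})
            and not is_food_token(tags, pos))


def is_color_noun(tags):
    """Rule 3: the token carries the color tag O4.3."""
    return "O4.3" in normalize(tags)


print(is_food_token(["F1", "AG.01.d.03"], "NOUN"))    # "beef" -> True
print(is_food_token(["O4.6+", "AJ.03.c.02"], "VERB"))  # "Heat" -> False
print(is_object_token(["O2", "AG.01.t.08"], "NOUN"))   # True
```

Exact code matching is used here for simplicity; whether subcategories of a hierarchical code should also match is left open in this sketch.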
Finally, one additional rule is constructed for
defining what is explicitly disallowed to be the main
token in a food entity, and is defined as
• Equipment for food preparation tag AG.01.t.08, and
• Container for food, place for storing food tag AG.01.u, and
• Clothing tag AH.02, and
• Temperature tag O4.6, and
• Measurement tag N3.
This rule can be represented as
$Rule_4$: $s_t = \text{AG.01.t.08} \land s_t = \text{AG.01.u} \land s_t = \text{AH.02} \land s_t = \text{O4.6} \land s_t = \text{N3}$
This rule is utilized when isolating entities that
could be potential false positives. An example of
this would be “oil temperature” or “cake pan”.
Additionally, there are some manually added resources
in this disallowed category, which frequently occur in
the texts.
Table 4: Modified tags from USAS for one recipe sentence.
Token ID  Token  Lemma  POS tag  Semantic tag 1  Semantic tag 2
1         Heat   heat   VV0      O4.6+           AJ.03.c.02 [Heat]; AJ.03.c.02 [Heat]; AJ.03.c.02.a [Heating/making hot/warm];
2         the    the    AT       Z5              ZC [Grammatical Item];
3         beef   beef   NN1      F1              AG.01.d.03 [Beef]; AE.14.m.03 [Subfamily Bovinae (bovines)]; AE.14.m.03 [Subfamily Bovinae (bovines)];
4         soup   soup   NN1      F1              AG.01.n.02 [Soup/pottage]; AA.04.g.04 [Wave]; AA.11.h [Cloud];
5         until  until  CS       Z5              ZC [Grammatical Item];
6         it     it     PPH1     Z8              ZF [Pronoun];
7         boils  boil   VVZ      O4.6+ E3-       AJ.03.c.02.b [Action of boiling]; AJ.03.c.02.b [Action of boiling]; AJ.03.c.02.b [Action of boiling];
8         .      PUNC   YSTP     PUNC            NULL
3.4 Food-named Entity Recognition
To obtain food chunks, we used the modified tag set
from the USAS semantic tagger obtained in Subsec-
tion 3.2 in combination with the food tokens obtained
in Subsection 3.3. The process of food-named entity
recognition consists of three steps.
Firstly, we iterate through every food token which
we extracted previously from the text, and for each
token we define a set of rules that constitute a food
entity.
Adjacent to the left of the food token we allow
chaining of adjectives (JJ), nouns (NN), proper nouns
(NP), genitive tag (GE), unknown tags (Z99) and gen-
eral tokens tagged as food, but explicitly omit general
objects. The purpose of including the unknown POS
tag (Z99) is to catch tokens that do not concisely fall
into one of the tags in the standard POS tag set, yet
still are of importance to the semantics of the food en-
tity. Such an example would be “Colby-Jack cheese”,
whose POS tags are Z99 and NN, respectively.
To the right of the food token the logic is the same,
differing only in that general objects, as well as
tokens that have been tagged as a color noun by the rule
engine, are allowed to be part of the food entity. We
also keep track of the tokens so that none is used twice.
Then, to determine if it truly is a food entity chunk
or just a chunk related to food but not a food entity in
and of itself, we check the last token of the chunk.
The whole chunk is discarded if the last token is:
• a noun (starts with NN) and a general non-food
  object, or
• in the disallowed category as defined by the rule
  engine, or
• in the disallowed category as defined by the
  resources.
Some examples where this would be a false pos-
itive are “muffin liner”, “casserole dish” or “egg
timer”. If this check passes and the last token is not
a general object, we mark each token in the new food
chunk with an index unique to the whole chunk and
continue iterating through the remaining food tokens.
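The first step, growing a chunk around each food token and then checking its last token, can be sketched as follows. This is a simplified reading of the description above rather than the authors' code: the Token class, its boolean flags (assumed to be precomputed by the rule engine and the resources) and the function names are hypothetical.

```python
from dataclasses import dataclass

# POS prefixes that may be chained around a food token (JJ, NN, NP, GE, Z99).
CHAIN_POS = ("JJ", "NN", "NP", "GE", "Z99")


@dataclass
class Token:
    text: str
    pos: str
    food: bool = False        # food token
    obj: bool = False         # general object
    color: bool = False       # color noun
    disallowed: bool = False  # disallowed by the rule engine or the resources


def chainable(tok: Token, right: bool) -> bool:
    if tok.obj:
        return right  # general objects may only extend a chunk to the right
    if right and tok.color:
        return True
    return tok.food or tok.pos.startswith(CHAIN_POS)


def extract_chunks(tokens):
    """Return (start, end) index pairs, end inclusive, of accepted food chunks."""
    used, chunks = set(), []
    for i, tok in enumerate(tokens):
        if not tok.food or i in used:
            continue
        start = end = i
        while start > 0 and start - 1 not in used and chainable(tokens[start - 1], right=False):
            start -= 1
        while end + 1 < len(tokens) and end + 1 not in used and chainable(tokens[end + 1], right=True):
            end += 1
        last = tokens[end]
        # Discard chunks such as "egg timer": the last token is a general
        # object or belongs to the disallowed category.
        if last.obj or last.disallowed:
            continue
        used.update(range(start, end + 1))  # no token is used twice
        chunks.append((start, end))
    return chunks


sentence = [Token("Chop", "VV0"), Token("the", "AT"), Token("hot", "JJ"),
            Token("Italian", "JJ"), Token("sausage", "NN1", food=True),
            Token("into", "II"), Token("pieces", "NN2"), Token(".", "YSTP")]
print(extract_chunks(sentence))  # [(2, 4)]
```

In this sketch a discarded chunk does not reserve its tokens, so they remain available to chunks that are grown from later food tokens.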
After the first step, we now must concatenate all
relevant information for each food entity. For each
indexed food entity, we join all the instances into one
entry, thus creating a vector where each token is its
own entry, except for the food entities which are rep-
resented as one entry. If initially we had a vector of
tokens such as [Chop, the, hot, Italian, sausage, into,
pieces, .] the output would be [Chop, the, hot Ital-
ian sausage, into, pieces, .]. This also applies to other
relevant information we might want to track, such as
lemmas, POS tags, sentence indexes or even individ-
ual token indexes.
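The concatenation step can be illustrated with a small Python function that joins the tokens of each indexed chunk into one entry. The function name merge_chunks and the representation of chunks as inclusive (start, end) index pairs are assumptions made for this sketch.

```python
def merge_chunks(tokens, chunks):
    """Join the tokens of every chunk into one entry; keep other tokens as-is.

    tokens: list of token strings
    chunks: non-overlapping (start, end) index pairs, end inclusive
    """
    ends = {start: end for start, end in chunks}
    merged, i = [], 0
    while i < len(tokens):
        if i in ends:
            merged.append(" ".join(tokens[i:ends[i] + 1]))
            i = ends[i] + 1
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["Chop", "the", "hot", "Italian", "sausage", "into", "pieces", "."]
print(merge_chunks(tokens, [(2, 4)]))
# ['Chop', 'the', 'hot Italian sausage', 'into', 'pieces', '.']
```

The same index pairs could be applied to parallel lists of lemmas or POS tags, so that all tracked information stays aligned with the merged token vector.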
For additional robustness, we perform a check to
ensure that each food chunk we have isolated indeed
contains a food token, and that the token is marked
as part of that chunk. Thus, we only mark a chunk as
a food entity if it contains at least one word that
has previously been tagged as a food token and has
been indexed as part of the respective chunk as
well.
4 EVALUATION
The evaluation was performed manually, since there
is no pre-existing method to evaluate such a text cor-
pus. To avoid any kind of bias when evaluating food-
related text, one person was tasked with manually
performing food chunk extraction from each individ-
ual text, while another person cross referenced those
manually obtained chunks with the ones obtained
from FoodIE. Using this method, a figure for true pos-
itives (TPs), false negatives (FNs) and false positives
(FPs) was procured, while it was decided that the cat-
egory true negative was not applicable to the nature
of the problem and its evaluation. Additionally, it was
decided that a “partial (inconclusive)” category was
necessary, as some of the food chunks were incom-
plete, but nevertheless caught, thus including signifi-
cant information. This category encompasses all the
extracted food chunks which were caught, but missed
at least one token. An example would be “bell pep-
per”, where FoodIE would only catch “pepper”.
We would have liked to compare our results with those
of the model presented in (Chen, 2017), but we were
unable to obtain the requested model and corpus. We
provide a small example comparing FoodIE with
drNER (Eftimov et al., 2017), in order to show that
they provide food entities at different levels, so a fair
comparison cannot be made.
While the evaluation was being done, we kept
track of all the False Negative instances and have con-
structed a resource set that will improve the perfor-
mance of FoodIE in future implementations.
4.1 Data
Firstly, a total of 200 recipes were processed and eval-
uated. The original 100 recipes, which were analyzed
and upon which the rule engine was built, were taken
into consideration, as well as 100 new recipes which
had not been analyzed beforehand. The recipes were
taken from two separate user-based sites, Allrecipes
(https://www.allrecipes.com/) and MyRecipes
(https://www.myrecipes.com/), where there is no
standardized format for the recipe description. These
sites were chosen to ensure that the linguistic
constructs used in each text varied and followed no
common pattern. The texts were chosen from a variety of
topics to provide further diversity.
Secondly, we selected 1,000 independently ob-
tained recipes from Allrecipes (Groves, 2013), which
is the largest food-focused social network, where
everyone plays a part in helping cooks discover and
share home cooking. We selected Allrecipes because
there is no limitation as to who can post recipes, so
there is variability in how users express themselves.
The recipes were selected from five recipe categories:
Appetizers and snacks, Breakfast and Lunch, Dessert,
Dinner, and Drinks. From each recipe category 200
recipes were included in the evaluation set.
The evaluation datasets, including the obtained
results, are publicly available at
http://cs.ijs.si/repository/FoodIE/FoodIE_datasets.zip.
4.2 Results and Discussion
The TPs, FPs, and FNs obtained by evaluating
FoodIE on the dataset of 200 recipes are presented
in Table 5. The group “Partial (Inconclusive)” was
left out of these evaluations, as some would argue
that its instances should be counted as TPs, while
others that they should be included in the FNs. Some
examples included here are “empty passion fruit juice”,
“cinnamon” and “soda”, where the actual food entity
chunks would be “passion fruit juice”, “cinnamon sticks”
and “club soda”, respectively. These are mostly due to
the dual nature of words, i.e., the same word form can
be both a noun and a verb, or an adjective and a verb.
For such words, the tagger sometimes classifies the
tokens incorrectly. In these examples, “empty” is
tagged as an adjective, whereas in context it is, in
fact, a verb. The same explanation holds for the other
two examples. For these reasons, this category was
simply omitted when the evaluation metrics were
calculated. Moreover, even if its instances are grouped
with either the TPs or the FNs, this does not
significantly affect the results.
Regarding the FN category (type II error), there
were some specific patterns that produced the most
instances. One very simple type of a FN instance
is where the author of the text refers to a specific
food using the brand name, such as “allspice” or
“Jägermeister”. These are difficult to catch if there is
no additional information following the brand name.
However, if the user includes the general classification
of the branded food, FoodIE will catch it.
An example of this would be simply writing
“Jägermeister liqueur”. Another instance of a type II
error is when the POS taggers give incorrect tags, as
was the case with some “Partial (Inconclusive)” in-
stances. An example of this is when the tagger misses
chunks such as “mint leaves” and “sweet glazes”,
where both “leaves” and “glazes” are incorrectly clas-
sified as verbs when in this context they should be
tagged as nouns. Another example would be when
the semantic tagger incorrectly classifies some token
within the given context, such as “date” being clas-
sified as a noun meaning day of year, as opposed to
it being a certain fruit. Furthermore, there exist FNs
which are simply due to the rarity of the food, such
as “kefir”, “couscous” or “stevia”, the last one being
of immense importance to people suffering from
diabetes, as it is a safe sugar substitute. Another
category of type II errors is due to the fact that some
foods are often referred to by their colloquial names,
such as “half-and-half” and “spring greens”. The final
category of this type of error is where there exist
spelling variations for a single food, such as “eggnog”,
“egg nog” and “egg-nog”. These are very difficult, if not
impossible, to predict correctly, since grammatical and
morphological styles vary with each user and extend as
far as simply improper use of the English language.
Handling them is a separate problem in and of itself,
i.e., spellchecking and spelling correction.
The second type of error to discuss is the FP cate-
gory (type I error), which is often due to the existence
of objects that are not foods, but are closely related to
food entities. These include instances such as “dol-
lop” or “milk frother”, where the first example has a
meaning very closely related to food, thus making it
difficult to distinguish using the semantic tags. The
second chunk is simply an instrument related to food
and cooking, while being rare enough such that the
semantic tagger does not classify it properly as an ob-
ject.
Table 5: Predictions (200 recipes).
Category                Count
True Positive (TP)      3063
False Positive (FP)     75
False Negative (FN)     185
Partial (Inconclusive)  97
Using the results reported in Table 5, the evaluation
metrics for F1 score, precision, and recall are
presented in Table 6.
Table 6: Evaluation metrics (200 recipes).
F1 Score  Precision  Recall
0.9593    0.9761     0.9430
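As a quick check, the metrics in Table 6 follow from the counts in Table 5 under the standard definitions of precision, recall and F1 score, with the “Partial (Inconclusive)” instances left out. The helper below is a small sketch; its name is not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard definitions; partial (inconclusive) matches are not counted."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from Table 5 (200 recipes).
p, r, f1 = precision_recall_f1(tp=3063, fp=75, fn=185)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# precision=0.9761 recall=0.9430 f1=0.9593
```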
The results of evaluating FoodIE on the dataset
of 1000 recipes are reported in Tables 7 and 8.
Table 7: Predictions (1000 recipes).
Category                Count
True Positive (TP)      11461
False Positive (FP)     258
False Negative (FN)     684
Partial (Inconclusive)  359
Comparing the results obtained from the evaluations
(Tables 6 and 8), we can conclude that FoodIE
behaves consistently. On the dataset of 200 recipes,
which consists of 100 recipes that were analyzed to
build the rule engine and 100 new recipes that were not
analyzed beforehand, we obtained 0.9761 precision,
0.9430 recall, and 0.9593 F1 score. Furthermore, on the
dataset of 1000 new recipes, FoodIE obtained 0.9780
precision, 0.9437 recall, and 0.9605 F1 score. This
comparison shows that FoodIE gives very promising and
consistent results.
Table 8: Evaluation metrics (1000 recipes).
F1 Score  Precision  Recall
0.9605    0.9780     0.9437
We also provide the TPs, FPs, FNs, and Partial
predictions, together with the evaluation metrics,
for each recipe category separately (Table 9). We can
see that the Dinner category produces the most FNs
(223), while the Breakfast/lunch category produces the
fewest (82). Regarding the FPs, the Desserts category
produces the most (87), while the Appetizers/snacks
category produces the fewest (27). Looking at the
results, it is evident that FoodIE retains the
aforementioned consistency, even when the evaluation
metrics of the individual categories are compared with
each other.
Table 9: Predictions and evaluation metrics for each recipe category.
Recipe category    TP    FP  FN   Partial  F1 Score  Precision  Recall
Appetizers/snacks  2147  27  162  45       0.9578    0.9876     0.9298
Breakfast/lunch    2443  33  82   108      0.9770    0.9867     0.9675
Desserts           2612  87  127  124      0.9607    0.9678     0.9536
Dinner             3176  47  223  51       0.9592    0.9854     0.9344
Drinks             1083  64  90   31       0.9336    0.9442     0.9233
In Table 10, we present the results obtained for
10 sentences (i.e., evidence-based dietary
recommendations) previously used in (Eftimov et al.,
2016; Eftimov et al., 2017), in order to present the
difference between FoodIE and drNER. Semicolons are used
to separate individual food entities. Using the table,
we can see that drNER and FoodIE provide results at
different levels. For example, let us consider the sixth
recommendation. drNER extracted only one food entity,
which is “Milk, cheese, yogurt and other dairy prod-
ucts”, while FoodIE extracted four separate food enti-
ties, i.e. “Milk”, “cheese”, “yogurt”, and “other dairy
products”. From this, it follows that FoodIE provides
more precise results, which means it can also be used
as a post-processing tool for drNER in order to extract
the food entities on an individual level.
The performance of the rule-based system FoodIE
heavily depends on the taggers used, so the improve-
ment of the qualities of the POS-tagging and seman-
tic tagging methods will also improve the evaluation
metrics for FoodIE.
Table 10: Food entities extracted by drNER and FoodIE.
1. Recommendation: Good sources of magnesium are: fruits or vegetables, nuts, peas and beans, soy products, whole grains and milk.
   drNER: fruits or vegetables, nuts, peas and beans; soy products; whole grains and milk
   FoodIE: fruits; vegetables; nuts; peas; beans; whole grains; milk
2. Recommendation: The RDAs for Mg are 300 mg for young women and 350 mg for young men.
   drNER: -
   FoodIE: -
3. Recommendation: Increase potassium by ordering a salad, extra steamed or roasted vegetables, bean-based dishes fruit salads, and low-fat milk instead of soda.
   drNER: salad; extra steamed or roasted vegetables; fruit salads; low-fat milk
   FoodIE: salad; roasted vegetables; bean-based dishes; fruit salads; low-fat milk; soda
4. Recommendation: Babies need protein about 10 g a day.
   drNER: -
   FoodIE: -
5. Recommendation: 1 teaspoon of table salt contains 2300 mg of sodium.
   drNER: table salt
   FoodIE: table salt
6. Recommendation: Milk, cheese, yogurt and other dairy products are good sources of calcium and protein, plus many other vitamins and minerals.
   drNER: Milk, cheese, yogurt and other dairy products
   FoodIE: Milk; cheese; yogurt; other dairy products
7. Recommendation: Breast milk provides sufficient zinc, 2 mg/day for the first 4-6 months of life.
   drNER: Breast milk
   FoodIE: milk
8. Recommendation: If you’re trying to get more omega-3, you might choose salmon, tuna or eggs enriched with omega-3.
   drNER: salmon, tuna; eggs
   FoodIE: salmon; tuna; eggs
9. Recommendation: If you need to get more fiber, look to beans, vegetables, nuts and legumes.
   drNER: beans, vegetables, nuts, and legumes
   FoodIE: beans; vegetables; nuts; legumes
10. Recommendation: Excellent sources of alpha-linolenic acid, ALA, include flaxseeds and walnuts.
   drNER: flaxseeds and walnuts
   FoodIE: alpha-linolenic acid; flaxseeds; walnuts
5 CONCLUSIONS
To extract food entities from unstructured textual
data, we propose a rule-based named-entity recognition
method for food information extraction, called
FoodIE. It is a rule engine, where the rules are
based on computational linguistics and semantic
information that describe the food entities. Evaluation
showed that FoodIE behaves consistently using different
independent evaluation datasets and very promising
results have been achieved.
To the best of our knowledge, only a limited number of NLP tools can be used for IE of food entities. Moreover, there is a lack of annotated corpora that can be used to train corpus-based NER methods. Motivated by the evaluation results obtained, we plan to use FoodIE to build an annotated corpus that can be further used for extracting food entities together with their relations to other biomedical entities. Such a corpus would make it easier to keep up with the new knowledge that arrives daily in newly published scientific papers aimed at improving public health.
ACKNOWLEDGEMENTS
This work was supported by the Slovenian Research
Agency Program P2-0098 and ERA Chair ISO-
FOOD for isotope techniques in food quality, safety
and traceability [grant agreement no. 621329].
REFERENCES
Aggarwal, C. C. and Zhai, C. (2012). Mining text data. Springer Science & Business Media.
Alexander, M. and Anderson, J. (2012). The Hansard corpus, 1803-2003.
Arnold, T. and Tilton, L. (2016). coreNLP: Wrappers Around Stanford CoreNLP Tools. R package version 0.4-2.
Chen, Y. (2017). A Statistical Machine Learning Approach to Generating Graph Structures from Food Recipes. PhD thesis.
do Nascimento, A. B., Fiates, G. M. R., dos Anjos, A., and Teixeira, E. (2013). Analysis of ingredient lists of commercially available gluten-free and gluten-containing food products using the text mining technique. International journal of food sciences and nutrition, 64(2):217–222.
Eftimov, T., Koroušić Seljak, B., and Korošec, P. (2017). A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations. PloS One, 12(6):e0179488.
Eftimov, T., Koroušić Seljak, B., and Korošec, P. (2016). Grammar and dictionary based named-entity linking for knowledge extraction of evidence-based dietary recommendations. In KDIR, pages 150–157.
Groves, S. (2013). How allrecipes.com became the worlds largest food/recipe site. ROI of social media (blog).
Jensen, K., Panagiotou, G., and Kouskoumvekaki, I. (2014). Integrated text mining and chemoinformatics analysis associates diet to health benefit at molecular level. PLoS computational biology, 10(1):e1003432.
Mori, S., Sasada, T., Yamakata, Y., and Yoshino, K. (2012). A machine learning approach to recipe text processing. In Proceedings of the 1st Cooking with Computer Workshop, pages 29–34.
Nadeau, D. and Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26.
Rayson, P., Archer, D., Piao, S., and McEnery, A. (2004). The UCREL semantic analysis system.
Xia, J., Zhang, X., Yuan, D., Chen, L., Webster, J., and Fang, A. C. (2013). Gene prioritization of resistant rice gene against Xanthomonas oryzae pv. oryzae by using text mining technologies. BioMed research international, 2013.