Abstract
Intelligent Conversational Channel (ICC) is a community
channel developed to facilitate knowledge-sharing
activities by using multiple agents to create a virtual
community. Social knowledge is emphasized as the major
source of knowledge to be shared among communities. It is
extracted not only from community talk but also from
scientific documents. We present the architecture of the
system and the methodology for social knowledge
extraction. Agents act as communicators that deliver the
knowledge to the community members in a
conversational manner.
1. Introduction
Numerous research works have been done in the field
of document understanding, ranging from
understanding the document layout and retrieving
documents for classification to summarizing documents and
tracking the stories in them [1, 2, 3]. Many of these
efforts look at the formal content of the
document. The side of document understanding
that concerns our research is the social content
of the document. It has been argued that documents
create social relationships among their users
and those who created them [5]. The social
content of a document describes knowledge about:
• who the authors are;
• what the related works are;
• what their research interests are;
• who the other researchers working in similar
research fields are;
• what the supporting publications on the topic are.
These types of social content are also called the social
knowledge of the document (at present, we focus only on
scientific papers, whose format is well structured).
We believe this knowledge answers the preliminary
inquiries of researchers who are newly embarking on an
area of research. This project proposes that the
knowledge can be conveyed in a conversational manner
through communication with agents. Three major steps
are involved. First, the social knowledge has to be
extracted from the documents. Second, the knowledge has
to be represented so that it can be retrieved in response
to a query posted by the user. Third, the represented
knowledge has to be processed into conversational
knowledge so that it can be communicated by the
agents.
We have developed a system called the Intelligent
Conversational Channel (ICC) that supports
community learning [6]. Within the scope of this project, the
community is defined as a research community whose
members have common research inquiries. This helps us to
narrow the scope of the questions likely to be
posed by the users as well as the types of social
knowledge the users need. The system has three
major components, the discourse communicator, the
hypermedia learning space and the discourse
analyzer, all of which work around the community
channel. The system has previously been described for
different applications [5]; in this paper we are
concerned with the architecture that supports the
operation of the discourse communicator and the hypermedia
learning space. We demonstrate how
research communities can share their knowledge by
building their own social knowledge collectively, storing
it in the hypermedia learning space and having it
accessed by the discourse communicator.
In the following sections, we give an applicative view of
the Intelligent Conversational Channel (ICC) in Section
2, the architecture for extracting the social knowledge in
Section 3, the knowledge representation for knowledge
retrieval and knowledge conversant in Section 4 and the
conclusion in Section 5.
Extracting Social Knowledge in the Intelligent Conversational Channel for
Agent Communication
S.M.F.D Syed Mustapha
Department of Computer Science, Dhofar University, Salalah, OMAN
smfdsm@hotmail.com
Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA’05)
1550-445X/05 $20.00 © 2005 IEEE
2. Intelligent Conversational Channel
(ICC) – an applicative view
2.1. Forming knowledge body through
community effort
In ICC, there are two types of interaction: with the
members of the ICC system and with the
agents. In the former, the members can communicate in
asynchronous or synchronous mode; a posted
message receives immediate responses only if
other members are online. In the
latter, agents are standby virtual members that
can give immediate responses to posted
messages or queries. The knowledge-building process
that can be adopted in ICC is explained in [7]. This
approach allows rapid knowledge building and gives
the community members full autonomy in shaping the
knowledge. The members express their ideas
in natural language in a text box. Supporting
learning objects can be associated with the text
to enhance understanding among the other
members who access it. The initial knowledge
piece inserted into the text box is shared
together with the subsequent argumentative messages
inserted by other members, which eventually form a
complete knowledge body.
2.2. Social knowledge vs. content knowledge
The ICC system supports community learning
mainly of social knowledge. Content knowledge
is the technical know-how that
can be learnt from textbooks, lectures, seminars or
formal learning institutions. Social knowledge is
derived from socializing with peers and
communities through formal or informal discussion,
chitchat or social gatherings.
We argue that, in general, all documents contain
social values which can be represented as knowledge to
be shared. Documents are authored by individual writers
or agencies that intend to bring forward some issues,
which can be interpreted as politically motivated,
as creating community awareness or as fostering public
sympathy. Some of these issues can only be traced through
a series of sequel publications, which requires a complex
intelligent mechanism to process. Current
natural language processing technology is still far
from this capability, so in our work we narrow
the scope to scientific articles, which follow a standard
writing format. We define the social knowledge that can be
extracted from scientific articles as the following:
1. Names of the authors and their affiliations, such as the
research institution and the research group or individual
homepage of researchers worldwide in a certain
research field.
2. Prominence of the researcher. For example, the
researcher’s work is frequently cited by other
authors.
3. Other related publications written by the
researchers elsewhere.
4. Other prominent researchers besides the one stated
in 2.
5. Other researchers who are working on some related
work and the systems that they built.
At present, inquiry into the social knowledge is
limited to the scope listed above. Content knowledge
is omitted here because the process of extracting it
is more complicated. The content knowledge to be
extracted from a scientific paper could be the research
findings or experimental results, the analysis and the
experimental model being discussed.
2.3. Conversing social knowledge with virtual
agents
The purpose of the virtual agents is to simulate
conversation among the community members in
synchronous mode. Previous work has shown how the
agents converse using knowledge
previously stored in the community memory [8]. This
work can be enhanced by allowing the agents to access
the social knowledge extracted from the
documents, as discussed in the previous section. The
social knowledge thus derives from two sources: the
social dialogue among the members and the
automated extraction of knowledge from the documents.
The following conversation between a
community member (CM) and the agent (ICC agent)
illustrates this. After each response we indicate the
knowledge source from which it is generated.
CM: Can you tell me who the main researchers in
the field of qualitative reasoning are?
ICC agent: Ken Forbus, who was working on Qualitative
Process Theory, and Ben Kuipers, who was working on
qualitative simulation.
(Knowledge source: multiple documents)
CM: What kind of system did Ken Forbus develop
using QPT?
ICC agent: Ken Forbus has developed CyclePad,
simulation software for learning thermodynamic
theory.
(Knowledge source: social dialogue)
The two exchanges differ in their
knowledge sources. The knowledge in the first
exchange can be obtained from the multiple documents
available on the server: the system compares the number
of times each researcher’s work is cited by other
authors with the number of times other researchers’
work is cited. The knowledge in the second exchange is more
difficult to extract from the articles alone.
However, the system can search the community
memory for the related keywords and retrieve the text which
contains the highest number of matching keywords and
the lowest number of unmatching keywords. We show
these steps in more detail in later sections.
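The keyword-based retrieval from the community memory can be sketched as follows. This is a minimal illustration under stated assumptions, not the ICC implementation: the paper does not define "unmatching keywords", so the sketch assumes they are the words of a stored text that do not occur in the query. The function name `retrieve_best_text` and the toy texts are hypothetical.

```python
def retrieve_best_text(query_keywords, texts):
    """Return the stored text with the most matching and fewest unmatching words."""
    query = set(query_keywords)

    def score(text):
        words = set(text.lower().split())
        matching = len(words & query)
        # Assumption: "unmatching" means words of the text absent from the query.
        unmatching = len(words - query)
        # Prefer more matches first, then fewer unmatching words.
        return (matching, -unmatching)

    return max(texts, key=score)


# Toy community memory (invented for illustration).
memory = [
    "forbus developed cyclepad for thermodynamics",
    "forbus developed cyclepad",
    "kuipers worked on qualitative simulation",
]
best = retrieve_best_text(["forbus", "developed", "system"], memory)
print(best)
```

In this toy run the first two texts both match two query keywords, and the second one is selected because it carries fewer words outside the query.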
3. The architecture for extracting social
knowledge
The architecture is designed to access
knowledge kept in two forms, namely the
structured knowledge in the organization of the papers and the
social talk kept in the community memory. The two forms
require different types of knowledge representation,
which are discussed in detail in Section 4. The
architecture, shown in Figure 1, begins with the
member’s query at the leftmost end of the architectural
diagram. The query can be posted as a question
or as a general statement; both return relevant
responses.
The final output of the architecture is the dialogue
activation for the agent communication. That is, the
overall system takes a colloquial input (the
query), transforms it into a structured representation and
then transforms the retrieved structured knowledge back into
a colloquial, dialogical form. There are six main
components in the architecture for extracting the social
knowledge of the scientific documents.
The Query Pre-processor simplifies the query into
salient keywords by removing function words and
punctuation symbols. For example, the following
query posted by a community member,
“Who are the people working in building architecture
for agent communication?”
is simplified into the following set of keywords,
QP = {who, people, working, building, architecture,
agent, communication}.
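The pre-processing step above can be sketched as follows. This is only an illustration: the paper does not enumerate its function-word list, so `FUNCTION_WORDS` is an assumed list chosen so that the example query reproduces the keyword set QP, and `preprocess_query` is a hypothetical name.

```python
import string

# Assumed function-word list; the paper does not give one.
FUNCTION_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}

# Map ASCII punctuation and curly quotes to spaces.
PUNCTUATION_TABLE = str.maketrans({c: " " for c in string.punctuation + "“”‘’"})


def preprocess_query(query: str) -> list[str]:
    """Reduce a free-text query to its salient keywords, keeping word order."""
    cleaned = query.translate(PUNCTUATION_TABLE).lower()
    return [word for word in cleaned.split() if word not in FUNCTION_WORDS]


qp = preprocess_query(
    "Who are the people working in building architecture for agent communication?"
)
print(qp)
# ['who', 'people', 'working', 'building', 'architecture', 'agent', 'communication']
```

Note that "who" is retained, as in the paper's example; a later component can use such words to infer what kind of answer is requested.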
The Query-type Identifier determines the type of the
query, which can be one of:
• Class I – a query which can only be answered using
the knowledge extracted from the scientific documents
• Class II – a query which can only be answered
using the knowledge extracted from the community
memory
• Class III – a query which can be answered by
extracting knowledge from the scientific documents, the
community memory, or both.
[Figure 1. Architecture for extracting social knowledge. Components shown: Query; Query Pre-processor (QP); Query-type Identifier (Q-tI); Word-based matching; Query Database; Extracting knowledge from scientific documents; Extracting knowledge from community memory; Answer builder; Transformation from structured knowledge to conversational knowledge; Agent Communication Activation.]
A query whose type cannot be identified requires
human intervention: the system administrator has to
step in to decide its type, and the query is then
kept in the Query Database for
future use.
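One way to picture the interplay between the Query-type Identifier and the Query Database is the sketch below. It is an assumption-laden illustration: the paper names word-based matching and a Query Database in Figure 1 but does not specify the matching rule, so the overlap threshold `min_overlap`, the database layout and both function names are hypothetical.

```python
from typing import Optional

# Hypothetical Query Database: keyword sets of earlier queries and their class.
QUERY_DATABASE = [
    ({"who", "prominent", "researchers", "field"}, "I"),
    ({"why", "think", "failed"}, "II"),
]


def identify_query_type(keywords, database, min_overlap=2) -> Optional[str]:
    """Return the class of the best-overlapping stored query, or None if unknown."""
    best_class, best_overlap = None, 0
    for stored_keywords, query_class in database:
        overlap = len(set(keywords) & stored_keywords)
        if overlap > best_overlap:
            best_class, best_overlap = query_class, overlap
    # None signals that the administrator has to decide the type.
    return best_class if best_overlap >= min_overlap else None


def record_admin_decision(keywords, query_class, database) -> None:
    """Keep an administrator-classified query for future use."""
    database.append((set(keywords), query_class))


db = list(QUERY_DATABASE)
known = identify_query_type(["who", "prominent", "researchers", "cbr"], db)
unknown = identify_query_type(["what", "systems", "built"], db)
record_admin_decision(["what", "systems", "built"], "III", db)
after_admin = identify_query_type(["what", "systems", "built"], db)
print(known, unknown, after_admin)
```

Once the administrator's decision is recorded, the same query is classified without further intervention.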
Depending on the class, the system decides
the path towards extracting the required knowledge. The
paths differ because not every query can be
answered by both kinds of knowledge. We give two
examples below, in which Example I is a Class
I query and Example II is a Class II query.
Example I,
“Who are the prominent researchers in the field of
case-based reasoning?”
This query has a straightforward answer from the
structured knowledge, which keeps the number of authors
citing each person’s work (this rests on the usual
assumption that a prominent researcher’s work is
widely cited).
Example II,
“Why do you think the expert system has failed to
predominate as the core technology in automation
industry?”
This query has a less structured answer since the
question is open-ended. This type of question requires
the system to search for the most relevant topic that has been
discussed earlier in the community channel.
Nevertheless, we accept the argument that the query in
Example I can also be addressed by
knowledge that may be available in the community
memory. In that case, the question belongs to Class
III: the system first searches the
structured knowledge and, if the knowledge is not found there,
shifts the search to the community memory.
The community memory is less structured, which makes it
possible to store any form of community knowledge.
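The class-dependent search path, including the Class III fallback from structured knowledge to the community memory, can be sketched as follows. The two search callables are stand-ins supplied by the caller; `answer_query` and its signature are assumptions made for illustration, not part of ICC.

```python
from typing import Callable, Optional

Search = Callable[[list[str]], Optional[str]]


def answer_query(keywords: list[str], query_class: str,
                 search_documents: Search, search_memory: Search) -> Optional[str]:
    """Route a query to the knowledge source(s) allowed by its class."""
    if query_class == "I":
        return search_documents(keywords)
    if query_class == "II":
        return search_memory(keywords)
    if query_class == "III":
        # Structured knowledge first; fall back to the community memory.
        found = search_documents(keywords)
        return found if found is not None else search_memory(keywords)
    raise ValueError(f"unknown query class: {query_class!r}")


# Stand-in searches: the document search finds nothing, the memory search does.
no_hit = lambda keywords: None
memory_hit = lambda keywords: "answer from community memory"

print(answer_query(["prominent", "researchers"], "III", no_hit, memory_hit))
```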
The answer builder organizes the answer by selecting
the fields that are most relevant to the query. For
Example I, the relevant field in the
structured representation holds a computable value:
the ratio of the number of times the paper is
cited to the number of papers considered in
the same research topic. The answer builder
prepares only the raw form of an answer, which has to be
transformed into a dialogical representation by the
second-last component of the architecture. Therefore, if
the answer builder generates a name such as “Janet
Kolodner” as the answer to the query, the raw form of
the name is transformed into the following,
“The prominent person in the CBR research area is
Janet Kolodner”.
Finally, an agent is activated on the computer screen to
respond to the query using text-to-speech technology
[8].
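The two steps just described, building a raw answer from the citation ratio and transforming it into dialogical form, can be sketched as follows. The citation counts, the second researcher's name and all function names are invented for illustration; only the output sentence pattern comes from the text above.

```python
def citation_ratio(times_cited: int, papers_in_topic: int) -> float:
    """Ratio of citations to the number of papers considered in the topic."""
    return times_cited / papers_in_topic if papers_in_topic else 0.0


def build_answer(citation_counts: dict[str, int], papers_in_topic: int) -> str:
    """Raw answer: the researcher with the highest citation ratio."""
    return max(
        citation_counts,
        key=lambda name: citation_ratio(citation_counts[name], papers_in_topic),
    )


def to_dialogue(raw_answer: str, research_area: str) -> str:
    """Wrap the raw answer in a sentence the agent can speak."""
    return f"The prominent person in the {research_area} research area is {raw_answer}."


# Made-up counts: citations found among 20 papers on the topic.
counts = {"Janet Kolodner": 12, "Researcher X": 5}
raw = build_answer(counts, papers_in_topic=20)
sentence = to_dialogue(raw, "CBR")
print(sentence)
```

Keeping the raw answer separate from its wording mirrors the split between the answer builder and the transformation component in the architecture.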
4. Knowledge representation for knowledge
retrieval and knowledge conversant
This section illustrates two types of knowledge
representation: one for the structured knowledge extracted from
the scientific documents and one for the unstructured knowledge
in the community memory.
4.1. Conversing knowledge from structured
knowledge in the scientific documents
In the scientific documents we focus on three major
searching points: the Title of Paper (ToP),
Author Information (AI) and Research Area (RA). The
ToP is structured knowledge
containing the information that is important to
process as social knowledge. The information for the
ToP is extracted entirely from the articles available on the
server, and each article has its own ToP information:
the title, the names of the authors, the affiliation, the
related work and the list of references. The related work
is essential for the system to associate the work
described in the paper with other researchers. The AI
holds the author’s name, the affiliation, the
URL of his or her personal or group web page and the
present and past projects. Some of this information,
such as the past projects or the URL itself, is
not necessarily available on every researcher’s web
page. The RA describes the research area, such as mobile
network (MN), information processing (IP), neural
network (NN), genetic algorithm (GA) and others. The
structured knowledge can be represented in a three-dimensional
setting, as shown in Figure 2.
The three searching points are connected
bidirectionally. This allows searching to proceed
forward or backward regardless of the entry
point of the query. Each article is associated with the
information about its authors and with the research area,
which can be obtained from the list of keywords
usually found in the paper.
Similarly, the authors are associated with the research
interests that are usually listed on their personal web pages.
Representing the knowledge in this three-dimensional setup
allows flexibility in searching for the relevant
knowledge. The following are examples of queries for
different searching points.
Example 1,
“Who are the people working in the area of building
architecture for agent communication?”
The system searches for the articles whose titles contain
some of the keywords in the query. More than one
article may be recalled. Since the
keyword “people” is in the query, the system
interprets the query as a request for the names of the
researchers associated with the title. From the top layer
(articles), the search moves downward to the middle
layer (authors), where the respective author profiles are
retrieved.
Example 2,
“Who are the researchers in the field of Genetic
Algorithm and what papers do they publish?”
In this example, the search begins at the bottom
layer (research area). “Genetic Algorithm” is determined
as the research area and the search path moves
upwards to “Author 2”, who has two publications,
“Article 1” and “Article 4”. The research area is checked
against the keywords of each article to select the right one,
which in this case is “Article 4”. The system can then
generate a natural-language response giving the name of the
researcher and the title of the article.
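The bottom-up search of Example 2 can be sketched over a small three-layer structure. The article-to-author links follow Figure 2; the article keywords and the authors' research interests are invented for illustration, and the names `ARTICLES`, `AUTHOR_INTERESTS` and `search_by_research_area` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    authors: list[str]
    keywords: set[str]


# Top layer. Author links follow Figure 2; keywords are assumed.
ARTICLES = {
    "Article 1": Article(["Author 2", "Author 4"], {"information processing"}),
    "Article 2": Article(["Author 1"], {"mobile network"}),
    "Article 3": Article(["Author 1", "Author 3"], {"neural network"}),
    "Article 4": Article(["Author 2"], {"genetic algorithm"}),
}

# Middle-to-bottom links: assumed research interests per author.
AUTHOR_INTERESTS = {
    "Author 1": {"mobile network", "neural network"},
    "Author 2": {"genetic algorithm", "information processing"},
    "Author 3": {"neural network"},
    "Author 4": {"information processing"},
}


def articles_by(author: str) -> list[str]:
    """Move from the author layer up to the article layer."""
    return [name for name, art in ARTICLES.items() if author in art.authors]


def search_by_research_area(area: str) -> list[tuple[str, str]]:
    """Enter at the research-area layer and walk up to authors and articles."""
    area = area.lower()
    results = []
    for author, interests in AUTHOR_INTERESTS.items():
        if area not in interests:
            continue
        for name in articles_by(author):
            # Check the research area against the article's keywords.
            if area in ARTICLES[name].keywords:
                results.append((author, name))
    return results


print(search_by_research_area("Genetic Algorithm"))
# [('Author 2', 'Article 4')]
```

Because each layer can be reached from its neighbours, the same data supports a top-down search (from article titles to authors) by reading `Article.authors` directly.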
4.2. Dynamic Associative Memory as the
knowledge representation for the ill-structured
knowledge
The knowledge asserted into the ICC through the
community channel is ill-structured since the text is not
fully formatted. Extracting the semantic
component of a sentence would require robust natural language
processing with a complex ontology. The
ontology approach is more suitable for domain-specific
problems, but in our case the scope of the topic is
not predetermined and therefore the vocabulary
needed is uncertain. Another challenge
when dealing with an open topic is that the meaning of
a keyword changes with the subject of discourse.
For example, the word “stroke” may mean
a cerebral accident in a medical subject, or the movement of
the upper torso and arms to strike a ball in a golfing subject.
Dynamic Associative Memory builds a network of
word clusters from the natural text residing in the
community memory. The topology of the network
changes dynamically, as it is determined by factors
that reflect the closeness of the subject of the discourse
to the query. The community memory is made up of
several story objects created by the community members.
Each story object contains a set of discourses on different
subtopics within the community’s main topic. The stored
positions of the story objects therefore do not reflect
their closeness to the query. When a query is posted to
the system, it is reduced to several
keywords (i.e. the function words are removed).
The story objects are then evaluated and the network
is constructed based on two major factors:
• the query keyword distribution – a story object
in which all the query keywords appear is
considered very relevant to the topic. The
keywords should appear an equal number of
times, so that no keyword dominates
the others, in order to
avoid prejudice in selection. The standard
deviation (SD) of the keyword frequencies is used to
determine whether a story object has equally
distributed or biased keywords. The story object that has
the lowest SD value is categorized as equally
distributed.
• the average value (known as A) of the story
objects – another aspect of a story object is
the total number of keyword appearances. The
keyword distribution factor alone does not give a
good comparison among the story objects that
have the highest keyword frequencies. The
average, A, is the ratio of the total
keyword frequency in the story object to
the number of keywords. A story object that has
a high average value is considered to have
more content about the query.
[Figure 2. Knowledge representation for structured social knowledge. Top layer: Article 1 (Author 2, Author 4), Article 2 (Author 1), Article 3 (Author 1, Author 3) and Article 4 (Author 2), each with Title, Keywords, Related work and Reference fields. Middle layer: Author 1 to Author 4, each with Name, Affiliation, Project and URL fields. Bottom layer: research areas MN, IP, NN and GA.]
Based on both factors, the network topology can be
constructed as shown in Figure 3. The solid circle represents the
query keywords and the rectangular boxes represent the
words in the story objects that match the query keywords.
The solid line to each rectangular box has a different
length, which represents the distance between the story
object and the query. The story object that has the lowest
SD value and the highest average value
is represented with the shortest line. The length, L,
can be calculated as follows:
L = (SD / A) where
SD - standard deviation
A - average
Therefore, the solid line is drawn shortest when
the SD is the lowest and A is the largest. The story object
that is closest to the query is visited first and the
content of its discourse is submitted to the agents.
Several agents are activated to represent the
conversation that took place in the discourse. If the user does
not post another query, the system continues to visit
the story object with the second shortest length,
and so on until all story objects have been visited.
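The ranking of story objects by L = SD / A can be sketched as follows. Several choices here are assumptions rather than details given in the paper: the population standard deviation is used, a story object containing none of the query keywords is given infinite length, and ties in L are broken in favour of the larger A. The function names and story texts are hypothetical.

```python
import math
from statistics import mean, pstdev


def keyword_counts(story_text: str, query_keywords: list[str]) -> list[int]:
    """Frequency of each query keyword in a story object."""
    words = story_text.lower().split()
    return [words.count(keyword) for keyword in query_keywords]


def link_length(counts: list[int]) -> float:
    """L = SD / A; infinite when no query keyword occurs (assumed convention)."""
    average = mean(counts)
    if average == 0:
        return math.inf
    return pstdev(counts) / average


def visiting_order(stories: dict[str, str], query_keywords: list[str]) -> list[str]:
    """Story objects sorted from shortest to longest link."""
    def key(name: str):
        counts = keyword_counts(stories[name], query_keywords)
        # Tie-break on larger A, an assumption added for this sketch.
        return (link_length(counts), -mean(counts))

    return sorted(stories, key=key)


stories = {
    "s1": "agent agent agent communication",        # counts [3, 1]
    "s2": "agent communication agent communication",  # counts [2, 2]
    "s3": "weather report",                          # counts [0, 0]
}
order = visiting_order(stories, ["agent", "communication"])
print(order)
# ['s2', 's1', 's3']
```

Here s2 has equally distributed keywords (SD = 0, so L = 0), s1 has SD = 1 and A = 2 (L = 0.5), and s3 contains no query keyword, so it is visited last.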
5. Conclusion
Social knowledge is emphasized in this work as the main
component of community knowledge sharing. The
ICC provides a channel for communities,
augmented with agents that extract knowledge
from the documents as well as from the social talk stored
in the system. However, many challenges remain, and
refinement work is needed to handle documents with
idiosyncratic variations in the way they are written
before the system becomes fully workable.
6. References
[1] M. Aiello, C. Monz, L. Todoran, M. Worring. Document
understanding for a broad class of documents.
International Journal on Document Analysis and
Recognition, Springer-Verlag, 5, 2002, pp. 1–16.
[2] J. Allan, J.G. Carbonell, G. Doddington, J. Yamron, Y.
Yang. Topic detection and tracking pilot study final
report. Proceedings of the Broadcast News Transcription
and Understanding Workshop (sponsored by DARPA), Feb
1998.
[3] J. Mostafa, W. Lam. Automatic classification using
supervised learning in a medical document filtering
application. Information Processing and Management, 36,
2000, pp. 415–444.
[4] J.S. Brown, P. Duguid. The Social Life of Information.
Harvard Business School Press, February 2000.
[5] S.M.F.D Syed Mustapha. Theoretical optic on enabling
PBL activities for large group through intelligent
conversational channel. 5th Asia-Pacific Conference on
Problem-based Learning, 2004.
[6] S.M.F.D Syed Mustapha. Intelligent Conversational
Channel for Learning Social Knowledge among
Communities. 8th International Conference on Knowledge-
based Intelligent Information & Engineering Systems,
KES 04, in Lecture Notes in Computer Science, New
Zealand, 2004, p. 343.
[7] S.M.F.D Syed Mustapha. Knowledge construction
technology through hyper-based media community
channel. 2nd International Conference of Artificial
Intelligence in Engineering and Technology, 2004.
[8] S.M.F.D Syed Mustapha. An algorithm for avoiding
paradoxical arguments among the multi-agent in the
discourse communicator. 8th International Conference on
Knowledge-based Intelligent Information & Engineering
Systems, KES 04, in Lecture Notes in Computer Science,
New Zealand, 2004, p. 350.
[Figure 3. Network topology based on keyword distribution and average value. Node label: Story Object.]