Ningning Wu's research while affiliated with University of Arkansas at Little Rock and other places

Publications (13)

Article
Full-text available
Objective: Rule-based data quality assessment in health care facilities was explored through the compilation, implementation, and evaluation of 63,397 data quality rules in a single-center case study to assess the ability of rule-based data quality assessment to identify data errors of importance to physicians and system owners. Methods: We applied...
Conference Paper
With the help of Internet and Web technologies, more and more consumers seek opinions online before making purchase decisions. However, with the ever-increasing volume of user-generated reviews, people are overwhelmed by the amount of data available. Thus there is a great need for a system that can summarize the reviews and produce a set...
Article
Full-text available
This paper presents ongoing research conducted through collaboration between the University of Arkansas at Little Rock and the Arkansas Department of Education to develop an entity resolution and identity management system. The process includes a multi-phase approach consisting of data-quality analysis, selection of entity-identity attributes for e...
Conference Paper
Full-text available
This paper compares entity resolution results obtained by using both probabilistic and deterministic matching when applied to the deduplication of student enrollment data. The approach outlined in this paper uses deterministic matching to represent equivalence for the calculation of weights to be used in probabilistic matching based on the Fellegi-...
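The probabilistic side of the comparison above follows the Fellegi-Sunter model, in which each attribute comparison contributes a log-likelihood weight. A minimal sketch, assuming invented attribute names and m/u probabilities (not values from the paper):

```python
import math

def fs_weight(m: float, u: float, agrees: bool) -> float:
    """Fellegi-Sunter log-likelihood weight for one attribute comparison.

    m: P(attribute agrees | records are a true match)
    u: P(attribute agrees | records are a non-match)
    """
    if agrees:
        return math.log2(m / u)           # agreement weight (positive)
    return math.log2((1 - m) / (1 - u))   # disagreement weight (negative)

def score_pair(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Sum per-attribute weights; the total is compared to match thresholds."""
    return sum(
        fs_weight(m, u, rec_a.get(attr) == rec_b.get(attr))
        for attr, (m, u) in params.items()
    )

# Illustrative m/u probabilities; in the approach outlined in the paper they
# would be estimated from pairs that deterministic matching declares equivalent.
params = {"last_name": (0.95, 0.05), "dob": (0.98, 0.01)}
rec_a = {"last_name": "Smith", "dob": "2001-04-05"}
rec_b = {"last_name": "Smith", "dob": "2001-04-06"}
total = score_pair(rec_a, rec_b, params)  # name agrees, dob disagrees
```

Here the name agreement contributes a positive weight and the date-of-birth disagreement a larger negative one, so the pair scores below zero and would not be linked.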
Conference Paper
Full-text available
Errors in data sources of information product (IP) manufacturing systems can degrade overall IP quality as perceived by consumers. Data defects from inputs propagate throughout the IP manufacturing process. Information Quality (IQ) research has focused on improving the quality of inputs to mitigate error propagation and ensure an IP will be fit for...
Conference Paper
Full-text available
This paper presents a parsing method for the entity extraction from open source documents. A Web page of interest is first downloaded to a text file. The method then applies a set of patterns to the text file to extract interesting entity fragments. The patterns are currently particularly designed for obituary announcements. With the extracted enti...
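The pattern-based extraction described in this abstract can be sketched with regular expressions. The patterns, field names, and sample notice below are hypothetical illustrations in the spirit of the method, not the paper's actual pattern set:

```python
import re

# Hypothetical patterns; the paper's actual pattern set for obituary
# announcements is not given in the abstract.
PATTERNS = {
    "decedent": re.compile(r"^(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<age>\d{1,3}),"),
    "death_date": re.compile(r"died on (?P<date>\w+ \d{1,2}, \d{4})"),
    "survivor": re.compile(r"survived by (?:his|her) \w+,? (?P<name>[A-Z][a-z]+ [A-Z][a-z]+)"),
}

def extract_fragments(text: str) -> dict:
    """Apply each pattern to the text and collect the named-group matches."""
    fragments = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fragments[label] = match.groupdict()
    return fragments

notice = ("John Doe, 87, died on March 3, 2009. "
          "He is survived by his wife, Jane Doe.")
frags = extract_fragments(notice)
```

Each extracted fragment (decedent, death date, survivor) then becomes a candidate entity reference for downstream identification.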
Article
Historical research on intrusion, anomaly, and rogue-software detection, and on network protection techniques to prevent Denial of Service and other malicious software attacks, has involved antiquated, singularly scoped techniques. Malicious software attacks, whether in the form of worms, spyware, malware, or computer viruses, have economically and pro...
Article
Full-text available
This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach is to take a single entity reference and determine the "best match" of its attributes to a set of candidate identities selected from an appropriate entity catalog. This paper describes a new technique of mult...
Article
Full-text available
The paper introduces an ongoing research project of entity identification in open (publicly available) source documents where some of the identifying attributes have been redacted. The project proof-of-concept focuses on published obituary notices as the target source, and the decedent and other individuals listed in the notice as the identities t...
Article
This paper describes a proposed master's degree program in the new field of Information Quality, the reasons behind it, and how academe and industry have collaborated to create it.

Citations

... We use NLP and standardized techniques, which greatly reduces the burden of manual inspection (22,23). To the best of our knowledge, there have been a few rule-based quality-control studies of medical records, but AI research on the use of medical guidelines for disease quality control is still relatively rare (24). However, some systems and frameworks based on artificial intelligence and knowledge bases have been reported to help doctors make certain clinical decisions (11,25). ...
... The scoring rule setup requires more effort and analysis than the OYSTER identity (Boolean) rules configuration. A scoring rule is similar to a Boolean rule in that it specifies a similarity function and an optional data preparation function for comparing the values of identity attributes between two entity references [2], [3]. Unlike the Boolean rules in OYSTER, which use deterministic logic for similarity matching, each decision in a scoring rule is associated with a numerical value called a "weight" [6]. ...
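The contrast drawn here between Boolean and scoring rules can be sketched as follows. The attribute names, weights, and threshold are illustrative assumptions, not OYSTER's actual configuration syntax:

```python
# Hypothetical attribute weights and threshold, for illustration only.
WEIGHTS = {"first": 3.0, "last": 4.0, "dob": 5.0}
THRESHOLD = 7.0

def boolean_rule(a: dict, b: dict) -> bool:
    """Deterministic (Boolean) rule: the listed attributes must agree exactly."""
    return a["last"] == b["last"] and a["dob"] == b["dob"]

def scoring_rule(a: dict, b: dict) -> bool:
    """Weighted rule: sum the weights of agreeing attributes, then threshold."""
    score = sum(w for attr, w in WEIGHTS.items() if a[attr] == b[attr])
    return score >= THRESHOLD

ref_a = {"first": "Ann", "last": "Lee", "dob": "1990-01-01"}
ref_b = {"first": "Ann", "last": "Lee", "dob": "1990-01-02"}
# The Boolean rule rejects this pair (dob differs), while the scoring rule
# links it: first + last agreement carries weight 3 + 4 = 7 >= 7.
```

The extra effort the citation mentions comes from choosing sensible weights and a threshold, whereas a Boolean rule only needs the list of attributes that must agree.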
... Talburt et al. [50], in their formal problem formulation, described an entity identification process of determining the "best match" of a single identity fragment extracted from unstructured documents from among a set of possible candidates in entity catalogs. Two limitations of this method are that the catalog may contain many similar identity choices, and that the catalog may be incomplete, in which case the correct identity is not among the choices. ...
... Indeed, it performs strict comparison between terms and treats the text as a sequence of lines. The task of disambiguation, when more than one record is returned for an entity referenced in the document, was addressed by Wu et al. (2007), who exploited relationships between candidate entities for different fragments (a fragment being the entity features extracted from the document using semantic patterns) of the same document to identify the identity of each fragment. In contrast, our approach does not remedy ambiguity cases but avoids them from the beginning. ...
... Education-mode reform must adapt to the shift from single-word learning to a comprehensive sense of participation, from a linear knowledge layout to a nonlinear mesh layout, from static presentation to an integrated, multidimensional one, and from attention to students' external behaviors to attention to students' inner experience and learning-strategy training [2]. This essay proposes using cloud computing and new mobile technology, integrated with teaching resources and digital resources, to offer services through an interactive online teaching platform [3]. ...
... The research into the use of open source documents for resolving, identifying, disambiguating, and/or updating entity attributes in entity catalogs or proprietary repositories has led to much of the research described in this paper [46][47][48][49][50][51]. For organizations to effectively take advantage of the abundant information available online using their current data and analytical tools, the unstructured information must be transformed into a structure in which the entities of interest within the text can be represented using relational database systems. ...
... Understanding which attributes are most important for any given application is a key step in any data management process (Heien, Wu, & Talburt, 2010). In the context of MDM, each entity type has a set of attributes taking on different values that describe the entities' characteristics. ...
... It is the process of extracting information from different sources, recombining it to identify patterns, and deriving information from such digital sources (Grobelnik et al., 2002). Text mining applications have arisen to support the extraction and interpretation of unstructured text, which in its raw form is not a suitable input to automatic processing tasks such as information retrieval, document indexing, clustering, and text classification (Chiang et al., 2008). Text mining is an intuitive choice of technology, particularly in research and education, because of its ability to discover hidden relationships within the complex and voluminous body of knowledge published in the literature (Grobelnik et al., 2002), whether in related or unrelated fields of research. ...
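The transformation this citation describes, from raw text to a structured numeric form usable for retrieval and clustering, can be sketched with TF-IDF weighting and cosine similarity. The toy corpus is invented, and this stdlib-only implementation is a minimal illustration rather than any particular system's method:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Plain term-frequency / inverse-document-frequency vectors (stdlib only)."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: how many docs contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF; rarer terms get higher weight.
        vectors.append({t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf})
    return vectors

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus (invented); documents sharing terms score higher.
docs = [
    "data quality rules",
    "entity resolution for student data",
    "text mining of documents",
]
vecs = tfidf_vectors(docs)
```

Once documents are vectors, standard tasks such as retrieval (rank by similarity to a query vector) and clustering (group nearby vectors) become straightforward.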