Ningning Wu’s research while affiliated with University of Arkansas at Little Rock and other places


Publications (14)


A Rule-Based Data Quality Assessment System for Electronic Health Record Data
  • Article
  • Full-text available

August 2020 · 416 Reads · 36 Citations · Applied Clinical Informatics

Ningning Wu · [...]
Objective: Rule-based data quality assessment in health care facilities was explored through compilation, implementation, and evaluation of 63,397 data quality rules in a single-center case study to assess the ability of rule-based data quality assessment to identify data errors of importance to physicians and system owners.
Methods: We applied a design science framework to design, demonstrate, test, and evaluate a scalable framework with which data quality rules can be managed and used in health care facilities for data quality assessment and monitoring.
Results: We identified 63,397 rules partitioned into 28 logic templates. A total of 819,683 discrepancies were identified by 4.5% of the rules. Nine out of 11 participating clinical and operational leaders indicated that the rules identified data quality problems and articulated next steps that they wanted to take based on the reported information.
Discussion: The combined rule-template and knowledge-table approach makes governance and maintenance of otherwise large rule sets manageable. Identified challenges to rule-based data quality monitoring included the lack of curated and maintained knowledge sources relevant to data error detection and the lack of organizational resources to support clinical and operational leaders with investigation and characterization of data errors and pursuit of corrective and preventative actions. Limitations of our study included implementation within a single center and dependence of the results on the implemented rule set.
Conclusion: This study demonstrates a scalable framework (up to 63,397 rules) with which data quality rules can be implemented and managed in health care facilities to identify data errors. The data quality problems identified at the implementation site were important enough to prompt action requests from clinical and operational leaders.
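As a loose illustration of the rule-template and knowledge-table idea, the sketch below expands one logic template ("a field's value must appear in its knowledge table") into concrete rules and runs them over records. The field names, knowledge tables, and rule shape are hypothetical, not the paper's implementation.

```python
# Illustrative sketch (not the authors' system): one rule "logic
# template" instantiated against knowledge tables to produce concrete
# data-quality rules, then applied to records to flag discrepancies.

# Hypothetical knowledge tables mapping a field to its valid values.
KNOWLEDGE_TABLES = {
    "sex": {"M", "F", "U"},
    "unit": {"ICU", "ED", "WARD"},
}

def make_membership_rules(knowledge_tables):
    """Expand one template ('field value must appear in its knowledge
    table') into one concrete rule per field."""
    rules = []
    for field, valid in knowledge_tables.items():
        def rule(record, field=field, valid=valid):
            value = record.get(field)
            return None if value in valid else (field, value)
        rules.append(rule)
    return rules

def assess(records, rules):
    """Apply every rule to every record; return the discrepancies."""
    return [hit for r in records for rule in rules if (hit := rule(r))]

records = [
    {"sex": "M", "unit": "ICU"},
    {"sex": "X", "unit": "ED"},      # invalid sex code
    {"sex": "F", "unit": "Ward 3"},  # unit not in knowledge table
]
discrepancies = assess(records, make_membership_rules(KNOWLEDGE_TABLES))
print(discrepancies)  # [('sex', 'X'), ('unit', 'Ward 3')]
```

Because each template is a single function parameterized by a knowledge table, adding a field is a one-line change to the table rather than a new hand-written rule, which is the maintainability point the abstract makes.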


SentiLDA — An Effective and Scalable Approach to Mine Opinions of Consumer Reviews by Utilizing Both Structured and Unstructured Data

September 2016 · 43 Reads · Lecture Notes in Computer Science

With the help of Internet and Web technologies, more and more consumers seek opinions online before making purchase decisions. However, with the ever-increasing volume of user-generated reviews, people are overwhelmed by the amount of data available. There is thus a great need for a system that can summarize reviews and produce the set of aspects mentioned in them together with the pros and cons expressed toward each. To address this need, this paper proposes a new probabilistic topic model, SentiLDA, for jointly mining reviews (unstructured data) and their ratings (structured data) to detect product/service aspects and their corresponding positive and negative opinions simultaneously. A key feature of SentiLDA is that it can mine positive and negative sub-topics under the same aspect without the need for sentiment seed words. Experimental results show that SentiLDA outperforms related state-of-the-art models in detecting product/service aspects and their corresponding sentiments in reviews.
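This is not the SentiLDA model itself, only a toy illustration of its core intuition: the structured star rating can act as a sentiment signal for the words that co-occur with an aspect, so opinion words under one aspect separate into positive and negative vocabularies without seed words. The aspect word and reviews here are made up.

```python
# Toy illustration (not SentiLDA): use the structured rating as a weak
# sentiment label for words co-occurring with an aspect of interest.
from collections import Counter

ASPECT_WORDS = {"battery"}  # hypothetical aspect of interest

reviews = [
    (5, "battery life is great"),
    (4, "great battery"),
    (1, "battery drains fast"),
    (2, "battery drains quickly"),
]

pos, neg = Counter(), Counter()
for rating, text in reviews:
    words = text.split()
    if ASPECT_WORDS & set(words):             # review mentions the aspect
        target = pos if rating >= 4 else neg  # rating as sentiment prior
        target.update(w for w in words if w not in ASPECT_WORDS)

print(pos.most_common(1))  # [('great', 2)]
print(neg.most_common(1))  # [('drains', 2)]
```

SentiLDA replaces these hard counts with a probabilistic topic model, but the separation of sub-topics by the rating signal is the same idea.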


Figure 1. SSN Validation Tool. Source: Arkansas Department of Education
Figure 2. Arkansas enterprise data system
Additional figures: SSN validation analysis; records incorrectly linked because of loose ER rules; missing letter in one of the records preventing an exact-match ER

A Case Study on Data Quality, Privacy, and Evaluating the Outcome of Entity Resolution Processes

July 2016 · 250 Reads

This paper presents ongoing research conducted through collaboration between the University of Arkansas at Little Rock and the Arkansas Department of Education to develop an entity resolution and identity management system. The process follows a multi-phase approach consisting of data-quality analysis, selection of entity-identity attributes for entity resolution, definition of a rule set using the open-source entity-resolution system OYSTER, and use of an entropy-based approach to identify potential false positives and false negatives. The research is the first known of its kind to evaluate privacy-enhancing entity-resolution rule sets in a state education agency.


Probabilistic Matching Compared to Deterministic Matching for Student Enrollment Records

April 2014 · 364 Reads · 6 Citations

This paper compares entity resolution results obtained by using both probabilistic and deterministic matching when applied to the deduplication of student enrollment data. The approach outlined in this paper uses deterministic matching to represent equivalence for the calculation of weights to be used in probabilistic matching based on the Fellegi-Sunter model.
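A minimal sketch of the Fellegi-Sunter scoring that this comparison rests on. The attributes and m/u probabilities below are invented for illustration; in the paper's approach, such parameters would be estimated from the equivalence classes produced by deterministic matching.

```python
# Illustrative Fellegi-Sunter weights (made-up parameters, not the
# paper's). m = P(attribute agrees | same entity), u = P(attribute
# agrees | different entities); a pair's score sums log2(m/u) for each
# agreeing attribute and log2((1-m)/(1-u)) for each disagreeing one.
from math import log2

# Hypothetical m/u probabilities, e.g. estimated from a truth set.
PARAMS = {
    "last_name":  (0.95, 0.01),
    "birth_date": (0.90, 0.005),
}

def match_weight(rec_a, rec_b, params=PARAMS):
    score = 0.0
    for attr, (m, u) in params.items():
        if rec_a[attr] == rec_b[attr]:
            score += log2(m / u)               # agreement weight
        else:
            score += log2((1 - m) / (1 - u))   # disagreement weight
    return score

a = {"last_name": "Smith", "birth_date": "2001-04-02"}
b = {"last_name": "Smith", "birth_date": "2001-04-02"}
c = {"last_name": "Smith", "birth_date": "1999-12-31"}

print(round(match_weight(a, b), 2))  # full agreement: strongly positive
print(round(match_weight(a, c), 2))  # birth date disagrees: lower score
```

Pairs scoring above an upper threshold are declared matches, below a lower threshold non-matches, with the band in between sent to clerical review.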


A Case Study on Data Quality, Privacy, and Entity Resolution

January 2013 · 24 Reads · 1 Citation

This chapter presents ongoing research conducted through collaboration between the University of Arkansas at Little Rock and the Arkansas Department of Education to develop an entity resolution and identity management system. The process includes a multi-phase approach consisting of data-quality analysis, selection of entity-identity attributes for entity resolution, development of a truth set, and implementation and benchmarking of an entity-resolution rule set using the open-source entity-resolution system OYSTER. The research is the first known of its kind to evaluate privacy-enhancing, entity-resolution rule sets in a state education agency.
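This is not OYSTER itself, only a minimal sketch of what a Boolean (deterministic) identity rule looks like: several attribute-level similarity terms, each optionally preceded by a data-preparation step, ANDed into a match decision. The attribute names and tests are hypothetical.

```python
# Illustrative Boolean identity rule (not OYSTER's actual syntax or
# engine): two references match if every term in the rule holds.

def prep(value):
    """Data-preparation step: normalize case and whitespace."""
    return value.strip().upper()

# A rule is a list of (attribute, test) terms, ANDed together.
RULE = [
    ("last_name", lambda a, b: prep(a) == prep(b)),
    ("ssn_last4", lambda a, b: a == b),
]

def references_match(ref_a, ref_b, rule=RULE):
    return all(test(ref_a[attr], ref_b[attr]) for attr, test in rule)

r1 = {"last_name": "smith ", "ssn_last4": "1234"}
r2 = {"last_name": "SMITH",  "ssn_last4": "1234"}
r3 = {"last_name": "SMITH",  "ssn_last4": "9999"}

print(references_match(r1, r2))  # True: both terms satisfied
print(references_match(r1, r3))  # False: SSN term fails
```

A production rule set typically contains many such rules; references are linked if any rule fires, which is what makes loose rules a source of the false-positive links the case study investigates.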


Table 1: IP definitions and functions
Methods to Measure Importance of Data Attributes to Consumers of Information Products.

January 2010 · 487 Reads · 1 Citation

Errors in the data sources of information product (IP) manufacturing systems can degrade overall IP quality as perceived by consumers. Data defects from inputs propagate throughout the IP manufacturing process. Information Quality (IQ) research has focused on improving the quality of inputs to mitigate error propagation and ensure an IP will be fit for use by consumers. However, the feedback loop from IP consumers to IP producers is often incomplete, since the overall quality of the IP is determined not solely by the quality of inputs but by the IP's fitness for use as a whole. It remains uncertain whether high-quality inputs directly correlate with a high-quality IP. The methods proposed in this paper investigate the effects of intentionally decreasing, or disrupting, the quality of inputs; measure consumers' evaluations against an undisrupted IP; and propose scenarios illustrating the advantage of these methods over traditional survey methods. Fitness for use may then be increased in future IP revisions using the attributes deemed important by consumers.
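A toy sketch of the disruption protocol, not the paper's experiment: degrade one input attribute at a time, collect a consumer evaluation of the resulting IP, and rank attributes by how far the evaluation falls below the baseline. The evaluator and attributes here are entirely hypothetical stand-ins for real consumer ratings.

```python
# Toy disruption experiment (hypothetical evaluator and attributes).

def evaluate(ip):
    """Stand-in for a consumer's fitness-for-use rating (0-10)."""
    score = 10.0
    if ip["address"] is None:
        score -= 4.0   # consumers notice a missing address strongly
    if ip["fax"] is None:
        score -= 0.5   # the fax number barely matters to them
    return score

baseline_ip = {"address": "123 Main St", "fax": "555-0100"}
baseline = evaluate(baseline_ip)

importance = {}
for attr in baseline_ip:
    disrupted = dict(baseline_ip, **{attr: None})  # degrade one input
    importance[attr] = baseline - evaluate(disrupted)

print(importance)  # {'address': 4.0, 'fax': 0.5}
```

The measured drop, rather than a survey answer, is what reveals which attributes consumers actually rely on, which is the advantage the abstract claims over traditional survey methods.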


A Case Study in Partial Parsing Unstructured Text

May 2008

·

137 Reads

·

7 Citations

This paper presents a parsing method for entity extraction from open-source documents. A Web page of interest is first downloaded to a text file. The method then applies a set of patterns to the text file to extract entity fragments of interest. The patterns are currently designed specifically for obituary announcements. With the entities extracted, the next step is to identify them before they are populated into a database. An entity resolution process is presented to determine the actual identities. A case study illustrates the method, and its results are also presented. Although the results show that the method is not yet technically effective, they do help in understanding how well, or how poorly, a quick parsing technique extracts entities of interest from obituaries on the Web. More effective techniques should be considered to improve the extraction results.
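The pattern-based partial parsing described above can be sketched with regular expressions. The patterns and the sample obituary below are made up for illustration; they are not the paper's rule set.

```python
# Minimal sketch of pattern-based partial parsing (hypothetical
# patterns): regular expressions pull entity fragments such as the
# deceased's name, age, and city out of obituary text.
import re

PATTERNS = {
    "name": re.compile(r"^([A-Z][a-z]+ [A-Z][a-z]+),"),
    "age":  re.compile(r"\bage (\d{1,3})\b"),
    "city": re.compile(r"\bof ([A-Z][a-z]+)\b"),
}

def extract_fragments(text):
    """Apply each pattern; keep the first capture group when it matches."""
    fragments = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            fragments[field] = m.group(1)
    return fragments

obit = "John Doe, age 82, of Conway, died Tuesday."
print(extract_fragments(obit))
# {'name': 'John Doe', 'age': '82', 'city': 'Conway'}
```

The extracted fragment dictionary is what the subsequent entity-resolution step would attempt to identify against a database, and the brittleness of such hand-written patterns is exactly the limitation the abstract acknowledges.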


Building a network testbed for internet security research

January 2008 · 18 Reads · 3 Citations · Journal of Computing Sciences in Colleges

Historical research on intrusion, anomaly, or rogue-software detection, and on network-protection techniques to prevent denial-of-service and other malicious software attacks, has relied on antiquated, narrowly scoped techniques. Malicious software attacks, whether in the form of worms, spyware, malware, or computer viruses, have harmed the economics and productivity of information exchange throughout the interconnected world. The ability to proactively identify threats or unauthorized activity that contradicts day-to-day operations allows defenses to be initiated before a full infestation occurs. This paper describes a networked test system built for our research projects on Internet worm detection. The goal of the system is to simulate a global network containing heterogeneous systems so that we may study the behaviors of various worms and design effective strategies for predicting, detecting, and quarantining outbreaks.
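To illustrate why a simulated network is useful for this kind of study, here is an epidemic-style sketch (not the testbed itself, and the parameters are arbitrary): a random-scanning worm on N hosts, where each infected host probes a few random addresses per time step.

```python
# Illustrative random-scanning worm simulation (arbitrary parameters).
import random

def simulate_worm(n_hosts=1000, probes_per_step=5, steps=30, seed=42):
    random.seed(seed)
    infected = {0}                       # patient zero
    history = [len(infected)]
    for _ in range(steps):
        newly = set()
        for _host in infected:
            for _ in range(probes_per_step):
                target = random.randrange(n_hosts)  # random scanning
                newly.add(target)
        infected |= newly
        history.append(len(infected))
    return history

history = simulate_worm()
print(history[0], history[-1])  # infection count grows toward saturation
```

Even this toy model reproduces the characteristic slow-start-then-explosive growth curve of random-scanning worms, which is the behavior a testbed lets researchers observe safely while evaluating detection and quarantine strategies.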



Figure 1: Single-Reference, Attribute Matching
Entity Identification in Documents Expressing Shared Relationships

January 2007 · 337 Reads · 2 Citations

This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach is to take a single entity reference and determine the "best match" of its attributes to a set of candidate identities selected from an appropriate entity catalog. This paper describes a new technique of multiple-reference, shared-relationship identity resolution that can be employed when a document references several entities that share a specific relationship, a situation that often occurs in published documents. It also describes the results obtained from a recent test of the multiple-reference, shared-relationship identity resolution technique applied to obituary notices. The preliminary results show that the multiple-reference technique can provide higher quality identification results than single-reference matching in cases where a shared relationship is asserted.
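A hedged sketch of the multiple-reference idea (the catalog, names, and scoring below are hypothetical, not the paper's algorithm): when a document names several people who share a relationship, a candidate identity gains support if the catalog links it to a candidate for a co-mentioned reference.

```python
# Illustrative shared-relationship identity resolution.

# Hypothetical entity catalog: id -> (name, set of related ids).
CATALOG = {
    1: ("John Smith", {2}),     # related to Mary Smith
    2: ("Mary Smith", {1}),
    3: ("John Smith", set()),   # a different John Smith, no link
}

def resolve(reference, co_reference):
    """Score candidates by name match, plus a boost when a catalog
    relationship connects the candidate to the co-mentioned name."""
    best, best_score = None, -1.0
    for ident, (name, related) in CATALOG.items():
        if name != reference:
            continue
        score = 1.0
        if any(CATALOG[r][0] == co_reference for r in related):
            score += 1.0             # shared-relationship evidence
        if score > best_score:
            best, best_score = ident, score
    return best

# An obituary mentioning "John Smith" alongside "Mary Smith" resolves
# to the entry linked to Mary, not to the unrelated namesake.
print(resolve("John Smith", "Mary Smith"))  # 1
```

Single-reference matching alone could not distinguish the two John Smiths; the asserted relationship in the document supplies the disambiguating evidence, which is the gain the abstract reports for obituary notices.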


Citations (10)


... Furthermore, data collected from wearable devices may not always be accurate or reliable, particularly if patients do not use the devices consistently or if the devices malfunction (Z. Wang, Talburt, Wu, Dagtas, & Zozus, 2020) [46]. A significant barrier to the effective use of health data analytics in chronic disease management is the lack of standardized data formats and protocols. ...

Reference:

A Conceptual Framework for Integrating Health Data Analytics into Chronic Disease Management: Improving Patient Outcomes
A Rule-Based Data Quality Assessment System for Electronic Health Record Data

Applied Clinical Informatics

... For example, if a company introduces a new product, the policies define who is responsible for creating the new entry in the master product registry, the standards for creating the product identifier, what persons or department should be notified, and which other data systems should be updated. Compliance to regulation along with the privacy and security of information are also important policy issues (Decker, Liu, Talburt, Wang, & Wu, 2013). ...

A Case Study on Data Quality, Privacy, and Entity Resolution
  • Citing Article
  • January 2013

... The scoring rule setup requires more effort and analysis than using the OYSTER identity (Boolean) rules configuration. The scoring rule is similar to the Boolean rule which specifies a similarity function and optional data preparation function for comparing the values of identity attributes between two entity references [2], [3]. Unlike the Boolean rules in OYSTER, which uses deterministic logic for the similarity matching, the decision in the scoring rule is associated with a numerical value called a "weight" [6]. ...

Probabilistic Matching Compared to Deterministic Matching for Student Enrollment Records

... Talburt et al. [50] in their formal problem formulation described an entity identification process of determining the "best match" of a single identity fragment 1 extracted from unstructured documents from among a set of possible candidates in entity catalogs. Two of the limitations of this method are the catalog may be replete with many similar identity choices and the catalog may be incomplete, hence the correct identity is not among the choices. ...

Entity Identification in Documents Expressing Shared Relationships

... Indeed, it treats strict comparison between terms and considers the text as a sequence of lines. The task of disambiguation in the case of more than one record returned for an entity referenced in the document has been treated by (Wu et al., 2007) that exploited relationships between candidate entities for different fragments (a fragment is defined as entity features extracted from the document using semantic patterns) of the same document for identifying the identity of fragments. In contrast, our approach does not remedy ambiguity cases but it avoids them from the beginning. ...

A method for entity identification in open source documents with partially redacted attributes

Journal of Computing Sciences in Colleges

... Education mode reformation must be adapted to the transference from single word to comprehensive sense of participation, from linear knowledge layout to the mesh nonlinear layout, from the static and dynamic to an integrated and many dimensions, from paying attention to students' external behaviors to focus attention on the students' inner experience and learning strategy training [2] . This essay hopes that using Cloud Computing and new mobile technology, integrated with teaching resources and digital resources, and going through interactive teaching network platform to offer service [3] . ...

Building a network testbed for internet security research
  • Citing Article
  • January 2008

Journal of Computing Sciences in Colleges

... The research into the use of open source documents for resolving, identifying, disambiguating and/or updating entities attributes in entity catalogs or proprietary repositories has led to much of the research described in this paper [46][47][48][49][50][51]. In order for organizations to effectively take advantage of the abundant information available online using their current data and analytical tools, the unstructured information must be transformed into a structure in which the entities of interest within the text can be represented using relational database systems. ...

Entity Identification Using Indexed Entity Catalogs.

... Understanding which attributes are most important for any given application is a key step in any data management process (Heien, Wu, & Talburt, 2010). In the context of MDM, each entity type has a set of attributes taking on different values that describe the entities' characteristics. ...

Methods to Measure Importance of Data Attributes to Consumers of Information Products.

... It is the process of extracting information from different sources, recombining them to identify patterns and deriving information from such digital sources (Grobelnik et al., 2002). Text mining applications rise to support extraction and interpretation of unstructured text format, which in its present form, does not make a suitable input to automatic processing tasks such as information retrieval, document indexing, clustering and text classification (Chiang et al., 2008). Text mining is an intuitive choice of technology particularly in the research and educational field due to its ability to discover new hidden relationships from complex and voluminous body of knowledge published in the literature (Grobelnik et al., 2002), whether in related or non-related field of research. ...

A Case Study in Partial Parsing Unstructured Text