Ziqi Zhang

The University of Sheffield | Sheffield · Information School

PhD, MSc, BSc

About

68
Publications
32,412
Reads
1,799
Citations
Citations since 2017
24 Research Items
1404 Citations
[Chart: citations per year, 2017–2023; vertical axis 0–300]
Introduction
I am currently a lecturer in the Information School, University of Sheffield, UK. Between September 2016 and January 2018 I was a computer science lecturer at the Computing and Technology Department, Nottingham Trent University. Before that, I was a researcher in the OAK research lab, University of Sheffield. My research addresses methods that enable machines to extract human knowledge from text and to represent such knowledge in a structured form that is understandable and usable by machines. This ultimately enhances our capability to process and make sense of very large-scale data, improving decision making. Specifically, this includes but is not limited to: text mining, information extraction, natural language processing, Semantic Web and Linked Data, and social media analytics.
Additional affiliations
January 2018 - present
The University of Sheffield
Position
  • Lecturer
September 2016 - January 2018
Nottingham Trent University
Position
  • Lecturer
September 2006 - September 2016
The University of Sheffield
Position
  • Researcher

Publications

Publications (68)
Article
Full-text available
Background: Patients with diabetes may experience different needs according to their diabetes stage. These needs may be met via online health communities in which individuals seek health-related information and exchange different types of social support. Understanding the social support categories that may be more important for different diabetes s...
Article
Full-text available
The Linked Open Data practice has led to a significant growth of structured data on the Web. While this has created an unprecedented opportunity for research in the field of Natural Language Processing, there is a lack of systematic studies on how such data can be used to support downstream NLP tasks. This work focuses on the e-commerce domain and...
Article
Full-text available
Data-driven approaches to urban flooding management require a comprehensive understanding of how heterogeneous data are leveraged in tackling this problem. In this paper, we conduct an integrative review of related studies, structured around two angles: tasks and data. From the selected 69 articles on this topic, diverse tasks in tackl...
Article
Toxic comment classification models are often found biased towards identity terms, i.e., terms characterizing a specific group of people such as “Muslim” and “black”. Such bias is commonly reflected in false positive predictions, i.e., non-toxic comments with identity terms. In this work, we propose a novel approach to debias the model in toxic com...
Preprint
Toxic comment classification models are often found biased toward identity terms which are terms characterizing a specific group of people such as "Muslim" and "black". Such bias is commonly reflected in false-positive predictions, i.e. non-toxic comments with identity terms. In this work, we propose a novel approach to tackle such bias in toxic co...
Preprint
Full-text available
The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can...
Article
Full-text available
Previous studies of research methods in LIS lack consensus in how to define or classify research methods, and there have been no studies on automated recognition of research methods in the scientific literature of this field. This work begins to fill these gaps by studying how the scope of ‘research methods’ in LIS has evolved, and the challenges i...
Article
Full-text available
Purpose: The purpose of this work is to study how different stakeholders of a football club engage with interactions online through Twitter. It analyses the football club’s Twitter network to discover influential actors and the topic of interest in their online communication. Design/methodology/approach: The authors analysed the social networks deri...
Chapter
Full-text available
Markup languages such as RDFa and Microdata have been widely used by e-shops to embed structured product data, as evidence has shown that they improve click-through rates for e-shops and potentially increase their sales. While e-shops often embed certain categorisation information in their product data in order to improve their products’ visibilit...
Article
Full-text available
Objectives: To compare information sharing of over 379 health conditions on Twitter to uncover trends and patterns of online user activities. Methods: We collected 1.5 million tweets generated by over 450,000 Twitter users for 379 health conditions, each of which was quantified using a multivariate model describing engagement, user and content aspec...
Article
Full-text available
In the age of Internet of Things (IoT), online data has witnessed significant growth in terms of volume and diversity, and research into information retrieval has become one of the important research themes in the Internet oriented data science research. In information retrieval, machine-learning techniques have been widely adopted to automate the...
Article
Full-text available
In recent years, the increasing propagation of hate speech on social media and the urgent need for effective counter-measures have drawn significant investment from governments, companies, and researchers. A large number of methods have been developed for automated hate speech detection online. This aims to classify textual content into non-hate or...
Chapter
Full-text available
The increasing presence of hate speech on social media has drawn significant investment from governments, companies, and empirical research. Existing methods typically use a supervised text classification approach that depends on carefully engineered features. However, it is unclear if these features contribute equally to the performance of such me...
Conference Paper
Full-text available
The increasing presence of hate speech on social media has drawn significant investment from governments, companies, and empirical research. Existing methods typically use a supervised text classification approach that depends on carefully engineered features. However, it is unclear if these features contribute equally to the performance of such me...
Chapter
Full-text available
In recent years, the increasing propagation of hate speech on social media and the urgent need for effective counter-measures have drawn significant investment from governments, companies, and empirical research. Despite a large number of emerging scientific studies to address the problem, a major limitation of existing work is the lack of comparat...
Preprint
Full-text available
The increasing presence of hate speech on social media has drawn significant investment from governments, companies, and empirical research. Existing methods typically use a supervised text classification approach that depends on carefully engineered features. However, it is unclear if these features contribute equally to the performance of such me...
Article
Full-text available
Automatic Term Extraction deals with the extraction of terminology from a domain-specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task as it is known that there are no existing ATE methods that can consistently outperform others in any domain. This work adopts a refreshed pe...
Preprint
Full-text available
Automatic Term Extraction is a fundamental Natural Language Processing task often used in many knowledge acquisition processes. It is a challenging NLP task due to its high domain dependence: no existing methods can consistently outperform others in all domains, and good ATE is very much an unsolved problem. We propose a generic method for improvin...
Conference Paper
Full-text available
ScholarlyData is the new and currently the largest reference linked dataset of the Semantic Web community about papers, people, organisations, and events related to its academic conferences. Originally started from the Semantic Web Dog Food (SWDF), it addressed multiple issues on data representation and maintenance by (i) adopting a novel data mode...
Article
Full-text available
This article addresses a number of limitations of state-of-the-art methods of Ontology Alignment: 1) they primarily address concepts and entities while relations are less well-studied; 2) many build on the assumption of the 'well-formedness' of ontologies which is not necessarily true in the domain of Linked Open Data; 3) few have looked at schema he...
Article
Full-text available
http://www.semantic-web-journal.net/content/effective-and-efficient-semantic-table-interpretation-using-tableminer-0
Conference Paper
Full-text available
This paper describes the LODIE team (from the OAK lab of the University of Sheffield) participation at TAC-KBP 2015 for the Entity Discovery task in the Cold Start KBP track. We have taken a cross-document coreference resolution approach that starts with Named Entity Recognition to locate and classify mentions of named entities, followed by a clust...
Conference Paper
Full-text available
Automatic Term Extraction (ATE) or Recognition (ATR) is a fundamental processing step preceding many complex knowledge engineering tasks. However, few methods have been implemented as public tools and in particular, available as open-source freeware. Further, little effort is made to develop an adaptable and scalable framework that enables customiz...
Poster
Full-text available
Automatic Term Extraction (ATE/ATR) is an important Natural Language Processing (NLP) task that deals with the extraction of terminologies from domain-specific textual corpora. This poster demonstrates a solution that integrates ATE with Apache Solr framework to benefit from its extensive, extensible, flexible text processing libraries; it can eith...
Article
Full-text available
Information extraction (IE) is the technique for transforming unstructured textual data into a structured representation that can be understood by machines. The exponential growth of the web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for web-scale information ext...
Conference Paper
Full-text available
This work studies methods of annotating Web tables for semantic indexing and search - labeling table columns with semantic type information and linking content cells with named entities. Built on a state-of-the-art method, the focus is placed on developing and evaluating methods able to achieve the goals with partial content sampled from the table...
Conference Paper
Full-text available
This paper describes TableMiner, the first semantic Table Interpretation method that adopts an incremental, mutually recursive and bootstrapping learning approach seeded by automatically selected ‘partial’ data from a table. TableMiner labels columns containing named entity mentions with semantic concepts that best describe data in columns, and dis...
Conference Paper
Full-text available
This work explores the usage of Linked Data for Web scale Information Extraction, with focus on the task of Wrapper Induction. We show how to effectively use Linked Data to automatically generate training material and build a self-trained Wrapper Induction method. Experiments on a publicly available dataset demonstrate that for covered domains, our...
Article
Full-text available
Information Extraction (IE) is the technique for transforming textual data into structured representation that can be understood by machines. It is a crucial technique in enabling the Semantic Web, where increasing interest has been seen in recent years. This article reports recent progress in the LODIE project - Linked Open Data for Information Ex...
Conference Paper
Full-text available
The Web of Data is a rich common resource with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is the imprecision in the use of vocabularies: annotators can misunderstand the semantics of a class or property or may not be able to find the right o...
Article
Full-text available
Measuring lexical semantic relatedness is an important task in Natural Language Processing (NLP). It is often a prerequisite to many complex NLP tasks. Despite an extensive amount of work dedicated to this area of research, there is a lack of an up-to-date survey in the field. This paper aims to address this issue with a study that is focused on fo...
Conference Paper
Full-text available
Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encoura...
Conference Paper
Full-text available
This work explores the usage of Linked Data for Web scale Information Extraction and shows encouraging results on the task of Wrapper Induction. We propose a simple knowledge based method which is (i) highly flexible with respect to different domains and (ii) does not require any training material, but exploits Linked Data as background knowledge...
Conference Paper
Full-text available
Linked Data is a gigantic, constantly growing and extremely valuable resource, but its usage is still heavily dependent on (i) the familiarity of end users with RDF's graph data model and its query language, SPARQL, and (ii) knowledge about available datasets and their contents. Intelligent keyword search over Linked Data is currently being investi...
Conference Paper
Full-text available
Research has shown that topic-oriented words are often related to named entities and can be used for Named Entity Recognition. Many have proposed to measure topicality of words in terms of 'informativeness' based on global distributional characteristics of words in a corpus. However, this study shows that there can be large discrepancy between info...
Article
Full-text available
Knowledge Patterns (KPs), and even more specifically Ontology Design Patterns (ODPs), are no longer only generated in a top-down fashion, rather patterns are being extracted in a bottom-up fashion from online ontologies and data sources, such as Linked Data. These KPs can assist in tasks such as making sense of datasets and formulating queries over...
Article
Full-text available
Many semantic search tool evaluations have reported a user preference for free natural language as a query input approach as opposed to controlled or view-based inputs. Although the flexibility offered by this approach is a significant advantage, it can also be a major difficulty. Allowing users complete freedom in the choice of terms increases the d...
Article
Full-text available
Information Extraction (IE) is the technique for transforming unstructured textual data into structured representation that can be understood by machines. The exponential growth of the Web generates an exceptional quantity of data for which automatic knowledge capture is essential. This work describes the methodology for Web scale Information Extra...
Conference Paper
Full-text available
Procedural knowledge is the knowledge required to perform certain tasks, and forms an important part of expertise. A major source of procedural knowledge is natural language instructions. While these readable instructions have been useful learning resources for humans, they are not interpretable by machines. Automatically acquiring procedural knowle...
Article
Full-text available
This work analyzes research gaps and challenges for Web-scale Information Extraction and foresees the usage of Linked Open Data as a groundbreaking solution for the field. The paper presents a novel methodology for Web scale Information Extraction which will be the core of the LODIE project (Linked Open Data Information Extraction). LODIE aims to d...
Chapter
This chapter proposes a novel Semantic Relatedness (SR) measure that exploits diverse features extracted from a knowledge resource. Computing SR is a crucial technique for many complex Natural Language Processing (NLP) as well as Semantic Web related tasks. Typically, semantic relatedness measures only make use of a limited number of features without...
Conference Paper
In this demonstration, we will present a semantic environment called the K-Box. The K-Box supports the lightweight integration of knowledge tools, with a focus on semantic tools, but with the flexibility to integrate natural language and conventional tools. We discuss the implementation of the framework, and two existing applications, including det...
Conference Paper
Full-text available
This work investigates the process of selecting, extracting and reorganizing content from Semantic Web information sources, to produce an ontology meeting the specifications of a particular domain and/or task. The process is combined with traditional text-based ontology learning methods to achieve tolerance to knowledge incompleteness. The paper de...
Conference Paper
Full-text available
Measuring semantic relatedness between words or concepts is a crucial process to many Natural Language Processing tasks. Existing methods exploit semantic evidence from a single knowledge source, and are predominantly evaluated only in the general domain. This paper introduces a method of harnessing different knowledge sources under a uniform model...
Article
Full-text available
Named Entity Recognition (NER) deals with identifying and classifying atomic texts into pre-defined ontological classes. It is the enabling technique to many complex knowledge acquisition tasks. The recent flourish of Web resources has opened new opportunities and challenges for knowledge acquisition. In the domain of NER and its application in ont...
Conference Paper
Full-text available
Procedural knowledge is the knowledge required to perform certain tasks designed to solve a problem. It forms an important part of expertise, and is crucial for learning new tasks. For this reason, considerable efforts have been dedicated to researching the acquisition of procedural knowledge. This paper summarises related work, and identifies two...
Article
Full-text available
One of the ultimate aims of Natural Language Processing is to automate the analysis of the meaning of text. A fundamental step in that direction consists in enabling effective ways to automatically link textual references to their referents, that is, real world objects. The work presented in this paper addresses the problem of attributing a sense t...
Conference Paper
Full-text available
Natural Language is a means to express and discuss concepts, objects, events, i.e. it carries semantic contents. One of the ultimate roles of Natural Language Processing techniques is identifying the meaning of the text, providing effective ways to make a proper linkage between textual references and real world objects. This work addresses the prob...
Conference Paper
Full-text available
Domain specific entity recognition often relies on domain-specific knowledge to improve system performance. However, such knowledge often suffers from limited domain portability and is expensive to build and maintain. Therefore, obtaining it in a generic and unsupervised manner would be a desirable feature for domain-specific entity recognition sys...
Conference Paper
Full-text available
Determining semantic relatedness between words or concepts is a fundamental process to many Natural Language Processing applications. Approaches for this task typically make use of knowledge resources such as WordNet and Wikipedia. However, these approaches only make use of a limited number of features extracted from these resources, without investig...
Conference Paper
Full-text available
Procedural knowledge is the knowledge required to perform certain tasks. It forms an important part of expertise, and is crucial for learning new tasks. This paper summarises existing work on procedural knowledge acquisition, and identifies two major challenges that remain to be solved in this field; namely, automating the acquisition process to ta...
Conference Paper
There exists a large and underutilized resource of archaeological literature, both formal, such as scholarly journals, and less formal, in the form of 'grey literature'. In the archaeological domain the vast majority of this literature contains some geo-spatial element as well as the expected temporal information and therefore its ease of discovery w...
Article
Full-text available
Gazetteers or entity dictionaries are important knowledge resources for solving a wide range of NLP problems, such as entity extraction. We introduce a novel method to automatically generate gazetteers from seed lists using an external knowledge resource, Wikipedia. Unlike previous methods, our method exploits the rich content and various stru...
Article
Full-text available
This paper describes 'Archaeotools', a major e-Science project in archaeology. The aim of the project is to use faceted classification and natural language processing to create an advanced infrastructure for archaeological research. The project aims to integrate over 1 x 10^6 structured database records referring to archaeological sites and monume...
Article
Full-text available
Ontology construction for any domain is a labour-intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how...
Article
Full-text available
Natural Language is a means to express and discuss concepts, objects, events, i.e. it carries semantic contents. The Semantic Web aims at tightly coupling contents with their precise meanings. One of the ultimate roles of Natural Language Processing techniques is identifying the meaning of the text, providing effective ways to make a proper...
Article
Full-text available
Automatic Term Recognition (ATR) systems extract domain-specific terms from text corpora. Unfortunately, the output of current ATR systems fails to capture the whole of the domain covered by the corpus. To address this problem, we present a generic term re-ranking algorithm that generates term lists containing terms that are not only individually salient,...
Conference Paper
Full-text available
Automatic Term Recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a com...
Article
Full-text available
In this paper, we describe our work on a random walks-based approach to disambiguating people in web search results, and the implementation of a system that supports such an approach, which we used to participate at Semeval'07 Web People Search task.
Article
Full-text available
The fundamental failure of current approaches to ontology learning is to view it as a single pipeline with one or more specific inputs and a single static output. In this paper, we present a novel approach to ontology learning which takes an iterative view of knowledge acquisition for ontologies. Our approach is founded on three open-ended resources:...
Article
Full-text available
Measuring semantic relatedness between words or concepts is a crucial process to many Natural Language Processing tasks. Existing methods exploit semantic evidence from a single knowledge source, and are predominantly evaluated only in the general domain. This paper introduces a method of harnessing different knowledge sources under a uniform model...

Questions

Question (1)
Question
Could I get some suggestions on how to interpret results of a canonical correlation analysis, especially canonical coefficients and canonical loadings - I have asked detailed questions with an example at:
I wonder if anybody can take a look and give me some suggestions - either here or stackexchange?
Many thanks!
