Dirk Thomas’s scientific contributions


Publications (1)


Figure 1: An example of a company graph. 
Figure 2: An example of a token trie. Double circles indicate final states. 
Improving Company Recognition from Unstructured Text by using Dictionaries
  • Conference Paper
  • Full-text available

March 2017 · 3,393 Reads · 16 Citations · [...] · Dirk Thomas

While named entity recognition is a much-addressed research topic, recognizing companies in text is particularly difficult. Company names are extremely heterogeneous in structure: a given company can be referenced in many different ways, and names may include person names, locations, acronyms, numbers, and other unusual tokens. Further, instead of the official company name, the general public frequently uses quite different colloquial names. We present a machine learning system based on conditional random fields (CRF) that reliably recognizes organizations in German texts. In particular, we construct and employ various dictionaries, regular expressions, text context, and other techniques to improve the results. In our experiments, we achieved a precision of 91.11% and a recall of 78.82%, showing significant improvement over related work. Using our system, we were able to extract 263,846 company mentions from a corpus of 141,970 newspaper articles.
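The abstract and the Figure 2 caption mention dictionaries and a token trie with final states. As a rough illustration of how such a structure could be used for dictionary lookup, here is a minimal sketch. It is not the authors' implementation: the class `TokenTrie`, the function `find_mentions`, the whitespace tokenization, the longest-match strategy, and the example company names are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


class TokenTrie:
    """Trie keyed on whole tokens; a node marked final ends a dictionary entry."""

    def __init__(self) -> None:
        self.children: Dict[str, "TokenTrie"] = {}
        self.is_final = False  # corresponds to a "double circle" final state

    def add(self, name: str) -> None:
        """Insert a company name, split on whitespace (an assumed tokenization)."""
        node = self
        for token in name.split():
            node = node.children.setdefault(token, TokenTrie())
        node.is_final = True


def find_mentions(tokens: List[str], trie: TokenTrie) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of longest dictionary matches, end exclusive."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        node, end = trie, None
        j = i
        # Walk the trie as far as the tokens allow, remembering the last final state.
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.is_final:
                end = j
        if end is not None:
            spans.append((i, end))
            i = end  # continue after the match, so matches do not overlap
        else:
            i += 1
    return spans


# Hypothetical dictionary entries, chosen only for illustration.
trie = TokenTrie()
for name in ["Volkswagen AG", "Volkswagen", "Deutsche Bank"]:
    trie.add(name)

tokens = "Die Volkswagen AG und die Deutsche Bank berichten".split()
spans = find_mentions(tokens, trie)
mentions = [" ".join(tokens[s:e]) for s, e in spans]
print(mentions)
```

In a CRF-based system of the kind the abstract describes, matches like these would typically serve as one feature among several rather than as final predictions. How the paper actually integrates its dictionaries is not stated on this page.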


Citations (1)


... In the early stages of research, ontology-based approaches and statistical learning methods on large-scale risk domain corpora were the main patterns for financial entity extraction, laying the foundation for developing entity extraction research in related domains [15,29]. On this basis, the combination of the conditional random field (CRF) model and dictionary rules has also become a choice for scholars [30,31]. This approach is simple, efficient, and suitable for colloquial text corpora. ...

Reference:

Survey and Prospect for Applying Knowledge Graph in Enterprise Risk Management
Improving Company Recognition from Unstructured Text by using Dictionaries