Ronen Feldman

  • Professor of Internet Studies
  • Professor at Hebrew University of Jerusalem

About

Publications: 135
Reads: 72,281
Citations: 9,378
Current institution
Hebrew University of Jerusalem
Current position
  • Professor
Additional affiliations
June 2007 - present
Hebrew University of Jerusalem
Position
  • Professor (Full) of Internet Technologies

Publications (135)
Article
Analysts and practitioners have long sought information on order backlog (OB) as indicators of future sales, and in turn, of future earnings and stock returns. OB disclosures, though mandatory for annual reports, are voluntarily included in some quarterly reports and are sometimes presented only in textual narration. Given that the required annual...
Article
What moves stock prices? Prior literature concludes that the revelation of private information through trading, and not public news, is the primary driver. We revisit the question by using textual analysis to identify fundamental information in news. We find that this information accounts for 49.6% of overnight idiosyncratic volatility (vs. 12.4% d...
Conference Paper
We present an end-to-end text mining methodology for relation extraction of adverse drug reactions (ADRs) from medical forums on the Web. Our methodology is novel in that it combines three major characteristics: (i) an underlying concept of using a head-driven phrase structure grammar (HPSG) based parser; (ii) domain-specific relation patterns, the...
Article
Stock-related messages on social media have several interesting properties regarding the sentiment analysis (SA) task. On the one hand, the analysis is particularly challenging, because of frequent typos, bad grammar, and idiosyncratic expressions specific to the domain and media. On the other hand, stock-related messages primarily refer to the sta...
Conference Paper
Sentiment relevance detection problems occur when there is a sentiment expression in a text, and there is the question of whether or not the expression is related to a given entity or, more generally, to a given situation. The paper discusses variants of the problem, and shows that it is distinct from other somewhat similar problems occurring in th...
Article
Finance and accounting research has recently focused on extracting the tone or sentiment of a document (such as an earnings press release, cover story about a company, or management's presentations to analysts) by using positive or negative words/phrases in the document. This chapter shows that signals based on tone or sentiment (extracted from qua...
Article
Findings in the prior literature on the implications of Order Backlog (OB) for stock returns are both sparse and inconclusive. For example, Rajgopal et al. (2003) show that firms with larger ratios of OB to total assets earn lower subsequent returns than firms with smaller ratios; while Lev and Thiagarajan (1993) find that increases in OB beyond sa...
Article
The main applications and challenges of one of the hottest research areas in computer science.
Conference Paper
A basic tenet of financial economics is that asset prices change in response to unexpected fundamental information. Since Roll’s (1988) provocative presidential address that showed little relation between stock prices and news, however, the finance literature has had limited success reversing this finding. This paper revisits this topic in a novel...
Article
Web 2.0 provides gathering places for internet users in blogs, forums, and chat rooms. These gathering places leave footprints in the form of colossal amounts of data regarding consumers’ thoughts, beliefs, experiences, and even interactions. In this paper, we propose an approach for firms to explore online user-generated content and “listen” to wh...
Article
The Stock Sonar (TSS) is a stock sentiment analysis application based on a novel hybrid approach. While previous work focused on document level sentiment classification, or extracted only generic sentiment at the phrase level, TSS integrates sentiment dictionaries, phrase-level compositional patterns, and predicate-level semantic events. TSS genera...
Article
This study examines the immediate and delayed market responses to revisions in analyst forecasts of earnings, target prices, and recommendations. Consistent with prior literature, revisions in earnings forecasts are positively and significantly associated with short-term market returns around the revisions. However, we show that short-term market r...
Conference Paper
The paper describes a method of relation extraction, which is based on parsing the input text using a combination of a generic HPSG-based grammar and a highly focused domain- and relation-specific lexicon. We also show a method of unsupervised acquisition of such a lexicon from a large unlabeled corpus. Together, the methods introduce a novel approa...
Conference Paper
Information published in online stock investment message boards, and more recently in stock microblogs, is considered highly valuable by many investors. Previous work focused on aggregation of sentiment from all users. However, in this work we show that it is beneficial to distinguish expert users from non-experts. We propose a general framework fo...
Conference Paper
The Stock Sonar (TSS) is a stock sentiment analysis application based on a novel hybrid approach. While previous work focused on document level sentiment classification, or extracted only generic sentiment at the phrase level, TSS integrates sentiment dictionaries, phrase-level compositional patterns, and predicate-level semantic events. TSS genera...
Article
This study explores whether the management discussion and analysis (MD&A) section of Forms 10-Q and 10-K has incremental information content beyond financial measures such as earnings surprises and accruals. It uses a classification scheme of words into positive and negative categories to measure the tone change in the MD&A section relative to prio...
Chapter
Text Mining is the automatic discovery of new, previously unknown information, by automatic analysis of various textual resources. Text mining starts by extracting facts and events from textual sources and then enables forming new hypotheses that are further explored by traditional Data Mining and data analysis methods. In this chapter we will defi...
Article
We will discuss the recent progress in increasing the technical readiness level of a space flight prototype Doppler Asymmetric Spatial Heterodyne (DASH) instrument for measuring upper atmospheric winds using the O(¹D) red line at 630 nm. DASH is a modified Spatial Heterodyne Spectrometer (SHS), and is therefore a close relative of a Fourier transfo...
Article
This study explores whether the management discussion and analysis (MD&A) section of Forms 10-Q and 10-K has incremental information content beyond financial measures such as earnings surprises and accruals. It uses a classification scheme of words into positive and negative categories to measure the tone change in the MD&A section relative to prio...
Conference Paper
Web pages often contain text that is irrelevant to their main content, such as advertisements, generic format elements, and references to other pages on the same site. When used by automatic content-processing systems, e.g., for Web indexing, text classification, or information extraction, this irrelevant text often produces a substantial amount of n...
Article
Web extraction systems attempt to use the immense amount of unlabeled text in the Web in order to create large lists of entities and relations. Unlike traditional Information Extraction methods, the Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possi...
Article
A new disclosure in Form 8-K is required when a company needs to warn investors that they cannot rely on previously issued financial statements. Using such disclosures in 2005, the authors find that the initial stock market reaction to the filing of the Form 8-K is negative, with an average three-day abnormal return centered on the Form 8-K filing...
Conference Paper
Product discussion boards are a rich source of information about consumer sentiment about products, which is being increasingly exploited. Most sentiment analysis has looked at single products in isolation, but users often compare different products, stating which they like better and why. We present a set of techniques for analyzing how consumers...
Article
In today's information age, the amount of text documents available electronically (on the Web, on corporate intranets, on news wires and elsewhere) is overwhelming. Search engines and information retrieval, while useful to find documents that satisfy a certain query, offer little help with analyzing the unstructured documents themselves. Text Minin...
Conference Paper
In recent years, product discussion forums have become a rich environment in which consumers and potential adopters exchange views and information. Researchers and practitioners are starting to extract user sentiment about products from user product reviews. Users often compare different products, stating which they like better and why. Extracting...
Conference Paper
Unsupervised Relation Identification is the task of automatically discovering interesting relations between entities in a large text corpus. Relations are identified by clustering the frequently co-occurring pairs of entities in such a way that pairs occurring in similar contexts end up belonging to the same clusters. In this paper we compare seve...
Conference Paper
This research is focused on developing effective visualization tools for query construction and advanced exploration of temporal relational databases. Temporal databases enable the retrieval of each of the states observed in the past and even planned future states. Several query languages for relational databases have been introduced, but only a fe...
Conference Paper
We present URIES, an unsupervised relation identification and extraction system. The system automatically identifies interesting binary relations between entities in the input corpus, and then proceeds to extract a large number of instances of these relations. The system discovers relations by clustering frequently co-occurring pairs of entities,...
Conference Paper
Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because of wrong recognition of entities that participate in the relations. This is especially true for systems that do not use separate named-entity recognition components, instead relying on general-purpose shallow parsing. Such systems have gr...
Article
We discuss what makes exciting and motivating Grand Challenge problems for Data Mining, and propose criteria for a good Grand Challenge. We then consider possible GC problems from multimedia mining, link mining, large-scale modeling, text mining, and proteomics. This report is the result of a panel held at the KDD-2006 conference.
Book
Text mining tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, this book examines advanced pre-pro...
Conference Paper
Web extraction systems attempt to use the immense amount of unlabeled text in the Web in order to create large lists of entities and relations. Unlike traditional IE methods, the Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possible while keeping th...
Conference Paper
This panel will discuss possible exciting and motivating Grand Challenge problems for Data Mining, focusing on bioinformatics, multimedia mining, link mining, text mining, and web mining.
Article
Typographic and visual information is an integral part of textual documents. Most information extraction (IE) systems ignore most of this visual information, processing the text as a linear sequence of words. Thus, much valuable information is lost. In this paper, we show how to make use of this visual information for IE. We present an algorithm th...
Article
This study explores a system to retrieve and classify the reasons for late mandatory SEC (Securities and Exchange Commission) filings. From the source documents, the system identifies the reasons for the late filing and classifies them into one or more of seven categories. The system can be used by potential investors who have to track a large numb...
Conference Paper
In the CoNLL 2003 NER shared task, more than two thirds of the submitted systems used a feature-rich representation of the task. Most of them used the maximum entropy principle to combine the features together. Others used large margin linear classifiers, such as SVM and RRM. In this paper, we compare several common classifiers under exactly the sa...
Conference Paper
Most information extraction systems either use hand-written extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these approaches require massive human effort and hence prevent information extraction from becoming more widely applicable. In this paper we present URES (Unsupervised Relat...
Conference Paper
Web extraction systems attempt to use the immense amount of unlabeled text in the Web in order to create large lists of entities and relations. Unlike traditional Information Extraction methods, the Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possi...
Conference Paper
Web extraction systems attempt to use the immense amount of unlabeled text in the Web in order to create large lists of entities and relations. Unlike traditional IE methods, the Web extraction systems do not label every mention of the target entity or relation, instead focusing on extracting as many different instances as possible while keeping th...
Conference Paper
This paper describes a framework for defining domain specific Feature Functions in a user friendly form to be used in a Maximum Entropy Markov Model (MEMM) for the Named Entity Recognition (NER) task. Our system called MERGE allows defining general Feature Function Templates, as well as Linguistic Rules incorporated into the classifier. The simple...
Article
This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labour by relying on statistics drawn from a training c...
Conference Paper
Conference Paper
In the CoNLL 2003 NER shared task, more than two thirds of the submitted systems used the feature-rich representation of the task. Most of them used maximum entropy to combine the features together. Others used linear classifiers, such as SVM and RRM. Among all systems presented there, one of the MEMM-based classifiers took the second place, losing...
Article
We describe a new tool for mining association rules, which is of special value in text mining. The new tool, called maximal associations, is geared toward discovering associations that are frequently lost when using regular association rules. Intuitively, a maximal association rule X ⇒max Y says that whenever X is the only item of its type in a tra...
Conference Paper
This paper describes a framework for defining domain specific Feature Functions in a user friendly form to be used in a Maximum Entropy Markov Model (MEMM) for the Named Entity Recognition (NER) task. Our system called MERGE allows defining general Feature Function Templates, as well as Linguistic Rules incorporated into the classifier. The simple...
Conference Paper
The University of Maryland Electron Ring (UMER) is a low energy electron recirculator for the study of space charge dominated beam transport. The system’s pulse length (100 ns) and large number of diagnostics make it ideal for investigating the longitudinal evolution of intense beams. Pulse shape flexibility is provided by the pulser system and the...
Conference Paper
The semantic web is expected to have an impact at least as big as that of the existing HTML-based web, if not greater. However, the challenge lies in creating this semantic web and in converting existing web information into the semantic paradigm. One of the core technologies that can help in the migration process is automatic markup, the semantic mark...
Conference Paper
This paper describes a hybrid statistical and knowledge-based information extraction model, able to extract entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labor by relying on statistics drawn from a training co...
Article
The information age has made the electronic storage of large amounts of data effortless. The proliferation of documents available on the Internet, corporate intranets, news wires and elsewhere is overwhelming. Search engines only exacerbate this overload problem by making increasingly more documents available in only a few keystrokes. This informat...
Article
This study investigates market reactions to voluntary earnings guidance provided by managers after the enactment of Regulation FD, which requires companies to disseminate material news to all investors simultaneously. More managers now issue their guidance to the public instead of disclosing it to a selective group of analysts, in conformity with Regu...
Article
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of genomics and proteomics. This growth is accompanied by an accelerated increase in the number of biomedical publications discussing the findings. In the last few years, there has been a lot of interest within the...
Article
Information extraction is one of the most important techniques used in Text Mining. One of the main problems in building information extraction (IE) systems is that the knowledge elicited from domain experts tends to be only approximately correct. In addition, the knowledge acquisition phase for building IE rules usually takes a tremendous amount o...
Article
The information age is characterized by a rapid growth in the amount of information available in electronic media. Traditional data handling methods are not adequate to cope with this flood of information. Knowledge discovery in databases (KDD) is a new paradigm that focuses on automatic or semiautomatic exploration of large amounts of data and on...
Article
Below we describe the winning system that we built for the KDD Cup 2002 Task 1 competition. Our system is a Rule-based Information Extraction (IE) system. It combines pattern matching, Natural Language Processing (NLP) tools, semantic constraints based on the domain and the specific task, and a post-processing stage for making the final curation de...
Conference Paper
Most information extraction systems focus on the textual content of the documents. They treat documents as sequences of words, disregarding the physical and typographical layout of the information. While this strategy helps in focusing the extraction process on the key semantic content of the document, much valuable information can also be deri...
Article
Text mining is the process of analyzing unstructured, natural language texts in order to discover information and knowledge that are difficult to retrieve directly. Information extraction is one of the most important techniques used in text mining. Natural language processing tools, augmented by lexical resources and semantic constraints, can be use...
Conference Paper
The availability of online text documents exposes readers to a vast amount of potentially valuable knowledge buried therein. The sheer scale of material has created the pressing need for automated methods of discovering relevant information without having to read it all. Hence the growing interest in recent years in Text Mining. A common approach t...
Article
This paper reviews the best practices and challenges for project managers and developers involved in implementing text-mining applications. With focus on rule-based information extraction, and references to actual cases, the authors share their experiences from developing several text-mining applications in diverse industries. First, project manage...
Article
Document Explorer is a data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. Additionally, the derived patterns can be used to browse the collection. Document Explorer searches for patterns that capture relat...
Conference Paper
This paper reviews the best practices and challenges for project managers and developers involved in implementing text-mining applications. With focus on rule-based information extraction, and references to actual cases, the authors share their experiences from having developed several text-mining applications in diverse industries. First, project...
Conference Paper
The information age is characterized by a rapid growth in the amount of information available in electronic media. Traditional data handling methods are not adequate to cope with this flood of information. Knowledge discovery in databases (KDD) is a new paradigm that focuses on automatic or semiautomatic exploration of large amounts of data and on...
Conference Paper
Text-Mining is a growing area of interest within the field of Data Mining and Knowledge Discovery. Given a collection of text documents, most approaches to Text Mining perform knowledge-discovery operations either on external tags associated with each document, or on the set of all words within each document. Both approaches suffer from limitations...
Article
The Maryland Infrared Free Electron Laser is being constructed at the University of Maryland, and is expected to lase in the far infrared. The accelerator driving the laser is a 10-MeV linac which is being assembled in an "in-line" configuration. The design work for the accelerator was accomplished using Trace-3D and PARMELA computer simulations. W...
Article
We consider a situation where events (e.g. manufacturing plant alarms, web-page accesses) occur in sequence. In such cases, the ability to predict future events based on current events can be valuable. A key question, in any given domain, is whether the domain is accurately modeled by a simple Markov chain or whether, alternatively, past history is...
Article
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual...
Article
Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available onl...
Conference Paper
The proliferation of digitally available textual data necessitates automatic tools for analyzing large textual collections. Thus, in analogy to data mining for structured databases, text mining is defined for textual collections. A central tool in text-mining is the analysis of concept relationship, which discovers connections between different con...
Conference Paper
Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available onl...
Conference Paper
With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile ground for data mining research to make a difference to the effectiveness of information search. Today, Web surfers access the Web through two dominant ...
Article
We consider the problem of finding association rules in a database with binary attributes. Most algorithms for finding such rules assume that all the data is available at the start of the data mining session. In practice, the data in the database may change over time, with records being added and deleted. At any given time, the rules for the curren...
Article
TextVis is a visual data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. Additionally, the derived patterns can be used to browse the collection. TextVis takes a multi-strategy approach to text mining, and...
Article
This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) amongst keywords labeling the items in a collection of textual documents. In addition, FACT is able to use background knowledge about the keywords labeling the documents in its discovery process. FACT takes a query-centere...
Conference Paper
The proliferation of digitally available textual data necessitates automatic tools for analyzing large textual collections. Thus, in analogy to data mining for structured databases, text mining is defined for textual collections. A central tool in text mining is the analysis of concept relationship, which discovers connections between different con...
Article
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual fo...
Conference Paper
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual fo...
Conference Paper
Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available...
Article
Current algorithms for finding associations among the attributes describing data in a database have a number of shortcomings: 1. Applications that require associations with very small support have prohibitively large running times. 2. They assume a static database. Some applications require generating associations in real-time from a dynamic databas...
Article
This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) amongst keywords labeling the items in a collection of textual documents. In addition, when background knowledge is available about the keywords labeling the documents, FACT is able to use this information in its discovery process....
Conference Paper
We present Document Explorer, a data mining system searching for patterns in document collections. These patterns provide knowledge on the application domain that is represented by the collection. A pattern can also be seen as a query that retrieves a set of documents. Thus the data mining tools can be used to identify interesting queries which can...
Conference Paper
Document Explorer is a data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. Additionally, the derived patterns can be used to browse the collection. Document Explorer searches for patterns that capture relat...
Conference Paper
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured document c...
Conference Paper
Data mining has informally been introduced as large scale search for interesting patterns in data. It is often an explorative task iteratively performed within the process of knowledge discovery in databases. In this process, interactive visualization techniques are also successfully applied for data exploration. We deal with the synergy of these t...