Featured projects (1)
We are trying to combine deep learning algorithms with k-means clustering such that, by using backpropagation, cluster representatives and data representations are learned jointly.
Featured research (3)
We consider in this paper the problem of predicting the ability of a startup to attract investments using freely, publicly available data. Information about startups on the web usually comes either as unstructured data from news, social networks, and websites or as structured data from commercial databases, such as Crunchbase. The possibility of predicting the success of a startup from structured databases has been studied in the literature, and it has been shown that initial public offerings (IPOs), mergers and acquisitions (M&A), as well as funding events can be predicted with various machine learning techniques. In such studies, heterogeneous information from the web and social networks is usually used as a complement to the information coming from databases. However, building and maintaining such databases demands tremendous human effort. We thus study here whether one can rely solely on readily available sources of information, such as the website of a startup, its social media activity, and its presence on the web, to predict its funding events. As illustrated in our experiments, the method we propose yields results comparable to those of methods that also use structured data available in private databases.
On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the self-attention mechanism, however, such models have difficulties processing long documents. Recent works dealing with this issue include truncating long documents, segmenting them into passages that can be treated by a standard BERT model, or modifying the self-attention mechanism to make it sparser, as in sparse-attention models. However, these approaches either lose information or have high computational complexity (and are time-, memory-, and energy-consuming in the latter case). We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then aggregates a few blocks to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
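The block-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size word blocks, the term-overlap scorer, and the names `split_into_blocks`, `overlap_score`, and `select_key_blocks` are all assumptions made for illustration, and the actual pre-ranking function may differ.

```python
from collections import Counter


def split_into_blocks(text, block_size=8):
    """Split a document into consecutive blocks of at most block_size words."""
    words = text.split()
    return [" ".join(words[i:i + block_size])
            for i in range(0, len(words), block_size)]


def overlap_score(query, block):
    """Hypothetical scorer: count occurrences of query terms in the block."""
    query_terms = set(query.lower().split())
    counts = Counter(block.lower().split())
    return sum(counts[term] for term in query_terms)


def select_key_blocks(query, document, block_size=8, top_k=2):
    """Keep the top_k highest-scoring blocks, restored to document order."""
    blocks = split_into_blocks(document, block_size)
    # Rank block indices by score (ties keep their original order).
    ranked = sorted(range(len(blocks)),
                    key=lambda i: overlap_score(query, blocks[i]),
                    reverse=True)
    kept = sorted(ranked[:top_k])  # restore original reading order
    return " ".join(blocks[i] for i in kept)
```

The string returned by `select_key_blocks` would then play the role of the short document passed to a model such as BERT, in place of the full text.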
We address the problem of startup valuation from a machine learning perspective with a focus on European startups. More precisely, we aim to infer the valuation of startups corresponding to the funding rounds for which only the raised amount was announced. To this end, we mine Crunchbase, a well-established source of information on companies. We study the discrepancy between the properties of the funding rounds with and without the startup’s valuation announcement and show that the Domain Adaptation framework is suitable for this task. Finally, we propose a method that outperforms, by a large margin, the approaches proposed previously in the literature.