Talel Abdessalem

Télécom ParisTech

PhD

About

97
Publications
23,564
Reads
1,172
Citations

Publications (97)
Article
Recommender Systems (RS) have proven to be effective tools to help users overcome information overload, and significant advances have been made in the field over the past two decades. Although addressing the recommendation problem required first a formulation that could be easily studied and evaluated, there currently exists a gap between research...
Preprint
Full-text available
River is a machine learning library for dynamic data streams and continual learning. It provides multiple state-of-the-art learning methods, data generators/transformers, performance metrics and evaluators for different stream learning problems. It is the result of the merger of the two most popular packages for stream learning in Python: Creme a...
Preprint
Full-text available
Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for classification of evolving data streams. In this setting, new data arrives over time and the relations...
Article
The set of all frequent patterns that are extracted from a single network can be huge. A technique recently proposed for obtaining a compact, informative and useful set of patterns is output sampling, where a small set of frequent patterns is randomly chosen. However, existing output sampling algorithms work only in the transactional setting, where...
Chapter
Effective data anonymisation is the key to unleashing the full potential of big data analytics while preserving privacy. An organization needs to be able to share and consolidate the data it collects across its departments and in its network of collaborating organizations. Some of the data collected and the cross-references made in its aggregation...
Conference Paper
Full-text available
Betweenness centrality and k-path centrality are two important indices that are widely used to analyze social, technological and information networks. In the current paper, first given a directed network G and a vertex $r\in V(G)$, we present a novel adaptive algorithm for estimating the betweenness score of r. Our algorithm first computes two subsets...
Article
Full-text available
The Publisher regrets an error in the spelling of the family name of the sixth author. The correct spelling is Bernhard Pfahringer, as it appears in the author list above.
Article
Full-text available
The type, the nature, and the amount of information at the disposal of tourists have exploded, overwhelming them with the number of choices and making trip planning a very challenging task. Recommender systems offer personalized recommendations and help individuals overcome this information overload. The deployment of such systems in the hotel indu...
Conference Paper
We propose the task of set labelling. Starting from some example members of a set, set labelling tries to infer the most appropriate labels for the given set. For this work, we consider sets of words. We illustrate the task and a possible solution with an application to the classification of cosmetic products and hotels. The novel solution propose...
Conference Paper
Topics in topic modelling approaches are represented as a collection of weighted words. The labels for the topics, however, are not clearly defined and must be interpreted manually. Topic labelling proposes to automatically label the topics by leveraging a knowledge base or applying data mining and machine learning algorithms. We propose a naive to...
Preprint
Full-text available
An important index widely used to analyze social and information networks is betweenness centrality. In this paper, first given a directed network $G$ and a vertex $r\in V(G)$, we present a novel adaptive algorithm for estimating the betweenness score of $r$. Our algorithm first computes two subsets of the vertex set of $G$, called $\mathcal{RF}(r)$ an...
Book
Collaborative filtering (CF) mainly suffers from rating sparsity and from the cold-start problem. Auxiliary information like texts and images has been leveraged to alleviate these problems, resulting in hybrid recommender systems (RS). Due to the abundance of data continuously generated in real-world applications, it has become essential to design...
Article
Full-text available
scikit-multiflow is a framework for learning from data streams and multi-output learning in Python. Conceived to serve as a platform to encourage the democratization of stream learning research, it provides multiple state-of-the-art learning methods, data generators and evaluators for different stream learning problems, including single-output, mul...
Conference Paper
Full-text available
Collaborative filtering (CF) mainly suffers from rating sparsity and from the cold-start problem. Auxiliary information like texts and images has been leveraged to alleviate these problems, resulting in hybrid recommender systems (RS). Due to the abundance of data continuously generated in real-world applications, it has become essential to design...
Preprint
Full-text available
Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of the art methods for stream learning, stream generators and evaluators. scikit-multiflow builds upon popular open...
Conference Paper
The recommendation problem in the hotel industry introduces several interesting and unique challenges leading to the insufficiency of classical approaches. Traveling is not a frequent activity and users tend to have multifaceted behaviors affected by their specific context. While context-aware recommender systems are a promising way to address this...
Chapter
Full-text available
Missing data is a common trait of real-world data that can negatively impact interpretability. In this paper, we present Cascade Imputation (CIM), an effective and scalable technique for automatic imputation of missing data. CIM is not restrictive on the characteristics of the data set, providing support for: Missing At Random and Missing Completel...
Chapter
Full-text available
In this paper, first given a directed network G and a vertex \(r \in V(G)\), we propose a new exact algorithm to compute the betweenness score of r. Our algorithm pre-computes a set \(\mathcal {RF}(r)\), which is used to prune a large number of computations that do not contribute to the betweenness score of r. Then, for the cases where \(\mathcal {RF}(r...
Article
Full-text available
In this research, we adopt a design science approach to develop an analytical framework of social media capabilities of the firm. The approach has been applied in the case of five brands in the cosmetics industry using social network analysis, consumer engagement, descriptive statistics and sentiment analysis. The findings allow us to enlighten the...
Conference Paper
With the explosion of the volume of user-generated data, designing online recommender systems that learn from data streams has become essential. These systems rely on incremental learning that continuously updates models as new observations arrive and they should be able to adapt to drifts in real-time. User preferences evolve over time and tracking...
Article
Full-text available
Random forests is currently one of the most used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests alg...
Conference Paper
Set expansion refers to the problem of reconstituting elements of a semantic class from a corpus based on some given examples, or seeds. Set of t-uples expansion corresponds to the particular task of re-building a set of t-uples, i.e. a relation, from a corpus based on some given seed t-uples. Both set and set of t-uples expansions require a rankin...
Article
Full-text available
Graphs are an important tool to model data in different domains, including social networks, bioinformatics and the world wide web. Most of the networks formed in these domains are directed graphs, where all the edges have a direction and they are not symmetric. Betweenness centrality is an important index widely used to analyze networks. In this pa...
Conference Paper
The collection and exploitation of ratings from users are modern pillars of collaborative filtering. Likert scale is a psychometric quantifier of ratings popular among the electronic commerce sites. In this paper, we consider the tasks of collecting Likert scale ratings of items and of finding the n-k best-rated items, i.e., the n items that are mo...
Article
Purpose - This paper focuses on the design of algorithms and techniques for an effective set expansion. We design and implement a tool that finds and extracts candidate sets of t-uples from the World Wide Web. For instance, when a given user provides <Indonesia, Jakarta, Indonesian Rupiah>, <China, Beijing, Yuan Renminbi>, <Canada, Ottawa, Canadian...
Article
Full-text available
Betweenness centrality is an important index widely used in different domains such as social networks, traffic networks and the world wide web. However, even for mid-size networks that have only a few hundred thousand vertices, it is computationally expensive to compute exact betweenness scores. Therefore in recent years, several approximate algo...
Article
Full-text available
Distance-based indices, including closeness centrality, average path length, eccentricity and average eccentricity, are important tools for network analysis. In these indices, the distance between two vertices is measured by the size of shortest paths between them. However, this measure has shortcomings. A well-studied shortcoming is that extending...
Conference Paper
Community detection is important for analyzing and visualizing given networks. In the real world, many complex systems can be modeled as signed networks composed of positive and negative edges. Although community detection in signed networks has been attempted by many researchers, studies for detecting detailed structures remain to be done. In this pap...
Article
Full-text available
We present in this paper a multicultural approach to social media marketing analytics, applied in two Facebook brand pages: French (individualistic culture, the country home of the brand) versus Saudi Arabian (collectivistic culture, one of its country hosts), which are published by an international beauty & cosmetics firm. Using social network ana...
Conference Paper
Set expansion is the task of finding elements of a set given example members. We are interested in the design of algorithms and techniques for a set expansion tool that expands a set by searching, finding and extracting candidates from the World Wide Web. Existing approaches mostly consider sets of atomic data. We extend this idea to the expans...
Presentation
Full-text available
Set labelling is a task consisting of finding the most suitable label for a set, given examples of its members. The example members are called "seeds". The set of seeds is the "query". The task makes the assumption that the set is homogeneous; it makes the assumption that its members belong to the same class of objects or concepts. Set labelling tr...
Article
Graphs are important tools for modeling data in different biological, social and technological domains. The measurement of their complexity has theoretical and practical applications in many areas such as pattern recognition, graph clustering, network inference and network analysis. A widely used measure is the q-entropy. While this measure has ext...
Conference Paper
The main stage for a new generation of cooperative information systems are smart communities such as smart cities and smart nations. In the smart city context in which we position our work, urban planning, development and management authorities and stakeholders need to understand and take into account the mobility patterns of urban dwellers in orde...
Conference Paper
Modern recommendation systems leverage some forms of collaborative user or crowd sourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. By nature of such independent collection of information, the multiplici...
Conference Paper
Urban planning, development and management authorities and stakeholders need to understand and analyse the mobility patterns of urban dwellers in order to manage sociological, economic and environmental issues. Simulation is an indispensable tool for authorities and stakeholders to better design, operate and control the mobility infrastructures of s...
Conference Paper
Spatial crowdsourcing is an activity consisting in outsourcing spatial tasks to a community of online, yet on-ground and mobile, workers. A spatial task is characterized by the requirement that workers must move from their current location to a specified location to accomplish the task. We study the assignment of spatial tasks to workers. A sequenc...
Conference Paper
This demo presents SoMap, a web-based platform that provides new scalable methods to aggregate, analyse and valorise large collections of heterogeneous social data in urban contexts. The platform relies on geotagged data extracted from social networks and microblogging applications such as Instagram, Flickr and Twitter and on Points Of Interest gat...
Poster
Full-text available
The Web contains a large amount of useful information and resources. It is expanding rapidly, and changing from a pure document collection to a large space of connected public data. The structure and semantic annotations of the web pages make it easier to search, extract and mine the web content. Focused crawling enables to selectively find Web pag...
Conference Paper
A number of applications deal with monitoring moving objects: cars, aircraft, ships, persons, etc. Traditionally, this requires capturing data from sensor networks, image or video analysis, or using other application-specific resources. We show in this demonstration paper how Web content can be exploited instead to gather information (trajectories...
Article
Tag recommendation is a major aspect of collaborative tagging systems. It aims to recommend suitable tags to a user for tagging an item. One of its main challenges is the effectiveness of its recommendations. Existing works focus on techniques for retrieving the most relevant tags to give beforehand, with a fixed number of tags in each recommended...
Conference Paper
We study in this vision paper the problem of integrating several web data sources under uncertainty and dependencies. We present a concrete application with web sources about objects in the maritime domain where uncertainties and dependencies are omnipresent. Uncertainties are mainly caused by imprecise information trackers and imperfect human know...
Article
Tag recommendation is a major aspect of collaborative tagging systems. It aims to recommend tags to a user for tagging an item. In this paper we present a part of our work in progress which is a novel improvement of recommendations by re-ranking the output of a tag recommender. We mine association rules between candidate tags in order to determine...
Article
Tag recommendation is a major aspect of collaborative tagging systems. It aims to recommend tags to a user for a given item. In this paper we propose an adaptation of the search algorithms proposed in [14, 1] to the tag recommendation problem. Our algorithm, called STRec, provides network-aware recommendations based on proximity measures computed on...
Article
Full-text available
Tag recommendation is a major aspect of collaborative tagging systems. It aims to recommend tags to a user for tagging an item. In this paper we present a part of our work in progress which is a novel improvement of recommendations by re-ranking the output of a tag recommender. We mine association rules between candidate tags in order to determine...
Article
In order to ease content enrichment, exchange, and sharing, web-scale collaborative platforms such as Wikipedia or Google Docs enable unbounded interactions between a large number of contributors, without prior knowledge of their level of expertise and reliability. Version control is then essential for keeping track of the evolution of the shared c...
Article
Full-text available
It is today accepted that matrix factorization models allow a high quality of rating prediction in recommender systems. However, a major drawback of matrix factorization is its static nature that results in a progressive decline in the accuracy of the predictions after each factorization. This is due to the fact that the newly obtained ratings are...
Conference Paper
While online social networks (OSN) present unprecedented opportunities for sharing information and multimedia content among users, they raise major privacy issues as users could often access personal or confidential data of other users. Most social networks provide some basic access control policies, which however seem to be very limited given the...
Article
Full-text available
We present in this paper a novel approach for extracting structured data from the Web, whose goal is to harvest real-world items from template-based HTML pages (the structured Web). It illustrates a two-phase querying of the Web, in which an intentional description of the data that is targeted is first provided, in a flexible and widely applicable...
Article
Full-text available
Content-based online collaborative platforms and office applications are widely used for collaborating and exchanging data, in particular in the form of XML-based electronic documents. Usually, a version control system is built into these applications to support collaboration and to properly manage document evolution. However, most version con-t...
Conference Paper
Full-text available
We present in this paper results on inferring a signed network (a "web of trust") from interactions on user-generated content in Wikipedia. From a collection of articles in the politics domain and their revision history, we investigate mechanisms by which relationships between Wikipedia contributors - in the form of signed directed links - can be i...
Conference Paper
Full-text available
This demonstration paper presents a probabilistic XML data merging tool, that represents the outcome of semi-structured document integration as a probabilistic tree. The system is fully automated and integrates methods to evaluate the uncertainty (modeled as probability values) of the result of the merge. It is based on the two-way tree-merge techn...
Article
Full-text available
We consider in this paper top-k query answering in social tagging systems, also known as folksonomies. This problem requires a significant departure from existing, socially agnostic techniques. In a network-aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their taggi...
Conference Paper
Full-text available
We report in this short paper results on inferring a signed network (a "web of trust") from user interactions. On the Wikipedia network of contributors, from a collection of articles in the politics domain and their revision history, we investigate mechanisms by which relationships between contributors - in the form of signed directed links - can b...
Conference Paper
Full-text available
As a result of the widespread use of social networking sites, millions of individuals can today easily share personal and confidential information with an incredible amount of possibly unknown other users. This raises the need of giving users more control on the distribution of their resources, which may be accessed by a community far wider than th...
Article
Full-text available
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase querying of the Web, in which an intentional description of the targeted data is first provided, in...
Article
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTML pages (the so-called structured Web). It illustrates a two-phase querying of the Web, in which an intentional description of the targeted data is first provided, in a fl...
Article
Full-text available
This is a collective position paper presenting the vision, motivations and approaches of the ISICIL project. This project proposes to study and to experiment with the usage of new tools to assist tasks of corporate intelligence and technical watch. These tools rely on web 2.0 advanced interfaces (blog, wiki, social bookmarking) for interactions and...
Conference Paper
Full-text available
We present in this paper an approach for XQuery optimization that exploits minimization opportunities raised in composition-style nesting of queries. More precisely, we consider the simplification of XQuery queries in which the intermediate result constructed by a subexpression is queried by another subexpression. Based on a large subset of XQuery,...