Qatar Computing Research Institute
Recent publications
We propose a prescriptive learning approach for revenue management in air cargo that combines machine learning prediction with decision making using deep reinforcement learning. This approach, named RL-Cargo, addresses a problem that is unique to the air-cargo business, namely the wide discrepancy between the quantity (weight or volume) that a shipper books and the actual amount the airline receives at departure time. The discrepancy results in sub-optimal and inefficient behavior by both the shipper and the airline, and in an overall loss of potential revenue for the airline. In the proposed approach, booking features and extracted disguised missing values are exploited to predict the received volume, while a DQN method using uncertainty bounds from the prediction intervals is proposed for decision making. We have validated the benefits of RL-Cargo using a real dataset of 1000 flights to compare classical Dynamic Programming and Deep Reinforcement Learning techniques on offloading costs and revenue generation. Our results suggest that prescriptive learning, which combines prediction with decision making, provides a principled approach for managing the air cargo revenue ecosystem. Furthermore, the proposed approach can be abstracted to many other application domains where decision making needs to be carried out in the face of both data and behavioral uncertainty.
Data exploration, the problem of extracting knowledge from a database even if we do not know exactly what we are looking for, is important for data discovery and analysis. However, precisely specifying SQL queries is not always practical, as in “finding and ranking off-road cars based on a combination of Price, Make, Model, Age, Mileage, etc.”, not only due to the query complexity (e.g., the queries may have many if-then-else, and, or, and not logic), but also because the user typically does not know all data instances (and their variants). We propose DExPlorer, a system for interactive data exploration. From the user perspective, we propose a simple and user-friendly interface, which allows the user to: (1) confirm whether a tuple is desired or not, and (2) decide whether one tuple is preferred over another. Behind the scenes, we jointly use multiple ML models to learn from these two types of user feedback. Moreover, in order to effectively involve the human in the loop, we need to select a set of tuples for each user interaction so as to solicit feedback. Therefore, we devise question selection algorithms, which consider not only the estimated benefit of each tuple, but also the possible partial orders between any two suggested tuples. Experiments on real-world datasets show that DExPlorer outperforms existing approaches in effectiveness.
The automatic identification of harmful content online is of major concern for social media platforms, policymakers, and society. Researchers have studied textual, visual, and audio content, but typically in isolation. Yet, harmful content often combines multiple modalities, as in the case of memes. With this in mind, here we offer a comprehensive survey with a focus on harmful memes. Based on a systematic analysis of recent literature, we first propose a new typology of harmful memes, and then we highlight and summarize the relevant state of the art. One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism, partly due to the lack of suitable datasets. We further find that existing datasets mostly capture multi-class scenarios, which are not inclusive of the affective spectrum that memes can represent. Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual, blending different cultures. We conclude by highlighting several challenges related to multimodal semiotics, technological constraints, and non-trivial social engagement, and we present several open-ended aspects such as delineating online harm and empirically examining related frameworks and assistive interventions, which we believe will motivate and drive future research.
As a mainstream advertising channel, Search Engine Advertising (SEA) has a huge business impact and attracts considerable attention from both academia and industry. One important goal of SEA is to increase sales. Nevertheless, while previous research has studied multiple factors that are potentially related to the outcome of SEA campaigns, the effects of these factors on actual sales generated by SEA remain understudied. It is also unclear whether and how such effects change over time in dynamic SEA campaigns that last for an extended period. As the first empirical investigation of the dynamic advertisement-sales relationship in SEA, this study builds an advertising response model within a time-varying coefficient (TVC) modeling framework, and estimates the model using a unique dataset from a large e-commerce retailer in the United States. Results reveal the effects of advertising expenditure, consumer behaviors, and advertisement characteristics on realized sales, and demonstrate that such effects on sales do change over time in non-linear ways. More importantly, we find that carryover has a stronger effect in generating sales than immediate or direct response does, and advertisers need to carefully decide how much to bid for higher ad positions. These findings have direct implications for business decision-making to launch more effective SEA campaigns and for SEA platforms to improve their pricing mechanism.
This research compares four standard analytics metrics from Google Analytics with SimilarWeb using one year’s average monthly data for 86 websites from 26 countries and 19 industry verticals. The results show statistically significant differences between the two services for total visits, unique visitors, bounce rates, and average session duration. Using Google Analytics as the baseline, SimilarWeb average values were 19.4% lower for total visits, 38.7% lower for unique visitors, 25.2% higher for bounce rate, and 56.2% higher for session duration. The website rankings between SimilarWeb and Google Analytics for all metrics are significantly correlated, especially for total visits and unique visitors. The accuracy/inaccuracy of the metrics from both services is discussed from the vantage of the data collection methods employed. In the absence of a gold standard, combining the two services is a reasonable approach, with Google Analytics for onsite and SimilarWeb for network metrics. Finally, the differences between SimilarWeb and Google Analytics measures are systematic, so with Google Analytics metrics from a known site, one can reasonably generate the Google Analytics metrics for related sites based on the SimilarWeb values. The implications are that SimilarWeb provides conservative analytics in terms of visits and visitors relative to those of Google Analytics, and both tools can be utilized in a complementary fashion in situations where site analytics is not available for competitive intelligence and benchmarking analysis.
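The rescaling idea in the abstract above can be illustrated with simple arithmetic. The sketch below is not taken from the study: it assumes the reported average differences can be applied as uniform multiplicative factors (the abstract reports averages only), and the function name `estimate_ga_from_similarweb` is hypothetical.

```python
# Average SimilarWeb differences relative to Google Analytics, as reported
# in the abstract (negative = SimilarWeb lower, positive = SimilarWeb higher).
REPORTED_DIFF = {
    "total_visits": -0.194,
    "unique_visitors": -0.387,
    "bounce_rate": 0.252,
    "session_duration": 0.562,
}


def estimate_ga_from_similarweb(metric: str, similarweb_value: float) -> float:
    """Hypothetical helper: invert SW = GA * (1 + diff) to estimate GA.

    Assumes the average difference holds for the site in question, which
    is a simplification of the systematic relationship the abstract describes.
    """
    return similarweb_value / (1.0 + REPORTED_DIFF[metric])


# Example: a SimilarWeb figure of 806 monthly visits maps back to roughly
# 1000 Google Analytics visits under this assumption.
estimated_visits = estimate_ga_from_similarweb("total_visits", 806.0)
```

In practice one would more likely calibrate the ratio on a site whose Google Analytics data are known, as the abstract suggests, rather than rely on the global averages.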
We consider interdependent systems managed by multiple defenders that are under the threat of stepping-stone attacks. We model such systems via game-theoretic models and incorporate the effect of behavioral probability weighting, a concept from behavioral economics that is used to model biases in human decision-making. We then incorporate into our framework, called TASHAROK, two types of tax-based mechanisms for such interdependent security games, where the central regulator incentivizes defenders to invest well in securing their assets so as to achieve the socially optimal outcome. We first show that due to the nature of our interdependent security game, no reliable tax-based mechanism can incentivize the socially optimal investment profile while maintaining a weakly balanced budget. We then show the effect of behavioral probability weighting bias on the amount of taxes paid by defenders, and prove that higher biases make defenders pay more taxes under the two mechanisms. We then explore voluntary participation in tax-based mechanisms. To evaluate our mechanisms, we use four representative real-world interdependent systems where we compare the game-theoretic optimal investments to the socially optimal investments under the two mechanisms. We show that the mechanisms yield a higher decrease in the social cost for behavioral decision-makers than for rational decision-makers.
We propose a novel framework for cross-lingual content flagging with limited target-language data, which significantly outperforms prior work in terms of predictive performance. The framework is based on a nearest-neighbor architecture. It is a modern instantiation of the vanilla k-nearest neighbor model, as we use Transformer representations in all its components. Our framework can adapt to new source-language instances, without the need to be retrained from scratch. Unlike prior work on neighborhood-based approaches, we encode the neighborhood information based on query-neighbor interactions. We propose two encoding schemes and we show their effectiveness using both qualitative and quantitative analysis. Our evaluation results on eight languages from two different datasets for abusive language detection show sizable improvements of up to 9.5 F1 points absolute (for Italian) over strong baselines. On average, we achieve 3.6 absolute F1 points of improvement for the three languages in the Jigsaw Multilingual dataset and 2.14 points for the WUL dataset.
We construct a lattice-based (key-policy) attribute-based signature (ABS) scheme which supports attributes of unbounded polynomial length (the size of the public parameters is a fixed polynomial in the security parameter and a depth bound, with which one can generate signatures for attributes of arbitrary length). Our scheme does not rely on NIZKs, and we prove that it is semi-adaptively unforgeable in the standard model; that is, the adversary can announce the challenge attribute after seeing the public parameters but before launching any query. Unlike our scheme, previous approaches either construct selectively unforgeable ABS schemes in the standard model that only support attributes of a-priori bounded polynomial length, or construct adaptively unforgeable ABS schemes that support attributes of unbounded polynomial length but rely on NIZKs. We adapt an existing technique developed by Brakerski and Vaikuntanathan for constructing lattice-based semi-adaptively secure (key-policy) attribute-based encryption (ABE) with unbounded attribute length. In particular, we use the adapted technique to generate an unbounded number of matrices out of a-priori bounded public matrices in the construction and to program the challenge attribute into the public matrices in our semi-adaptive security proof. Moreover, to achieve adaptive signature queries in our semi-adaptive security proof, we employ the traditional partitioning technique developed in identity-based systems to encode the message to be signed. Reusing and adapting the lattice-based ABE technique and the partitioning technique for lattice-based ABS should not be surprising, since the three settings share many features, especially their security proof ideas.
Simulations are beneficial in evaluating clinicians’ empirical competencies through practical skills, prioritizing, and decision-making as part of patient care scenarios generally run in a full-scale physical context. However, such simulations require physical space, manufacturing, and replacement of damaged or used equipment. On the other hand, virtual reality (VR) computerized simulators are comparatively modern instruments for use in practical training. VR can be employed to simulate real-world situations without the actual need for physical devices. This work presents an ambulance patient compartment VR simulation that can be used by emergency medical services (EMS) staff to customize the configuration of the ambulance patient compartment according to their preference as well as for vehicle orientation or training purposes. The proposed simulation can be used repeatedly enabling the paramedics to access equipment in a fully immersive and safe environment. The user studies have demonstrated the usability and perceived effectiveness of the proposed simulation.
We propose Factual News Graph (FANG), a novel graphical social context representation and learning framework for fake news detection. Unlike previous contextual models that have targeted performance, our focus is on representation learning. Compared to transductive models, FANG is scalable in training as it does not have to maintain the social entities involved in the propagation of other news and is efficient at inference time, without the need to reprocess the entire graph. Our experimental results show that FANG is better at capturing the social context into a high-fidelity representation, compared to recent graphical and nongraphical models. In particular, FANG yields significant improvements for the task of fake news detection and is robust in the case of limited training data. We further demonstrate that the representations learned by FANG generalize to related tasks, such as predicting the factuality of reporting of a news medium.
Photo-response non-uniformity (PRNU) is an intrinsic characteristic of a digital imaging sensor, which manifests as a unique and permanent pattern introduced to all media captured by the sensor. The PRNU of a sensor has been proven to be a viable identifier for source attribution and has been successfully utilized for identification and verification of the source of digital media.
The data and Artificial Intelligence revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often available openly. Enterprises have more data than they can process. Governments are spearheading open data initiatives by setting up data portals and releasing large amounts of data to the public. Second, AI engineering development is becoming increasingly democratized. Open source frameworks have enabled even an individual developer to engineer sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data. Ensuring that AI systems adhere to a set of ethical principles is one of the major problems of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from various domains, such as data transparency, data quality, and data governance, among others, to tackle this problem. Specifically, we advocate an approach based on automated annotations (of both data and the AI model), which has a number of appealing properties. The annotations could be used by enterprises to get visibility of potential issues, prepare data transparency reports, create and ensure policy compliance, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate its key components that could achieve these requirements. Finally, we describe a number of interesting challenges and opportunities.
Knowledge graphs represented as RDF datasets are integral to many machine learning applications. RDF is supported by a rich ecosystem of data management systems and tools, most notably RDF database systems that provide a SPARQL query interface. Surprisingly, machine learning tools for knowledge graphs do not use SPARQL, despite the obvious advantages of using a database system. This is due to the mismatch between SPARQL and machine learning tools in terms of data model and programming style. Machine learning tools work on data in tabular format and process it using an imperative programming style, while SPARQL is declarative and has as its basic operation matching graph patterns to RDF triples. We posit that a good interface to knowledge graphs from a machine learning software stack should use an imperative, navigational programming paradigm based on graph traversal rather than the SPARQL query paradigm based on graph patterns. In this paper, we present RDFFrames, a framework that provides such an interface. RDFFrames provides an imperative Python API that gets internally translated to SPARQL, and it is integrated with the PyData machine learning software stack. RDFFrames enables the user to make a sequence of Python calls to define the data to be extracted from a knowledge graph stored in an RDF database system, and it translates these calls into a compact SPARQL query, executes it on the database system, and returns the results in a standard tabular format. Thus, RDFFrames is a useful tool for data preparation that combines the usability of PyData with the flexibility and performance of RDF database systems.
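To make the idea of translating imperative, navigational calls into a single SPARQL query more concrete, here is a toy sketch. It is not the RDFFrames API: the class `ToyFrame`, its methods, and the `http://example.org/...` URIs are all hypothetical and exist only to illustrate the general pattern of recording traversal steps and emitting one query at the end.

```python
class ToyFrame:
    """Hypothetical toy (not RDFFrames): records steps, then emits SPARQL."""

    def __init__(self, class_uri: str, var: str):
        # Seed the frame with all instances of a class, bound to ?var.
        self.columns = [var]
        self.patterns = [f"?{var} a <{class_uri}> ."]
        self.filters = []

    def expand(self, src: str, predicate_uri: str, new_var: str) -> "ToyFrame":
        # Navigate from an existing column along a predicate to a new column.
        self.patterns.append(f"?{src} <{predicate_uri}> ?{new_var} .")
        self.columns.append(new_var)
        return self

    def filter(self, condition: str) -> "ToyFrame":
        # Keep only rows that satisfy a SPARQL boolean expression.
        self.filters.append(f"FILTER ({condition})")
        return self

    def to_sparql(self) -> str:
        # Combine every recorded step into one SELECT query.
        select = " ".join(f"?{c}" for c in self.columns)
        body = "\n  ".join(self.patterns + self.filters)
        return f"SELECT {select}\nWHERE {{\n  {body}\n}}"


query = (
    ToyFrame("http://example.org/Movie", "movie")
    .expand("movie", "http://example.org/starring", "actor")
    .expand("movie", "http://example.org/year", "year")
    .filter("?year >= 2000")
    .to_sparql()
)
print(query)
```

The point of the design is that each Python call only records intent; the query is generated once, so the database system can evaluate the whole extraction in a single pass. A real framework would additionally handle grouping, joins, optional patterns, and conversion of the results to a tabular structure.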
Data is present in abundance, but the problem of imbalanced datasets crops up time and again, vexing classifiers and reducing accuracy. This paper introduces the K Nearest Neighbor OveRsampling (KNNOR) algorithm, a novel data augmentation technique that considers the distribution of data and takes into account the k nearest neighbors while generating artificial data points. The KNNOR algorithm has outperformed state-of-the-art augmentation algorithms by enabling classifiers to achieve much higher accuracy after injecting artificial minority data points into imbalanced datasets. This method is especially useful in health datasets, where imbalance is common, and can even be applied to images of lower dimensions.
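As a rough illustration of neighbor-based oversampling in general, the sketch below interpolates between a minority point and one of its k nearest minority neighbors. This is a generic SMOTE-style procedure, not the KNNOR algorithm: it does not reproduce the distribution-aware steps the abstract mentions, and the function name and toy data are hypothetical.

```python
import math
import random


def knn_oversample(minority, n_new, k=3, seed=0):
    """Generic SMOTE-style sketch, not the KNNOR algorithm itself.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class with four 2-D points (illustrative data only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = knn_oversample(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class.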
49 members
Muhammad Imran
  • Hamad Bin Khalifa University
Preslav Nakov
  • Arabic Language Technologies
Sofiane Abbar
  • Social Computing
Ferda Ofli
  • Social Computing
Doha, Qatar