Recent publications
Uplift modeling studies the impact of different treatments on an outcome and has several applications, such as targeted marketing and medical therapy assessment. Usually, such studies are retrospective in nature rather than controlled experiments; hence, pre-existing bias in the observed data can distort the true effect of the treatment. Previous studies on uplift modeling have largely overlooked this bias, focusing instead on model-based solutions for specific applications. In this paper, we introduce a new data-driven framework for uplift modeling based on matching. We represent individuals in treatment and control groups using weighted bipartite graphs, and bipartite graph matching is then employed to retain similar individuals, thereby reducing bias. Subsequently, appropriate datasets are generated from the matched pairs for learning predictive models for individuals. Our proposed model-independent framework facilitates robust inferences, making it adaptable to a wide range of applications and settings. In particular, we demonstrate how uplift modeling can be applied to discrimination discovery and prevention in discrimination-aware data mining. We evaluate our framework on three datasets representing three different applications. The results reveal that our framework produces better prediction performance than competing approaches and is easy to apply in practice.
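As a rough illustration of the matching step described above, the sketch below pairs treated and control individuals by solving a minimum-cost bipartite matching over a covariate distance matrix. The Euclidean distance, the function names, and the synthetic data are illustrative assumptions; the paper's exact edge-weighting scheme may differ.

```python
# Minimal sketch of matching treatment and control individuals on covariates.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_groups(treated, control):
    """Pair each treated individual with its closest control (one-to-one)."""
    cost = cdist(treated, control, metric="euclidean")    # edge weights of the bipartite graph
    rows, cols = linear_sum_assignment(cost)              # minimum-cost bipartite matching
    return [(int(r), int(c)) for r, c in zip(rows, cols)], cost[rows, cols]

rng = np.random.default_rng(0)
treated = rng.normal(size=(5, 3))      # 5 treated individuals, 3 covariates
control = rng.normal(size=(8, 3))      # 8 control individuals
pairs, distances = match_groups(treated, control)
print(pairs)                           # matched (treated, control) index pairs
```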
Data security and protection are core objectives of every organization, but as cyber-attacks have become more advanced than ever before, data is compromised more often, resulting in financial loss, loss of life, or privacy breaches. A system is therefore needed that can cope with cyber-attacks on flight operations, which are growing in both number and sophistication. Traditional intrusion detection systems are not capable enough to protect such data, and because many human lives are at stake in flight operations, a data corruption attack could lead to a catastrophe. In this paper, we propose a blockchain-based intrusion detection system framework for flight operations to protect data privacy and avoid data corruption. Blockchain not only protects data from corruption but also addresses challenges faced by intrusion detection systems, including trust and consensus building between different nodes in a network, thereby enhancing the capability of the intrusion detection system.
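The tamper-evidence property that blockchain contributes here rests on hash chaining, sketched below in minimal form: each stored record embeds the hash of the previous one, so any later modification of a record invalidates every subsequent hash. The record fields and function names are illustrative, not the paper's actual schema or consensus mechanism.

```python
# Sketch of hash chaining: any later modification of a stored flight-data
# record breaks the chain and is detected on verification.
import hashlib, json

def block_hash(record, prev_hash):
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

chain = []
prev = "0" * 64                                   # genesis hash
for record in [{"sensor": "altitude", "value": 35000},
               {"sensor": "airspeed", "value": 480}]:
    h = block_hash(record, prev)
    chain.append({"record": record, "prev": prev, "hash": h})
    prev = h

# Verification: recompute hashes and compare; a corrupted record is detected.
ok = all(b["hash"] == block_hash(b["record"], b["prev"]) for b in chain)
print("chain intact:", ok)
```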
Authorship attribution refers to examining the writing style of authors to determine the likelihood of the original author of a document from a given set of potential authors. Due to the wide range of authorship attribution applications, a plethora of studies have been conducted for various Western as well as Asian languages. However, authorship attribution research in the Urdu language has only recently commenced, although Urdu is widely acknowledged as a prominent South Asian language. Furthermore, the existing studies on authorship attribution in Urdu have addressed the considerably easier problem of having fewer than twenty candidate authors, which is far from real-world settings, so their findings may not be applicable in practice. To that end, we have made three key contributions. First, we have developed a large authorship attribution corpus for Urdu, a low-resource language. The corpus has over 2.6 million tokens and 21,938 news articles by 94 authors, making it a closer substitute for real-world settings. Second, we have analyzed hundreds of stylometry features used in the literature to identify 194 features that are applicable to the Urdu language and subsequently classified them into five categories. Finally, we have performed 66 experiments using two heterogeneous datasets to evaluate the effectiveness of four traditional and three deep learning techniques. The experimental results show that: a) our corpus for a low-resource language is many folds larger than existing corpora and also more challenging than its counterparts; and b) Convolutional Neural Networks are the most effective technique, achieving a nearly perfect F1 score of 0.989 for one corpus and 0.910 for our newly developed corpus.
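To make the closed-set attribution task concrete, the sketch below trains a simple character n-gram classifier to assign an unseen document to one of a fixed set of authors. It only illustrates the task setup on placeholder documents; it does not reproduce the paper's 194 Urdu stylometry features or its CNN model.

```python
# Minimal closed-set authorship attribution sketch with character n-gram
# features and a linear classifier; documents are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["...article text by author A...", "...article text by author B...",
        "...another article by author A..."]
authors = ["A", "B", "A"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # style-bearing char n-grams
    LogisticRegression(max_iter=1000),
)
model.fit(docs, authors)
print(model.predict(["...unseen article text..."]))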
A Fermatean fuzzy set is a more powerful tool for dealing with uncertainty in the given information than the intuitionistic fuzzy set and the Pythagorean fuzzy set, and it has promising applications in decision-making. Aggregation operators are very helpful for assessing the given alternatives in the decision-making process; their purpose is to integrate all the individual evaluation values into a unified form. In this research article, some new aggregation operators are proposed under the Fermatean fuzzy set environment. Some deficiencies of the existing operators are discussed, and then a new operational law, which considers the interaction between the membership degree and the nonmembership degree, is introduced to reduce the drawbacks of existing theories. Based on Hamacher's norm operations, new averaging operators, namely the Fermatean fuzzy Hamacher interactive weighted averaging, Fermatean fuzzy Hamacher interactive ordered weighted averaging, and Fermatean fuzzy Hamacher interactive hybrid weighted averaging operators, are introduced. Some interesting properties related to these operators are also presented. To obtain the optimal alternative, a multiattribute group decision-making method is given under the proposed operators. Furthermore, we present a comparative analysis between the proposed and existing theories to establish the exactness and validity of the proposed work.
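For readers unfamiliar with the terminology, the constraints below state the standard distinction between intuitionistic (IFS), Pythagorean (PFS), and Fermatean (FFS) fuzzy numbers with membership μ and non-membership ν, together with the commonly used Fermatean score function; the paper's Hamacher interactive operational laws are not reproduced here.

```latex
% Membership constraints distinguishing the three families, and the usual
% Fermatean score function used to rank alternatives.
\[
\text{IFS: } \mu + \nu \le 1, \qquad
\text{PFS: } \mu^{2} + \nu^{2} \le 1, \qquad
\text{FFS: } \mu^{3} + \nu^{3} \le 1,
\]
\[
s(F) = \mu^{3} - \nu^{3}, \qquad s(F) \in [-1, 1].
\]
```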
Recently, great strides have been made in the field of automatic speech recognition (ASR) using various deep learning techniques. In this study, we present a thorough comparison of the cutting-edge techniques currently used in this area, with a special focus on deep learning methods. The study explores different feature extraction methods, state-of-the-art classification models, and their impact on an ASR system. As deep learning techniques are very data-dependent, the speech datasets that are available online are also discussed in detail. Finally, the various online toolkits, resources, and language models that can be helpful in building an ASR system are presented. In this study, we cover every aspect that can impact the performance of an ASR system; hence, we believe this work is a good starting point for academics interested in ASR research.
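As an example of the kind of feature extraction front end the survey discusses, the sketch below computes MFCC features and their deltas with librosa; the file name, sampling rate, and coefficient count are illustrative choices.

```python
# Sketch of a common ASR front end: MFCC feature extraction with librosa.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)           # mono waveform at 16 kHz
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # 13 coefficients per frame
delta = librosa.feature.delta(mfcc)                        # first-order temporal derivatives
print(mfcc.shape, delta.shape)                             # (13, n_frames) each
```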
Many websites on the Internet produce a variety of textual data, such as news, research articles, e-books, personal blogs, and user reviews. On these websites, the volume of textual data is so large that finding pertinent information often becomes cumbersome for users. To overcome this issue, text-based recommendation systems (RS) are being developed: systems capable of finding relevant information in minimal time using text as the primary feature. Several techniques exist to build and evaluate such systems, and although a good number of surveys compile the general attributes of recommendation systems, a comprehensive literature review of text-based recommendation systems is still lacking. In this paper, we present a review of the latest studies on text-based RS. We conducted this survey by collecting literature published during 2010-2020 from preeminent digital repositories. The survey covers four major aspects of the text-based recommendation systems in the reviewed literature: datasets, feature extraction techniques, computational approaches, and evaluation metrics. As benchmark datasets play a vital role in any research, publicly available datasets are extensively reviewed in this paper. Many proprietary datasets that are not publicly available are also used for text-based RS; we have therefore consolidated the attributes of both the public and the proprietary datasets to familiarize new researchers with them. Furthermore, the methods for extracting features from text are described, and their usage in the construction of text-based RS is discussed, followed by the various computational approaches that use these features. We also present an overview of the evaluation metrics adopted for these systems and chart them according to their popularity. The survey concludes that word embedding is the most widely used feature extraction technique in the latest research and that hybridizing text features with other features enhances recommendation accuracy. The study also highlights that most of the work is on English textual data and that news recommendation is the most popular domain.
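A minimal content-based example of the systems surveyed: items are represented as TF-IDF vectors and the items most similar to a user's reading history are recommended. The item texts are placeholders, and many of the surveyed systems would use word embeddings rather than TF-IDF.

```python
# Minimal content-based text recommendation sketch: TF-IDF item vectors plus
# cosine similarity against what a user has already read.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = ["economy news about markets", "football match report",
         "tech review of a new phone", "stock market analysis"]
read_by_user = "article about the stock market"

vec = TfidfVectorizer()
item_vectors = vec.fit_transform(items)
user_vector = vec.transform([read_by_user])

scores = cosine_similarity(user_vector, item_vectors).ravel()
ranking = scores.argsort()[::-1]                  # item indices, most similar first
print([items[i] for i in ranking[:2]])
```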
Objectives
Real-time COVID-19 spread mapping and monitoring to identify lockdown and semi-lockdown areas using hotspot analysis and geographic information systems, together with near-future prediction modeling of COVID-19 risk in Punjab, Pakistan.
Study design
Data for all COVID-19 cases were collected until 20 October 2020 in Punjab Province.
Methods
The methodology included geotagging COVID-19 cases to understand the trans-mobility areas for COVID-19 and to characterize risk. The hotspot analysis technique was used to identify the number of areas in danger zones and the number of people affected by COVID-19. Complete lockdown areas, selected by the government of Pakistan based on increasing case numbers, were demarcated geographically.
Results
As per the predictive model estimates, almost 9.2 million people in Punjab Province were infected with COVID-19 by 20 October 2020. The compound growth rate of COVID-19 decreased to 0.012% per day, and the doubling time increased to 364.5 days in Punjab Province. Based on Pueyo model predictions from past temporal data, it is likely that Punjab and Pakistan reached their peak around the first week of July 2020, and the decline of the growth rate (and the corresponding lengthening of the doubling time) of reported cases started afterward. Hospital load was also measured through the Pueyo model, and people in the 60+ years age group are expected to dominate the hospitalized population.
Conclusions
Pakistan is experiencing a high number of COVID-19 cases, with the maximum share from Punjab. Statistical modeling and compound growth estimation were carried out using the Pueyo model, slightly modified for the local context, to identify the compound growth of COVID-19 cases in Pakistan and to predict case numbers in the near future.
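The relation between a compound daily growth rate and a quoted doubling period follows a standard formula, sketched below; this is a generic back-of-the-envelope calculation with an assumed example rate, not the modified Pueyo model used in the study.

```python
# Generic relation between a compound daily growth rate r and doubling time.
import math

def doubling_time(r):
    """Days for cumulative cases to double at a constant compound daily growth rate r."""
    return math.log(2) / math.log(1 + r)

print(round(doubling_time(0.05), 1))   # e.g. 5% daily growth -> ~14.2 days to double
```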
Compared to facial expression recognition, expression synthesis requires a very high-dimensional mapping. This problem exacerbates with increasing image sizes and limits existing expression synthesis approaches to relatively small images. We observe that facial expressions often constitute sparsely distributed and locally correlated changes from one expression to another. By exploiting this observation, the number of parameters in an expression synthesis model can be significantly reduced. Therefore, we propose a constrained version of ridge regression that exploits the local and sparse structure of facial expressions. We consider this model as masked regression for learning local receptive fields. In contrast to existing approaches, our proposed model can be efficiently trained on larger image sizes. Experiments using three publicly available datasets demonstrate that our model is significantly better than ℓ0-, ℓ1-, and ℓ2-regression, SVD-based approaches, and kernelized regression in terms of mean squared error, visual quality, and computational and spatial complexity. The reduction in the number of parameters allows our method to generalize better even after training on smaller datasets. The proposed algorithm is also compared with state-of-the-art GANs, including Pix2Pix, CycleGAN, StarGAN, and GANimation. These GANs produce photo-realistic results as long as the testing and training distributions are similar. In contrast, our results demonstrate significant generalization of the proposed algorithm over out-of-dataset human photographs, pencil sketches, and even animal faces.
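The core idea can be sketched as ridge regression in which each output pixel is restricted to a local set of input pixels via a binary mask, which is what cuts the parameter count. The mask construction, dimensions, and data below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of "masked" ridge regression: each output pixel may only depend on a
# small local set of input pixels, enforced by a binary mask M (outputs x inputs).
import numpy as np

def masked_ridge(X, Y, M, lam=1.0):
    """X: (n, d_in), Y: (n, d_out), M: (d_out, d_in) binary mask. Returns W: (d_out, d_in)."""
    d_in, d_out = X.shape[1], Y.shape[1]
    W = np.zeros((d_out, d_in))
    for j in range(d_out):
        idx = np.flatnonzero(M[j])                 # receptive field of output j
        Xj = X[:, idx]
        A = Xj.T @ Xj + lam * np.eye(idx.size)     # ridge normal equations on masked columns
        W[j, idx] = np.linalg.solve(A, Xj.T @ Y[:, j])
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                     # 100 training faces, 64 input pixels
Y = rng.normal(size=(100, 64))                     # corresponding target-expression pixels
M = (np.abs(np.arange(64)[:, None] - np.arange(64)[None, :]) <= 2).astype(int)  # local band mask
W = masked_ridge(X, Y, M)
print(np.count_nonzero(W), "of", W.size, "parameters used")
```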
Nowadays, video cameras are increasingly used for surveillance, monitoring, and activity recording. These cameras generate high-resolution image and video data at large scale. Processing such large-scale video streams to extract useful information under time constraints is challenging, and traditional methods do not offer the scalability to process data at this scale. In this paper, we propose and evaluate cloud services for high-resolution video streams in order to perform line detection using Canny edge detection followed by the Hough transform. These algorithms are often used as preprocessing steps for various high-level tasks, including object, anomaly, and activity recognition. We implement and evaluate both the Canny edge detector and the Hough transform in Hadoop and Spark. Our experimental evaluation shows that Spark offers excellent scalability and performance compared to Hadoop and standalone implementations for both algorithms: we obtained speedups of 10.8x and 9.3x for Canny edge detection and the Hough transform, respectively, using Spark. These results demonstrate the effectiveness of parallel implementations of computer vision algorithms in achieving good scalability for real-world applications.
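For reference, the per-frame processing that the distributed pipeline parallelizes looks roughly like the OpenCV sketch below; the file name and threshold values are illustrative, and the paper runs these steps over video frames on Hadoop/Spark rather than on a single image.

```python
# Single-frame sketch: Canny edge detection followed by a probabilistic
# Hough transform for line segments.
import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(frame, threshold1=100, threshold2=200)
lines = cv2.HoughLinesP(edges, rho=1, theta=3.14159 / 180,
                        threshold=80, minLineLength=30, maxLineGap=10)
print(0 if lines is None else len(lines), "line segments detected")
```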
Named Entity Recognition (NER) refers to the identification of proper nouns from natural language text and their classification into Named Entity (NE) types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for Western as well as Asian languages. Even though the Shahmukhi script of the Punjabi language is used by nearly three-fourths of Punjabi speakers worldwide, Gurmukhi has been the main focus of research activities. Specifically, a benchmark NER corpus for Shahmukhi is nonexistent, which has thwarted the commencement of NER research for the Shahmukhi script. To this end, this paper presents the development and specifications of the first-ever NER corpus for Shahmukhi. The newly developed corpus is composed of 318,275 tokens and 16,300 NEs, including 11,147 persons, 3,140 locations, and 2,013 organizations. To establish the strength of our corpus, we have compared its specifications with those of its Gurmukhi counterparts. Furthermore, we have demonstrated the usability of our corpus using five supervised learning techniques, including two state-of-the-art deep learning techniques. The results are compared, and valuable insights about the behaviour of the most effective technique are discussed.
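As an illustration of the kind of supervised baseline such a corpus enables, the sketch below feeds simple hand-crafted token features to a linear-chain CRF with sklearn-crfsuite. The features, labels, and toy sentences are placeholders (English tokens stand in for Shahmukhi text); they are not the paper's experimental setup.

```python
# Minimal token-level NER sketch: hand-crafted features + linear-chain CRF.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i][0]
    return {"word": word, "is_first": i == 0,
            "prev": sent[i - 1][0] if i > 0 else "<s>",
            "next": sent[i + 1][0] if i < len(sent) - 1 else "</s>"}

train = [[("Lahore", "B-LOC"), ("is", "O"), ("a", "O"), ("city", "O")],
         [("Iqbal", "B-PER"), ("wrote", "O"), ("poetry", "O")]]
X = [[token_features(s, i) for i in range(len(s))] for s in train]
y = [[label for _, label in s] for s in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```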
Popular social networks such as Facebook, Twitter, and Foursquare closely monitor user activities to recommend different services and events. Among these, venue recommendation proposes the most appropriate venues to users based on their preferences and lets users mark check-ins when a venue is visited. Traditional venue recommendation systems have adopted collaborative filtering to generate recommendations. However, collaborative filtering overlooks certain issues that are critical for venue recommendation, including real-time recommendations, cold start, and scalability. Moreover, real-time physical factors, such as distance from the venue, are not considered in traditional venue recommendation systems. Furthermore, parsing and processing huge volumes of unstructured data is the main challenge for conventional recommender systems, particularly when dealing with real-time recommendations, and efficient scaling demands significant computational and storage resources. This article proposes a Real-Time Venue Recommendation (RTVR) model that utilizes a cloud-based MapReduce framework to process, compare, mine, and manage large datasets for generating recommendations. The results show that the proposed model improves accuracy for real-time recommendations and that RTVR is more scalable because it exploits a cloud-based architecture. Moreover, the proposed techniques are verified using formal verification methods.
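To illustrate the MapReduce pattern that RTVR-style pipelines rely on, the toy sketch below maps check-in records to (venue, 1) pairs and reduces by key to rank venues by popularity. The record format is an assumption, and the paper runs such jobs on a cloud-based MapReduce framework rather than in a single process.

```python
# Toy map/reduce over check-in records: count check-ins per venue.
from collections import defaultdict

checkins = [{"user": "u1", "venue": "cafe_A"}, {"user": "u2", "venue": "cafe_A"},
            {"user": "u1", "venue": "museum_B"}]

mapped = [(c["venue"], 1) for c in checkins]           # map phase
counts = defaultdict(int)
for venue, one in mapped:                              # shuffle + reduce phase
    counts[venue] += one
print(sorted(counts.items(), key=lambda kv: -kv[1]))   # venues ranked by popularity
```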
Named Entity Recognition (NER) plays a pivotal role in various natural language processing tasks, such as machine translation and automatic question-answering systems. Recognizing the importance of NER, a plethora of NER techniques for Western and Asian languages have been developed. However, despite there being over 490 million Urdu speakers worldwide, NER resources for Urdu are either non-existent or inadequate. To fill this gap, this article makes four key contributions. First, we have developed the largest Urdu NER corpus, which contains 926,776 tokens and 99,718 carefully annotated NEs; the developed corpus at least doubles the number of manually tagged NEs compared to any of the existing Urdu NER corpora. Second, we have generated six new word embeddings using three different techniques, fastText, Word2vec, and GloVe, on two corpora of Urdu text. These are the only publicly available embeddings for the Urdu language, besides the recently released Urdu word embeddings by Facebook. Third, we have pioneered the application of deep learning techniques, namely NN and RNN, to Urdu named entity recognition. Finally, we have performed 10 folds of 32 different experiments using combinations of traditional supervised learning and deep learning techniques, seven types of word embeddings, and two different Urdu NER datasets. Based on the analysis of the results, several valuable insights are provided about the effectiveness of deep learning techniques, the impact of word embeddings, and variations of datasets.
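As a small illustration of how such embeddings are produced, the sketch below trains a Word2Vec skip-gram model with gensim on a toy tokenized corpus; the sentences, dimensionality, and hyperparameters are placeholders, and the paper additionally uses fastText and GloVe.

```python
# Sketch of training word embeddings on a tokenized corpus with gensim Word2Vec.
from gensim.models import Word2Vec

sentences = [["urdu", "token", "sequence"], ["another", "tokenized", "sentence"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=20)
vector = model.wv["urdu"]                 # 100-dimensional embedding for a token
print(vector.shape)
```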
Large-scale data centers are composed of thousands of servers organized in interconnected racks to offer services to users. These data centers continuously generate large amounts of telemetry data streams (e.g., hardware utilization metrics) used for multiple purposes, including resource management, workload characterization, resource utilization prediction, capacity planning, and real-time analytics. These telemetry streams require costly bandwidth utilization and storage space, particularly over the medium to long term for large data centers. This paper addresses this problem by proposing and evaluating a system that efficiently reduces bandwidth and storage for telemetry data through real-time modeling using Markov chain based methods. Our proposed solution was evaluated using real telemetry datasets and compared with polynomial regression methods for reducing and reconstructing data. Experimental results show that the data can be lossy compressed substantially for both bandwidth utilization and storage space while maintaining high reconstruction accuracy.
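One simple way to realize the Markov chain modeling idea is sketched below: a utilization stream is discretized into bins, a first-order transition matrix is estimated from the observed transitions, and an approximate stream can be regenerated by sampling from the chain. The bin count, smoothing, and synthetic data are illustrative assumptions; the paper's method may differ in its details.

```python
# Sketch: discretize a telemetry stream and fit a first-order Markov chain,
# which is far smaller than the raw stream and can regenerate an approximation.
import numpy as np

rng = np.random.default_rng(0)
cpu = np.clip(rng.normal(50, 15, size=10_000), 0, 100)       # synthetic CPU-utilization stream (%)
n_bins = 10
states = np.minimum((cpu / 100 * n_bins).astype(int), n_bins - 1)

counts = np.zeros((n_bins, n_bins))
for a, b in zip(states[:-1], states[1:]):                    # tally observed state transitions
    counts[a, b] += 1
T = (counts + 1e-3) / (counts + 1e-3).sum(axis=1, keepdims=True)  # smoothed transition matrix

# Reconstruction: regenerate an approximate stream by sampling from the chain.
synthetic = [int(states[0])]
for _ in range(len(states) - 1):
    synthetic.append(int(rng.choice(n_bins, p=T[synthetic[-1]])))
print("model size:", T.size, "values vs raw stream:", cpu.size)
```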
Over the last few years, computers have become a prominent part of courts of law. Courts generate an enormous amount of unstructured text on a daily basis, and extracting the desired information from this unstructured legal text is a major challenge. There is therefore a need for an intelligent system that can automatically find useful and critical information in the available text. Such a system will help judges and lawyers in their judgments and case preparation, and help common people understand the law and find an appropriate lawyer for their legal issues. In this research, the Punjab University Legal Mining System (PULMS) is developed using three different supervised machine learning algorithms: Conditional Random Fields (CRF), Maximum Entropy (MaxEnt), and Trigrams'n'Tags (TnT). To train the system, 304 criminal miscellaneous judgments of the Lahore High Court (LHC) of Pakistan were manually tagged for nine Named Entity (NE) types. After training, CRF achieved the best performance among the three algorithms, with precision, recall, and F-measure of 0.97, 0.87, and 0.89, respectively.
Topographic maps are large-scale representations of earth features, and their most important characteristic is the representation of the third dimension using contour lines. Topographic maps are the base layers of any spatial data infrastructure, and advances in spatial sciences demand more accurate and up-to-date topographic representation. Traditional methods of topographic surveying are time-consuming and impractical in complex topographic regions, whereas remote sensing and GIS techniques can produce topographic data for such remote areas with greater accuracy and less labor and time. This study uses high-resolution satellite stereo images to generate a Digital Elevation Model (DEM) at a 1 m contour interval for a complex topographic region of Baluchistan, Pakistan. Stereo images are two satellite images of the same position on the earth captured along the same track at different angles. Ground Control Points (GCPs), which are permanent reference points, are used to orient the stereo images so that they match the orientation of the camera at the time of exposure. Overlapping the stereo pair according to well-distributed ground control points generates a 3D surface from the 2D stereo images. This 3D surface is compared with and validated against the 90 m SRTM and 30 m ASTER datasets of the same area. The pattern of contours at a 1 m interval is analyzed for all three DEM sources, and the most accurate pattern is identified through this research.
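To make the notion of a contour interval concrete, the sketch below derives 1 m contour lines from a DEM grid; the surface here is synthetic, whereas in the study the DEM comes from the stereo-derived 3D surface.

```python
# Sketch of deriving 1 m contour lines from a DEM grid (synthetic surface).
import numpy as np
import matplotlib.pyplot as plt

x, y = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
dem = 100 + 15 * np.sin(x) * np.cos(y)                 # synthetic elevation surface (m)

levels = np.arange(np.floor(dem.min()), np.ceil(dem.max()) + 1, 1.0)  # 1 m interval
cs = plt.contour(x, y, dem, levels=levels)
plt.clabel(cs, inline=True, fontsize=6)
plt.savefig("contours_1m.png", dpi=150)
```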
In many parts of the world, including the Hindukush-Karakoram-Himalayan (HKH) region, the population depends on snow- and glacier-melt waters to grow food, generate electricity, and store water for use throughout the year. The annual and seasonal variation in the snow cover area (SCA), through its response to climatic variables, directly influences water supplies. The aim of the current study is to evaluate the spatio-temporal trends in the annual and seasonal snow cover at the basin-wide scale. The impact of topography on the SCA is analyzed by revealing the trends in SCA in different elevation bands. The study also investigates the trends in hydro-climatic factors (temperature, precipitation, and river flows) in the Chitral River basin (Hindukush region) and their linkage to the SCA variations. Snow cover is estimated using cloud-free 8-day MODIS snow products over a 17-year period (2000–2016). Hydro-climatic data on river flow (1989–2014), temperature, and precipitation (1965–2013) were obtained from the Water and Power Development Authority (WAPDA) and the Pakistan Meteorological Department (PMD). Trend analysis of the SCA and hydro-climatic variables was carried out using the Mann-Kendall trend test and Sen's slope. The results reveal: 1) a significant increasing trend in the SCA at the basin-wide scale and at all elevation zones, i.e. A to E (1471–7708 m ASL), except zone B (2500–3500 m ASL); 2) decreasing and constant trends in the mean temperature and total precipitation, respectively, over the same time period as the SCA, indicating possible reasons for the increasing SCA; 3) a slight decrease in the mean annual and summer flow (1989–2014), possibly due to summer cooling, reduced snowmelt, and slightly decreasing summer precipitation over this period; and 4) strong dependency of the Chitral River flow on snowmelt, driven by the temperature seasonality. Modeling snowmelt runoff under future climate projections in the study area may help manage the water resources properly.
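A minimal version of the trend analysis is sketched below using Kendall's tau against time as a stand-in for the Mann-Kendall test and the Theil-Sen estimator for Sen's slope; the series is synthetic, and dedicated packages (e.g. pymannkendall) implement the Mann-Kendall test directly.

```python
# Sketch of monotonic-trend testing and robust slope estimation on a yearly series.
import numpy as np
from scipy import stats

years = np.arange(2000, 2017)
sca = 0.4 * (years - 2000) + np.random.default_rng(1).normal(0, 1.5, len(years))  # synthetic % SCA

tau, p_value = stats.kendalltau(years, sca)               # monotonic-trend test
slope, intercept, lo, hi = stats.theilslopes(sca, years)  # robust trend magnitude per year
print(f"tau={tau:.2f}, p={p_value:.3f}, Sen's slope={slope:.2f} per year")
```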
Image de-fencing is often used by digital photographers to remove regular or near-regular fence-like patterns from an image. The goal of image de-fencing is to remove a fence object from an image so seamlessly that it appears as if the fence never existed. This task is challenging mainly due to the wide intra-class variation of fences, the complexity of backgrounds, and common occlusions. We present a novel image de-fencing technique that automatically detects fences of regular and irregular patterns in an image. We use a data-driven approach that detects a fence using encoded images as feature descriptors, based on a variant of the histograms of oriented gradients (HOG) descriptor. We modify the conventional HOG descriptor to represent each pixel rather than a full patch. We evaluated our algorithm on 41 different images obtained from various sources on the Internet based on well-defined selection criteria. Our evaluation shows that the proposed algorithm is capable of detecting a fence object in a given image with more than 98% accuracy and 87% precision.
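In the spirit of the per-pixel HOG variant mentioned above, the sketch below histograms gradient orientations in a small window centred on each pixel; the window size, bin count, and input image are illustrative assumptions rather than the paper's exact descriptor.

```python
# Sketch of a per-pixel orientation-histogram descriptor: gradient orientations
# are histogrammed (magnitude-weighted) in a small window around each pixel.
import numpy as np

def pixel_descriptor(gray, r=4, n_bins=9):
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ori = (np.arctan2(gy, gx) % np.pi) / np.pi * n_bins   # unsigned orientation -> bin coordinate
    h, w = gray.shape
    desc = np.zeros((h, w, n_bins))
    for y in range(r, h - r):
        for x in range(r, w - r):
            o = ori[y - r:y + r + 1, x - r:x + r + 1].astype(int) % n_bins
            m = mag[y - r:y + r + 1, x - r:x + r + 1]
            desc[y, x] = np.bincount(o.ravel(), weights=m.ravel(), minlength=n_bins)
    return desc

img = np.random.default_rng(0).random((32, 32))            # stand-in grayscale image
print(pixel_descriptor(img).shape)                          # (32, 32, 9)
```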
Considering the importance of early diagnosis of breast cancer, a supervised patch-wise texton-based approach has been developed for the classification of mass abnormalities in mammograms. The proposed method is based on texture-based classification of masses in mammograms and does not require segmentation of the mass region. In this approach, patches from filter bank responses are utilised for generating the texton dictionary. The methodology is evaluated on the publicly available Digital Database for Screening Mammography. Using a naive Bayes classifier, a classification accuracy of 83% with an area under the receiver operating characteristic curve of 0.89 was obtained. The experimental results demonstrate that the patch-wise texton-based approach, in conjunction with the naive Bayes classifier, constitutes an efficient alternative approach for automatic mammographic mass classification.
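A compressed sketch of the texton pipeline described above: filter-bank responses are clustered into a texton dictionary, each patch is summarised by its texton histogram, and a naive Bayes classifier is trained on the histograms. The toy filter bank, synthetic patches, and parameter choices are assumptions for illustration only.

```python
# Sketch of texton-based patch classification with a naive Bayes classifier.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
patches = rng.random((40, 32, 32))                     # stand-in image patches
labels = rng.integers(0, 2, size=40)                   # 0 = normal, 1 = mass (synthetic)

def filter_responses(p):
    # toy filter bank: a few Gaussian scales per pixel
    return np.stack([gaussian_filter(p, s) for s in (1, 2, 4)], axis=-1).reshape(-1, 3)

responses = np.vstack([filter_responses(p) for p in patches])
textons = KMeans(n_clusters=16, n_init=5, random_state=0).fit(responses)

def texton_histogram(p):
    ids = textons.predict(filter_responses(p))
    return np.bincount(ids, minlength=16) / ids.size

X = np.array([texton_histogram(p) for p in patches])
clf = GaussianNB().fit(X, labels)
print(clf.score(X, labels))
```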