Book

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage

Authors: Zdravko Markov, Daniel T. Larose

Introduction
Number of Visit Actions
Session Duration
Relationship between Visit Actions and Session Duration
Average Time per Page
Duration for Individual Pages
References
Exercises
... The use of TF alone gives no information about the context of the words, so it is more useful to also use the inverse document frequency (IDF), which is computed from the ratio of the total number of documents to the number of documents containing a given term (usually log-scaled). Thus, the IDF indicates whether a term is rare or common across a set of documents [12], [13]. ...
... By merging the TF and IDF, we obtain the so-called term frequency-inverse document frequency (TF-IDF), which gives more information, including the number of terms, their positions and the contextual information. The main tasks of the text processing component are therefore tokenisation, stemming and producing the TF-IDF vector space model for a given set of text documents, where an individual vector consists of m dimensions as follows [12], [13]: ...
... For similar text documents, the value will be equal to unity [12]. It should also be noted that when the vectors are normalised, this measure is equivalent to the dot product between the vectors [13]. ...
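As an illustration of the TF-IDF weighting and cosine-similarity comparison described in the excerpts above, a minimal scikit-learn sketch might look as follows; the example documents are invented for illustration and are not from the cited papers.

```python
# Minimal sketch: TF-IDF vector space model and cosine similarity between documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "buffer overflow vulnerability in industrial control systems",
    "industrial control systems advisory on buffer overflow",
    "holiday resorts reviewed on social media",
]

vectorizer = TfidfVectorizer(stop_words="english")   # tokenisation + TF-IDF weighting
tfidf = vectorizer.fit_transform(docs)               # one m-dimensional vector per document

# Cosine similarity; for identical (normalised) documents the value is 1.0.
print(cosine_similarity(tfidf))
```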
Article
Full-text available
This paper presents a framework for text clustering and categorisation. The proposed clustering approach is based on a modified existing similarity-based clustering algorithm, which was originally developed for well-structured data. In this study, the clustering algorithm is used to map text documents into clusters, in order to discover groups of topical documents. The clusters produced in this way are also used for the categorisation of new documents that are uploaded to the system. The algorithms are discussed using as an example the analysis of text documents including Industrial Control Systems (ICS) Advisory Reports and Common Vulnerabilities and Exposures (CVE) recommendations, both of which are made available by the Cybersecurity and Infrastructure Security Agency (CISA). Experiments are carried out, although the main focus is on the clustering algorithm. Based on the experimental results, it can be concluded that the proposed similarity-based clustering algorithm can be considered an alternative approach for text clustering.
... When we use data mining for clustering, classification, prediction or regression of large volumes of web data with the aim of extracting business value, we speak of the discipline of web mining (Markov & Larose, 2007; Liu, 2007; Palau, Montaner, Lopez & de la Rosa, 2016). It has three distinct aspects: ...
... An association rule is a rule of the form X => Y, interpreted to mean that the purchase of item X entails the purchase of item Y in the same transaction. In the web environment, the same kind of rule indicates a relationship between HTML pages X and Y that frequently appear together in user sessions (Markov & Larose, 2007). An association rule carries no information about the chronology of the visits to the site. ...
... For these kinds of analyses, techniques for generating sequential association rules, which also include a temporal component, are used. In sequential rules, the visit to the web site named in the antecedent of the rule occurred before the visits to the web sites named in the consequent; in other words, the sequence of clicks over time is analysed (Markov & Larose, 2007). Existing web usage mining algorithms generate too many rules, making it hard to distinguish which rules are important and which have no potential value for the user (Cheng, Healey, McHugh & Wang, 2001). ...
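The rules described above can be made concrete with a small sketch. The following is a rough, purely illustrative computation of support and confidence for sequential rules X => Y over hypothetical clickstream sessions (page X preceding page Y); it is not the algorithm of any of the cited papers.

```python
# Support and confidence for ordered page pairs (X before Y) in user sessions.
from itertools import combinations
from collections import Counter

sessions = [
    ["home", "products", "cart"],
    ["home", "about", "products"],
    ["home", "products", "cart", "checkout"],
]

pair_counts = Counter()        # sessions in which X appears before Y
page_counts = Counter()        # sessions in which X appears at all
for session in sessions:
    ordered_pairs = {(session[i], session[j])
                     for i, j in combinations(range(len(session)), 2)}  # i < j keeps click order
    pair_counts.update(ordered_pairs)
    page_counts.update(set(session))

n = len(sessions)
for (x, y), count in pair_counts.items():
    support, confidence = count / n, count / page_counts[x]
    if support >= 0.5:
        print(f"{x} => {y}  support={support:.2f}  confidence={confidence:.2f}")
```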
Article
Full-text available
Much of one's online behavior, including browsing, shopping and posting, is recorded daily in databases on companies' computers. These data sets are referred to as web data. Patterns that indicate one's interests, habits, preferences or behaviors are stored within those data. More useful than any individual indicator is when a company records data on all its users and gains insight into their habits and tendencies. Detecting and interpreting such patterns can help managers to make informed decisions and serve their customers better. Applying data mining to web data is said to turn it into web knowledge. The research study conducted in this paper demonstrates how data mining methods and models can be applied to web-based forms of data, on the one hand, and what the implications of uncovering patterns in web content, structure and usage are for management, on the other.
... It is organized as a graph structure in which the web pages are the nodes and the relations among them are the links. These web pages are stored across a network of computers rather than in a single database or computer, which follows from the idea of building the Internet on a distributed infrastructure; the web is thus the most important repository of information that must be navigated when answering any query [1]. The web has several characteristics, as explained below [2]: ...
... Data mining can be defined as the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. It is used to extract knowledge from the web automatically [1]. ...
... In other words, clustering is a type of learning that is unsupervised because it does not use labeled objects. Clustering seeks to find generic patterns, group objects into patterns, or organize them into hierarchies [1]. ...
... Each line in this text file shows a specific operation requested by the browser of a user and received by an EPA web server in Research Triangle Park, North Carolina. Each record includes the IP address, Date/Time field, HTTP request, the status code field, and the transfer size field [24]. ...
... In other words, the raw web log files are not in a format appropriate for data mining, so data preprocessing must be performed [24]. In this study, Matlab R2010A software is used for the pre-processing operations and the implementation of the process. ...
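A hedged sketch of this kind of log pre-processing is shown below: parsing log records (IP address, date/time, HTTP request, status code, transfer size) and grouping them into per-IP sessions. The exact line format and the 30-minute session timeout are assumptions for illustration; the cited study performed its pre-processing in Matlab.

```python
# Parse web server log lines and group them into sessions (illustrative only).
import re
from datetime import datetime, timedelta

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("request"), int(m.group("status"))

def sessionize(records, timeout=timedelta(minutes=30)):
    """Split each IP's requests into sessions whenever the gap exceeds `timeout`."""
    sessions = {}
    for ip, ts, request, status in sorted(records, key=lambda r: (r[0], r[1])):
        per_ip = sessions.setdefault(ip, [])
        if not per_ip or ts - per_ip[-1][-1][1] > timeout:
            per_ip.append([])                       # start a new session for this IP
        per_ip[-1].append((ip, ts, request, status))
    return sessions

sample = '192.168.1.1 - - [29/Aug/1995:23:53:25 -0400] "GET /index.html HTTP/1.0" 200 1839'
print(parse_line(sample))
```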
Article
Full-text available
In this study, we present a new approach to Web Usage Mining using Case-Based Reasoning. Case-based reasoning is a knowledge-based problem-solving approach based on the reuse of previous experience; past experience can thus be deemed an efficient guide for solving new problems. Web personalization systems that can adapt the next set of visited pages to individual users according to their interests and navigational behaviors have been proposed. The proposed architecture consists of a number of components, namely basic log preprocessing, pattern discovery methods (case-based reasoning with peer-to-peer similarity, clustering, and association rule mining), and recommendations. One of the issues considered in this study is that no recommendations are available for users who differ from the existing users in the log file, which is one of the challenges facing recommendation systems. To deal with this problem, the Apriori algorithm was applied separately to generate recommendations; in other words, in cases where recommendations may be inadequate, association rules can enhance the overall recommendation performance of the system. This study also uses clustering algorithms for nominal web data. Our evaluations show that the proposed method, together with the case-classified log, provides more effective recommendations for users than logs with no case classification.
... All the new pages are analyzed to gather information, and the process then continues using the obtained pages as new seeds, which are stored in the queue (Mirtaheri et al., 2013). The crawler basically works on the principle of a simple graph search algorithm, such as breadth-first search (BFS) and depth-first search (DFS) (Singh et al., 2014), assuming that all the web pages are linked together and that there are no redundant links (Markov & Larose, 2007). ...
... Moreover, there also exists the possibility to have multiple links to the same web page. Thus, a crawler can maintain a cache of links or page content to check content similarity between two web pages (Markov & Larose, 2007). ...
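A minimal breadth-first crawler matching this description might look like the sketch below; fetch_links is a hypothetical placeholder for downloading a page and extracting its outgoing links, and a real crawler would also need politeness rules, robots.txt handling and content-similarity checks.

```python
# Breadth-first crawl over a frontier queue with a cache of already-seen URLs.
from collections import deque

def fetch_links(url):
    """Placeholder: download `url` and return its outgoing links."""
    return []

def crawl(seeds, max_pages=100):
    frontier = deque(seeds)      # pages still to visit (BFS order)
    visited = set()              # cache of seen URLs, avoids revisiting duplicates
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return visited

print(crawl(["https://example.org/"]))
```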
... Many researchers have published works about WCM [15], [16], [17], [18]. Their studies aim to solve different problems. ...
... Their studies aim to solve different problems. Materna, Qi and Davison work on automated web page classification [19], [20]; Markov and Larose focus on text document grouping [18]; Hariharan, Srinivasan and Lakshmi on similarity discovery among text documents [21], [22]; Liu, Medhat, Hassan, Korashy, D'Avanzo, Pilato, Patel, Prabhu and Bhowmick conduct research in two areas: sentiment analysis and opinion mining [23], [24], [25], [26], [27]. ...
Article
Full-text available
The impact of social networks on our lives keeps increasing because they provide content, generated and controlled by users, that is constantly evolving. They help us spread news, statements, ideas and comments very quickly. Social platforms are currently one of the richest sources of customer feedback on a variety of topics. A frequently discussed topic is resorts and holiday villages and the tourist services offered there. Customer comments are valuable to both travel planners and tour operators. The accumulation of opinions in the web space is a prerequisite for applying appropriate tools for their computer processing and for extracting useful knowledge from them. When working with unstructured data, such as social media messages, there is no universal text processing algorithm, because each social network and its resources have their own characteristics. In this article, we propose a new approach for automated analysis of a static set of historical user messages about holiday and vacation resorts published on Twitter. The approach is based on natural language processing techniques and the application of machine learning methods. The experiments are conducted using the software product RapidMiner.
... Rules that reflect associations between several attributes are often called affinity analysis or market basket analysis. Association analysis, or association rule mining, is a data mining technique used to discover rules for combinations of items [13]. One stage of association analysis that has attracted the attention of many researchers is high-frequency pattern analysis (frequent pattern mining). ...
Article
Full-text available
The success of the development process requires optimal support for the exchange of data and information between institutions in order to achieve balanced system integration between the government and its users. SAKTI, an institution-level financial application, has been designed to manage all aspects of finance, from planning to budget accountability. The SAKTI application integrates all existing work-unit applications, with the aim of improving effectiveness, efficiency, transparency and accountability in financial management. Although it has been in use since the beginning of 2022, commitment operators still face difficulties in determining item codes, mainly because of their lack of familiarity with the task and the large number of items available as references. Errors made by commitment operators can affect the asset detailing process in the inventory and asset modules. In this study, the researchers used the Apriori algorithm and frequent pattern growth (FP-growth) as tools to discover a set of association rules from the item transaction data stored in the SAKTI application database. The simulation results show that, among the rules satisfying the minimum support and minimum confidence, the most frequently selected items are Ballpoint Standar Tecno, plastic tissue refill, Lak Ban Hitam 2 Inchi Merk Daimaru, and Ballpoint Kenko K1 (0.5), at 100%.
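To make the minimum-support / minimum-confidence idea concrete, the sketch below shows a level-wise, Apriori-style search for frequent itemsets and simple one-item-consequent rules. The transactions are hypothetical office-supply baskets, not the SAKTI transaction data, and the code is illustrative rather than the study's implementation.

```python
# Level-wise (Apriori-style) frequent itemsets plus simple association rules.
transactions = [
    {"ballpoint", "tissue refill", "duct tape"},
    {"ballpoint", "tissue refill"},
    {"ballpoint", "duct tape"},
    {"tissue refill", "duct tape", "stapler"},
]
min_support, min_confidence = 0.5, 0.6
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

items = {frozenset([i]) for t in transactions for i in t}
frequent, level = [], {s for s in items if support(s) >= min_support}
while level:
    frequent.extend(level)
    size = len(next(iter(level))) + 1
    candidates = {a | b for a in level for b in level if len(a | b) == size}
    level = {c for c in candidates if support(c) >= min_support}

for itemset in frequent:
    if len(itemset) < 2:
        continue
    for y in itemset:                      # rules of the form X => y
        x = itemset - {y}
        confidence = support(itemset) / support(x)
        if confidence >= min_confidence:
            print(f"{set(x)} => {y}  support={support(itemset):.2f}  confidence={confidence:.2f}")
```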
... The neurons are associated with weights in every layer, and the weighted output is fed as input to the next layer. ANN performs well on noisy data sets; hence ANN is robust with respect to erroneous and noisy data [98]. It can function even if some part of the network fails. ...
Article
Full-text available
In this digital era, users and service providers face numerous decisions that lead to data overload. The choices should be filtered, prioritised or adapted so that the relevant data, with meaningful detail, reach the service provider or the intended user. A recommender framework or engine handles the information overload problem by customising and filtering the large volume of data and dynamically generating appropriate, personalised content for the customer. This comprehensive study focuses on several recommender system (RecSys) methodologies and discusses the problems or issues associated with different principles and techniques. In addition, this study elaborates on several similarity measures, both conventional and non-conventional, with their merits and demerits, and points out both ranking and non-ranking performance metrics. Further, we have studied a range of articles, including journal and conference papers. Based on these studies, we outline current research challenges as future directions. We also briefly discuss various datasets used in the recommender domain for evaluating and validating the recommendation task.
... Website texts have been investigated by many researchers and examples can be found in 'Web mining' books, such as Larose and Markov (2007), and in more text-oriented Web mining books, such as Song and Wu (2008). To obtain more information on the activities of companies, several applications have been described. ...
Article
Full-text available
A text-based, bag-of-words, model was developed to identify drone company websites for multiple European countries in different languages. A collection of Spanish drone and non-drone websites was used for initial model development. Various classification methods were compared. Supervised logistic regression (L2-norm) performed best with an accuracy of 87% on the unseen test set. The accuracy of the later model improved to 88% when it was trained on texts in which all Spanish words were translated into English. Retraining the model on texts in which all typical Spanish words, such as names of cities and regions, and words indicative for specific periods in time, such as the months of the year and days of the week, were removed did not affect the overall performance of the model and made it more generally applicable. Applying the cleaned, completely English word-based, model to a collection of Irish and Italian drone and non-drone websites revealed, after manual inspection, that it was able to detect drone websites in those countries with an accuracy of 82 and 86%, respectively. The classification of Italian texts required the creation of a translation list in which all 1560 English word-based features in the model were translated to their Italian analogs. Because the model had a very high recall, 93, 100, and 97% on Spanish, Irish and Italian drone websites respectively, it was particularly well suited to select potential drone websites in large collections of websites.
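The general bag-of-words plus L2-regularised logistic regression setup described in this abstract can be sketched with scikit-learn as below; the texts, labels and train/test split are placeholders, not the authors' website collection or exact pipeline.

```python
# Bag-of-words features + L2 logistic regression, evaluated on a held-out test set.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

texts = ["drone aerial photography services", "pizza delivery in Madrid",
         "UAV mapping and surveying", "handmade leather shoes"] * 10
labels = [1, 0, 1, 0] * 10          # 1 = drone website, 0 = other (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0)

model = make_pipeline(
    CountVectorizer(),                               # bag-of-words representation
    LogisticRegression(penalty="l2", max_iter=1000), # L2-norm logistic regression
)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```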
... As Field (2013, p. 263) states, relationship analyses are types of analysis based on whether two or more variables under consideration vary together. In data mining, applications such as association rules and cluster analysis are used to detect individual behaviours that change together or decisions related to preferences (Markov & Larose, 2007). ...
Article
Full-text available
It can be said that understanding the past and the present helps us look at the future more clearly. Especially in the information age, the huge volumes of data generated with the contribution of digitalisation make this interpretation even more important. One of the most effective methods at our disposal for achieving this is data mining. Data mining is a tool for increasing productivity based on discovering meaningful relationships, patterns and trends within the data in question. Frequently used in the social sciences and in marketing, data mining develops foresight for predicting customers' future behaviour through the meaningful patterns and relationships it discovers, and creates many advantages for businesses by supporting sales and service functions, such as how product offers should be structured. In this context, the study aims first to provide general information on data mining and its applications in the social sciences and then to evaluate the use of data mining in marketing. In this way, it is expected to support a clearer understanding and adoption of the concept of data mining by social scientists, an increase in data mining applications in marketing and, consequently, a greater contribution to theory and industry.
... At the end, each example's actual class label is compared against its predicted one to find the total number of correct predictions. To evaluate MGREPD, the leave-one-out cross-validation (LOOCV) method [32] and measures such as accuracy and f-score are used [33]. Accuracy A measures the ability of the model to match the actual value of the class label with its predicted one (e.g. ...
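Leave-one-out cross-validation with accuracy and F-score, as mentioned in the excerpt, can be sketched with scikit-learn as follows; the toy dataset and classifier stand in for the MGREPD model and its data.

```python
# LOOCV: each example is predicted by a model trained on all the remaining examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

y_pred = cross_val_predict(clf, X, y, cv=LeaveOneOut())

print("accuracy:      ", accuracy_score(y, y_pred))
print("macro F-score: ", f1_score(y, y_pred, average="macro"))
```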
... Due to the massification of information on the World Wide Web (WWW), this has become an important global database. Web scraping, used for web content mining, obtains information from the content of web pages, with two basic objectives: extract information to improve search engines and information retrieval fields [33] and analyze and explore information to gain useful content knowledge [34]. In this content mining technique, opinions, feelings, and emotions are extracted from the text to understand the context of web content [35]. ...
Article
Full-text available
The objective of this work is to generate an HJ-biplot representation for the content analysis, obtained by latent Dirichlet allocation (LDA), of the headlines of three Spanish newspapers in their web versions referring to the topic of the pandemic caused by the SARS-CoV-2 virus (COVID-19), with more than 500 million affected and almost six million deaths to date. The HJ-biplot is used to give an extra analytical boost to the model. It is an easy-to-interpret multivariate technique which does not require in-depth knowledge of statistics and allows capturing the relationship between the topics in the COVID-19 news and the three digital newspapers. Compared with LDAvis and heatmap representations, the HJ-biplot provides a better representation and visualization, allowing us to analyze the relationship between each newspaper analyzed (column markers represented by vectors) and the 14 topics obtained from the LDA model (row markers represented by points), represented in the plane with the greatest informative capacity. It is concluded that the newspapers El Mundo and 20 M present greater homogeneity between the topics published during the pandemic, while El País presents topics that are less related to the other two newspapers, highlighting topics such as t_12 (Government_Madrid) and t_13 (Government_millions).
... Guerbas et al. (2013) proposed a KNN-based navigation pattern prediction approach in which the sessions are treated as documents and the web pages in the sessions as the terms in the document. Recently, a recommendation system based on K-Nearest Neighbor classification [70] was developed by Adeniyi et al. [1]. The method decides the class to which a user belongs by comparing the current click stream of a user with the click streams of previous users whose class labels are known. ...
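The k-NN idea in the excerpt, classifying a user by comparing the current clickstream with labeled clickstreams of previous users, can be sketched as follows; the pages, sessions and class labels are hypothetical.

```python
# Encode sessions as binary page-visit vectors and classify a new user with k-NN.
from sklearn.neighbors import KNeighborsClassifier

pages = ["home", "products", "cart", "checkout", "blog", "support"]

def to_vector(session):
    return [1 if page in session else 0 for page in pages]

past_sessions = [["home", "products", "cart", "checkout"],
                 ["home", "products", "cart"],
                 ["home", "blog"],
                 ["home", "support", "blog"]]
labels = ["buyer", "buyer", "browser", "browser"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit([to_vector(s) for s in past_sessions], labels)

current_clickstream = ["home", "products", "checkout"]
print(knn.predict([to_vector(current_clickstream)]))   # predicted class for the current user
```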
Article
Full-text available
The burgeoning e-commerce market has presented companies with the opportunity to grow their businesses through online platforms. But, the researchers have concluded that just 2.86% of e-commerce website visits lead to a purchase and one of the reasons for this missed opportunity is an unpleasant website browsing experience. Therefore, a pleasant browsing experience is the need of the hour whereby the web page recommendation systems (WPRS) provide high-quality navigation experience by providing suggestions about the web pages of interest and by taking the website users to their desired web pages in fewer clicks. In this context, this paper presents a method to improve the browsing experience of the website users by proposing two hybrid algorithms based on clustering for web page recommendation systems, namely a hybrid partitioning-based heuristic sequence clustering (HSC) algorithm inspired from K-medoid and DBSCAN algorithms and a hybrid tree-based sequence clustering (TSC) algorithm inspired from B-Trees and BIRCH algorithm. The testing has been performed using CTI, BMSWebView1, BMSWebView2 and MSNBC datasets. To measure the performance, the algorithm considered for the study has been evaluated using parameters like precision, recall, F1 measures and execution time. Also, an in-depth comparative analysis of state-of-the-art web page recommendation systems with the recommendation system considered for the study has been done. The results indicate that the proposed clustering-based framework was able to generate superior results than the other classes of algorithms.
... The concept of geoportals has become key for accessing and sharing spatial data and geoinformation. We perform geoportal navigation analysis based on geoportal web server logs (click-stream data), following the guidelines given by (Markov and Larose, 2007; Bhavani et al., 2017; Bhuvaneswari and Muneeswaran, 2021). ...
Article
Full-text available
The Friedman Test is used for problems similar to a wine contest, where we want to check whether there is any difference between the wines. We have analyzed the problems where the judges might find ties between the wines, and produced exact tables for the problem. Using the tables instead of an asymptotic estimate might circumvent errors, at least for the case of a small number of wines and judges.
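For reference, the ordinary (asymptotic) Friedman test is available in SciPy; the hypothetical judge scores below only illustrate the call, and the exact tables for tied rankings derived in the paper are not reproduced here.

```python
# Asymptotic Friedman test: do the wines differ, given scores from the same judges?
from scipy.stats import friedmanchisquare

# Each list holds one wine's scores from the same panel of five judges.
wine_a = [7, 8, 6, 7, 9]
wine_b = [6, 7, 6, 5, 8]
wine_c = [8, 9, 7, 8, 9]

stat, p_value = friedmanchisquare(wine_a, wine_b, wine_c)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.3f}")
```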
... Data Conversion. This step concerns the user-session representation [38]. Not all sessions and web pages are involved. ...
Article
Full-text available
The problem of finding relevant data while searching the internet represents a big challenge for web users due to the enormous amounts of available information on the web. These difficulties are related to the well-known problem of information overload. In this work, we propose an online web assistant called OWNA. We developed a fully integrated framework for making recommendations in real-time based on web usage mining techniques. Our work starts with preparing raw data, then extracting useful information that helps build a knowledge base as well as assigns a specific weight for certain factors. The experiments show the advantages of the proposed model against alternative approaches.
... According to the definition of big data [19], [20], it can be argued that geophysical studies, which consider dozens of different parameters accumulated over several decades, are a classic example of such "big data", for the processing of which special machine learning and data mining methods have been developed [21]-[23]. Therefore, machine learning methods have been chosen as the tool for achieving the aim of the present research. ...
Article
Full-text available
Well logging, also known as a geophysical survey, is one of the main components of a nuclear fuel cycle. This survey follows directly after the drilling process, and the operational quality assessment of its results is a very serious problem. Any mistake in this survey can lead to the culling of the whole well. This paper examines the feasibility of applying machine learning techniques to quickly assess the well logging quality results. The studies were carried out by a reference well modelling for the selected uranium deposit of the Republic of Kazakhstan and further comparing it with the results of geophysical surveys recorded earlier. The parameters of the geophysical methods and the comparison rules for them were formulated after the reference well modelling process. The classification trees and the artificial neural networks were used during the research process and the results obtained for both methods were compared with each other. The results of this paper may be useful to the enterprises engaged in the geophysical well surveys and data processing obtained during the logging process.
... The emergence of more complex types of data led to the development of new methods and models to cope with the new task of mining complex data. As examples, we can point out text mining (do Prado & Ferneda, 2008), web mining (content, structure, and usage) (Markov & Larose, 2007), spatial data mining (Nlenanya, 2009), graph mining (Zhang, Hu, Xia, Zhou, & Achananuparp, 2008), mining time-series data (Liabotis, Theodoulidis, & Saraaee, 2006), among others. In (Kumar, 2011) some trends and new domains are explored. ...
Chapter
The term knowledge discovery in databases or KDD, for short, was coined in 1989 to refer to the broad process of finding knowledge in data, and to emphasize the “high-level” application of particular data mining (DM) methods. The DM phase concerns, mainly, the means by which the patterns are extracted and enumerated from data. Nowadays, the two terms are, usually, indistinctly used. Efforts are being developed in order to create standards and rules in the field of DM with great relevance being given to the subject of inductive databases. Within the context of inductive databases, a great relevance is given to the so-called DM languages. This chapter explores DM in KDD.
... The emergence of more complex types of data led to the development of new methods and models to cope with the new task of mining complex data like text mining (Prado & Ferneda, 2008), web mining (content, structure, and usage) (Markov & Larose, 2007), spatial data mining (Nlenanya, 2009), graph mining (Zhang, Hu, Xia, Zhou, & Achananuparp, 2008), mining time-series data (Liabotis, Theodoulidis, & Saraaee, 2006) etc., ...
Chapter
Business Intelligence (BI) is an emergent area of the Decision Support Systems (DSS) discipline. Over the past years, the evolution in this area has been considerable. Similarly, in the last years, there has been a huge growth and consolidation of the Data Mining (DM) field. DM is being used with success in BI systems, but a true integration of DM with BI is lacking. The purpose of this chapter is to discuss the relevance of DM integration with BI, and its importance to business users. From the literature review, it was observed that the definition of an underlying structure for BI is missing, and therefore a framework is presented. It was also observed that some efforts are being made towards the establishment of standards in the DM field, both by academics and by people in the industry. Supported by those findings, this chapter introduces an architecture that can lead to an effective usage of DM in BI. This architecture includes a DM language that is iterative and interactive in nature. This chapter suggests that the effective usage of DM in BI can be achieved by making DM models accessible to business users, through the use of the presented DM language.
... Back propagation, offered by neural networks, is also an effective algorithm for the task of text classification; it can deal with high-dimensional data spaces [3], [6], [19]. Text categorization is an active research area of text mining in which text is prepared with supervised, unsupervised or semi-supervised knowledge [4]. ...
Article
Full-text available
Text classification is the process of assigning text to one or more categories. Text categorization has many significant applications, mostly in the field of document organization and for browsing within large groups of documents. It is often carried out by means of machine learning, since the system is built from a wide range of document features. Feature selection is an important step in this process, because there are typically several thousand possible feature terms. In text categorization, the goal of feature selection is to improve the efficiency of the procedures and the reliability of classification by deleting irrelevant and non-essential terms, while keeping terms that carry enough information to support the classification task. The goal of this work is to build more efficient text categorization models. In text mining algorithms, a document is represented as a vector whose dimension is the number of distinct keywords in it, which can be very large, so classic document categorization may be computationally costly. Therefore, feature extraction through singular value decomposition is employed to reduce the dimensionality of the documents, and we apply classification algorithms based on back propagation and on the Support Vector Machine methodology. Before classification, we applied Principal Component Analysis in order to improve the accuracy of the results. We then compared the performance of these two algorithms by computing standard precision and recall for the document collection.
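The pipeline this abstract describes, TF-IDF document vectors reduced with singular value decomposition and then classified, with precision and recall as the evaluation measures, can be approximated with scikit-learn as below; a small public dataset and a linear SVM stand in for the authors' document collection and exact configuration.

```python
# TF-IDF -> truncated SVD (dimensionality reduction) -> SVM, scored by precision/recall.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import precision_score, recall_score

categories = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=100, random_state=0),   # SVD-based dimensionality reduction
    LinearSVC(),
)
model.fit(train.data, train.target)
pred = model.predict(test.data)

print("precision:", precision_score(test.target, pred))
print("recall:   ", recall_score(test.target, pred))
```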
... Web structure mining: the web can be represented as a graph whose nodes are documents and whose edges are the links between documents. Web structure mining is the process of extracting structural information from the web [13]. ...
Article
With the growing volume of data available on the Internet, customization of website information has become a requirement for users. Appropriate customization of web data can be achieved by automatically extracting and combining knowledge from the log file and from user profile information. In this paper, we integrate decision trees and association rules over the user profile information and the website log information of an online shopping store. The tangible results of such a framework for decision makers and marketers are the customization of web pages and statistical analysis for improving sales. Applying association rules, the website users' patterns are mined; using a decision tree, users are classified and their interests are determined. By combining the results of the two algorithms and analysing them, behaviour models can be obtained from the user profiles, user interests can be described in terms of age and gender, and the most visited web pages by subject can be identified.
... The application of k-NN as a machine learning approach spans more than 50 years [35]. Although k-NN is believed to have been introduced in 1951 in an unpublished medical report, it did not gain much traction until 1960. ...
Article
Energy generation from biomass requires a nexus of different sources irrespective of origin. A detailed and scientific understanding of the class to which a biomass resource belongs is therefore highly essential for energy generation. An intelligent classification of biomass resources based on their properties offers high prospects for analytical, operational and strategic decision-making. This study proposes the k-Nearest Neighbour (k-NN) classification model to classify biomass based on its properties. The study scientifically classified a dataset of 214 biomass samples obtained from several articles published in reputable journals. Four different values of k (k=1,2,3,4) were tested with various self-normalizing distance functions, and their results were compared for effectiveness and efficiency in order to determine the optimal model. The k-NN model based on the Mahalanobis distance function showed high accuracy at k=3, with Root Mean Squared Error (RMSE), accuracy, error, sensitivity, specificity, false positive rate, Kappa statistic and computation time (in seconds) of 1.42, 0.703, 0.297, 0.580, 0.953, 0.047, 0.622 and 4.7, respectively. The authors concluded that the k-NN based classification model is feasible and reliable for biomass classification. The implementation of these classification models shows that k-NN can serve as a handy tool for classifying biomass resources irrespective of their sources and origins.
... In the classification phase, the classification model is used to correctly classify a new unlabeled document. The text classification process is divided into four main steps: data collection and preprocessing, building the model (feature selection), model evaluation, and model testing (classification of new documents with unknown class labels) [6], detailed in the following sub-sections. ...
Conference Paper
Rain is of paramount importance for Indian agriculture, as it serves as the primary source of water for crops, sustaining agricultural productivity and ensuring food security for millions of people. In India’s predominantly rain-fed agriculture, timely and adequate rainfall is crucial for successful crop growth, making it a lifeline for farmers and a determining factor in the country’s agricultural output. Accurate rainfall predictions are essential for various applications, including agriculture, water resource management, and disaster preparedness. Ensemble machine learning models have demonstrated their capability to enhance the accuracy and reliability of rainfall predictions compared to single models. This article presents a comparative analysis of different ensemble techniques applied to rainfall prediction tasks. We explore various ensemble approaches, including Averaging, Max Voting, and Stacking, and evaluate their performance using rainfall day-wise datasets from Pantnagar (29.0222° N, 79.4908° E), Uttarakhand, India, from 2010 to 2022.
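Two of the ensemble schemes mentioned here, averaging and stacking, can be sketched with scikit-learn as follows; synthetic regression data stands in for the Pantnagar rainfall dataset and the study's actual base models.

```python
# Averaging (VotingRegressor) and stacking (StackingRegressor) ensembles, compared by MSE.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = [("lr", LinearRegression()), ("rf", RandomForestRegressor(random_state=0))]

ensembles = {
    "averaging": VotingRegressor(estimators=base),                           # mean of base predictions
    "stacking": StackingRegressor(estimators=base, final_estimator=Ridge()), # meta-learner on top
}
for name, model in ensembles.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE = {mse:.1f}")
```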
Article
Full-text available
Wind energy presents a high growth potential in the EU as an emission reduction strategy and to achieve the climate neutrality goal by 2050. Wind farms suitability analysis is one of the primary goals in the spatial planning of wind energy developments. This research paper introduces a hybrid spatial multicriteria GIS-based framework that combines Analytic Hierarchy Process (AHP), PROMETHEE II and Machine Learning algorithms to determine and predict the most efficient onshore wind farm locations by generating suitability index mappings. The methodology allows to overcome PROMETHEE II limitations in raster driven suitability analysis, utilizing machine learning regression methods as the k Nearest Neighbor and Support Vector Machines to predict a graduating mapping of suitability index for wind farm locations in northeastern Greece. The best configured models presented a RMSE of 0.0344 and 0.0154 respectively, indicating a quite high predictive performance. Suitability results indicate that 56.10% of the feasible locations in the Thrace area present a positive outranking character for the kNN model and 56.79% for the SVR model. The proposed framework, enriched by PROMETHEE II capabilities, assists energy and spatial planners in identifying suitable sites for wind farm siting and enables rational decision making that enhances efficient wind energy investments.
Article
Full-text available
In recent years, Brazilian citizens have become increasingly interested in politics, especially when the subject affects the economy, and have thus been looking for news on online portals. In this work, we analyse several national news portals to identify which topics are most present in the most-read news section and to understand user preferences. To this end, the developed system collects the most-read pages of each portal and categorises the news items found. To test and validate the system, we catalogued the most-read news of 2017 and 2018 from three portals: UOL, Veja and Estadão. The data show that users of the UOL portal prefer entertainment news, while at Estadão the audience is divided between politics and entertainment, and at Veja the preference is for politics, economy and opinion news. The results show that political news had an increase in readership in 2018 compared with 2017.
Chapter
The research is devoted to solving the problem of linking the virtual information space and the physical world in terms of data retrieval, where the methods for extracting data from the virtual information space are determined by the data themselves (data-driven). The paper discusses ways to solve the problem of obtaining thematic content (data retrieval) from an unstructured set of information resources or news feeds. The problems of the "growing bubble" of unprocessed documents that arise during the "blind" collection of documents are discussed, and ways to solve these problems are proposed. To reduce the resource consumption of the task of forming a periodically updated search base, three approaches to the automatic collection of "raw data" are proposed. The proposed approaches to developing sub-search systems are part of a large class of modern methods and algorithms for adaptive heterogeneous data filtering for content retrieval and aggregation in the formation of subject-oriented knowledge bases. A possible field of application and the relevance of the work are determined by the fact that, for closed cyber-physical systems, the use of sub-search systems is proposed with an unlimited and unstructured information space as input that is processed in real time. One possible implementation of the proposed methods is in the development of knowledge bases for scientific and technical documentation.
Chapter
Full-text available
Quite often, developers face low performance, hanging, and other problems when they are developing sites. To solve such problems, we need to trace site requests. Existing tracing methods do not allow tracing the progress of requests from a client's web browser to a server or group of servers. In this paper, we propose a distributed tracing mechanism that allows tracking requests starting from the browser. To generate complete client-to-server traces, the client application must be able to initiate the appropriate request, and a dedicated library is needed to perform these actions. In the paper, we consider the algorithm of such a library. A popular tracer (OpenTracing) is used on the server side. Based on the proposed methodology, a library was developed and tested. Testing has shown that, using the library, we can track the complete chain of requests from a browser to the server. The trace result is presented in a graphical view, which allows analyzing the received data and finding bottlenecks as queries are processed. The novelty of the proposed solution is that the request is traced from the client application and back to the client application; that is, the full path of the request is shown. The result is presented in a graphical form that is convenient for analysis. The library is designed primarily for the development of client-server applications and for support services.
Article
It is known that, in the former Soviet Union, Azerbaijan was a country that exported cotton, fruit, tobacco, wine, and canned fruit and vegetable products. In the early 1990s, the collapse of the union, the deterioration of economic relations between the former republics and the loss of traditional markets led to a sharp decrease in production. In that situation, Armenia's groundless territorial claim against Azerbaijan, the coming to power of the Popular Front, its incompetent management and internal strife further strained the political and economic situation in the country. As a result, Armenia's occupation of 20 percent of our territories and the creation of more than one million refugees and internally displaced persons dealt a heavy blow to the agricultural sector. The material and technical base created over many years weakened, product markets were lost, and the production of agricultural products decreased sharply. Thus, in 1990-1993, the balance between the prices of industrial products and agricultural products was disturbed in the country, and a difficult situation arose in the development of the social and production infrastructure of the villages. The construction of schools, cultural facilities, household services and health facilities practically stopped. During this period, the depreciation of the main funds accelerated, and the level of technical equipment of the agricultural and processing industry decreased. The application of the achievements of scientific and technical progress in production processes was limited. For these reasons, our country turned from an exporting country into an importing country. A number of measures were taken to overcome the crisis, and from 1993 confident steps were taken to strengthen state building and revive the economy. The main task was to form market relations, develop entrepreneurship and improve domestic production by effectively using existing potential. National Leader Heydar Aliyev, who returned to the leadership of the country, decided to take decisive steps, establish stability and implement economic reforms. For this purpose, under the leadership of the great leader, the directions of agrarian policy for the next 5-10 years were determined in 1993-1995 and a number of measures were implemented. Azerbaijan has been under the aggression of Armenia for 30 years. The purpose of the study is to assess the condition of the agricultural sector in the territories freed from occupation by our victorious army, to determine the measures to be implemented and to prepare proposals for socio-economic development goals. The methodology of the research is based on the analysis of the measures implemented by Azerbaijan after regaining its independence, the creation of a legal framework leading the country from recession to dynamic development, and the analysis of a number of consistent and systematic relationships. The applied importance of the research is that it can be used in the preparation of socio-economic development programs and measures for the liberated territories. The result of the study is to use the positive experience gained in Azerbaijan for the socio-economic development of the territories freed from occupation; on this basis, the development of agricultural production in Karabakh can be achieved using new techniques and technologies.
As a result of the implementation of the proposals put forward, modern agricultural production and processing enterprises and specialized cooperatives can be created in these areas. Originality and scientific innovation of the research: the article considers three factors in the development of the agricultural sector in the liberated territories of Azerbaijan. Keywords: de-occupied territories, investment, resources, users, targets, reforms.
Article
The article investigates the dehydrogenation of ethanol over binary cerium-copper oxide catalysts. It was established that over Ce-Cu-O catalysts ethanol is converted mainly into acetaldehyde, acetone, ethyl acetate, ethylene and carbon dioxide. Under the conditions studied, the products of the isomerisation of butene-1 to butene-2 over cerium-copper oxide catalysts are trans- and cis-butene-2, while at temperatures above 350 °C deep oxidation products, namely CO and CO2, are formed. It was found that as the cerium content of the catalysts increases, the yield of trans- and cis-butene-2 decreases, which indicates a decrease in the acidity of the catalyst surface. It was also found that, in the dehydrogenation of ethanol over cerium-copper oxide catalysts, the yield of acetaldehyde and its selectivity pass through a minimum as the surface acidity increases. The studies showed that over the Ce-Cu-O catalyst ethanol gives mainly acetaldehyde and ethyl acetate at low temperatures, and acetaldehyde and acetone at temperatures above 350 °C. It was further determined that cerium-copper oxide catalysts have low activity in the isomerisation of butene-1 to butene-2: for cerium-rich samples the isomerisation of butene-1 begins at 250 °C and the total yield of butene-2 does not exceed 15%. The trans/cis isomer ratio of the Ce-Cu-O catalysts varies in the range 0.17-0.56%; for the catalyst with equal component ratios the isomerisation is at a minimum. At 250 °C, samples with a high copper content are active in the isomerisation of butene-1 to butene-2, while at higher temperatures cerium-rich samples also show activity. Heterogeneous catalytic reactions are complex processes involving the interaction of the initial gaseous substances with the solid catalyst surface. It is known that the number of active centres affects the reaction rate; therefore, increasing the number of active centres is an important issue for increasing catalyst activity. The specific surface area of the synthesised catalysts was measured by the thermal desorption of nitrogen. For the Ce-Cu oxide catalysts, the specific surface area first decreases and then increases as the cerium content of the catalyst grows: for the Ce-Cu = 3:7 sample the specific surface area decreases to 7.1 m2/g, while at a 9:1 ratio it increases to 16.5 m2/g. The values for the initial oxides entering the catalytic system, i.e. the Ce and Cu oxides, were also measured and are 6.5 and 0.7 m2/g, respectively. Keywords: ethanol, dehydrogenation, acetaldehyde, binary catalysts, isomerisation.
Article
At present, the deposition of asphaltene-resin-paraffin in the well-gathering system, the easing of the transport of such oils and the increase of oil pipeline throughput remain topical problems at the centre of attention. In many countries, the formation of asphaltene-resin-paraffin deposits not only complicates transport but also leads to increased corrosion problems, hydrocarbon losses and higher costs. One of the factors that increases the stability of these deposits and contributes to their amount is a high water content in the oil: as the percentage of dispersed water phases in the oil increases, so does its colloidal character. Asphaltenes, resins and paraffins remain suspended in the oil system and form a colloidal system; when the dispersed water phases enter this colloid, the resulting deposits are bound together by denser bonds, which makes it difficult to take measures against the formation of such deposits. Oil is a nanosystem combining various components: asphaltenes, resins, paraffins, mechanical impurities and dispersed water phases. The deposit-forming components are the same components that make up this nanosystem; they form a specific structure and affect the rheology of the oils. Asphaltenes concentrate at the centre of the structure, then resins and then paraffins; mechanical impurities adhere around this structure, making it even more complex, and finally the water phase envelops the structure. As a result, the bonds within the water bring about various transformations within the mechanical impurities, paraffins, resins and asphaltenes. At the same time, the water is adsorbed by these components and peptised, which is one of the factors that most affects the rheology of the oil. In rheology the most important parameter is viscosity, and the peptised structures increase its value. The structures formed join with other structures to create a chain-like system; with its formation, the kinetic and aggregative stability within the oil is disturbed. The loss of stability leads to coagulation; once the interconnected structures form a large mass within the oil system, sedimentation begins. Sedimentation means deposit formation, and the faster it proceeds, the greater the amount and thickness of the deposit. One of the main tasks ahead is to increase the efficiency of the transport process for such oils by applying chemical reagents, without requiring large capital investment. For this purpose, the individual reagents Difron-3970 and ND-NDP-1 and the newly prepared nano-containing compositions Difron-3970+Cu and ND-NDP-1+Cu were studied, separately and together, under laboratory conditions on an oil sample taken from well No. 412 of SOCAR's 28 May field. Of the surfactants applied, the new nanocomposition was more effective with respect to the pour point of the high-paraffin oil sample, its dewatering and the asphaltene-resin-paraffin deposition from it. Thus, the composition of nanoparticles and surfactant reagent lowers the pour point of the oil from +15 °C to -2 °C for the Difron-3970+Cu reagent and to -8 °C for the ND-NDP-1+Cu reagent. At the same time, in terms of demulsifying ability, the individual reagents Difron-3970 and ND-NDP-1 dewater oil with a 60% water cut down to 15% and 4%, respectively, whereas the nanocomposites dewater it down to 10% and 2%.
For the first time, the effect of the new composition, prepared by adding Cu nanoparticles to the Difron-3970 and ND-NDP-1 reagents, on the deposition of asphaltene-resin-paraffin deposits on a metallic surface was studied under laboratory conditions by the "cold tube" method. The experiments were carried out at a cold-tube temperature of 20 °C at various concentrations of the depressant additives (100-700 g/t). Based on the experimental results, the effectiveness of the individual reagents and of the composition prepared with the nanoparticle additive was calculated, together with the maximum percentage of paraffin deposits accumulating on the tube surface. The highest effectiveness, 99%, was observed for the "ND-NDP-1+Cu" depressant additive at a concentration of 700 g/t. Keywords: nanoparticle, composition, cold tube, high-paraffin oil, demulsification, pour point.
Article
The efficiency of analytical information processing depends primarily on the quality of the input data array. Cleaning the input data is an important step in any analysis. The presence of noise and anomalies can significantly affect the result of a study and lead to erroneous conclusions, but so can excessive cleaning, accompanied by the loss of potentially valuable observations. Despite the constant optimization of systems for collecting and processing information, the development of an effective methodology for eliminating inaccuracies in a data set is still an area of heightened interest in the scientific community. The continuously increasing volume of information flows predetermines the need for an adequate tool for cleaning time series from noise. The task of improving the accuracy of identifying noise elements is especially relevant in modern conditions. The article provides an overview of existing methods for identifying and eliminating the noise component in one-dimensional and multidimensional time series used in foreign practice, and emphasizes their features and shortcomings. Foreign approaches to the classification of these technologies are considered and analyzed. Based on the results of the analysis, a set of the most effective techniques was determined. Keywords: time series, data cleaning, noise, filtering and noise removal, array of data.
Article
Full-text available
Classification is one of the major functionalities in data mining; it is performed either by predicting the value of unknown class labels on the basis of previously labeled data or by grouping a dataset on the basis of some implicit similarity measure. Clustering works on unsupervised datasets and converts them into groups on the basis of measures such as Euclidean distance in K-Means clustering. The performance of K-Means can be significantly affected by outliers, which the K-Means algorithm does not handle. This paper proposes a change to the K-Means algorithm to accommodate a method for outlier detection based on a threshold value. The outlier threshold, named clus_span, is computed by taking the distance of each point from every other point and dividing by the total number of points. All points of a dataset that do not meet the minimum threshold are considered outliers. The new K-Means with this add-in is tested on a benchmark dataset for the identification of outliers and compared with the existing K-Means algorithm in terms of accuracy. An improvement in performance is evident.
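A rough sketch, under assumptions, of how such a mean-pairwise-distance threshold could be combined with K-Means is given below; the synthetic data are invented and the code illustrates the general idea rather than the paper's exact clus_span procedure.

```python
# Mark points whose nearest neighbour lies beyond the mean pairwise distance as outliers,
# then cluster the remaining points with K-Means (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(8, 1, (50, 2)),
               [[30.0, 30.0]]])                 # one obvious outlier

dist = pairwise_distances(X)
threshold = dist.sum() / (len(X) * len(X))      # mean pairwise distance used as threshold
np.fill_diagonal(dist, np.inf)
is_outlier = dist.min(axis=1) > threshold       # no other point within the threshold

labels = np.full(len(X), -1)                    # -1 marks outliers
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[~is_outlier])
labels[~is_outlier] = kmeans.labels_

print("outliers detected at indices:", np.where(is_outlier)[0])
```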
Book
Full-text available
From an interdisciplinary perspective of computing and public administration, this book presents a descriptive and exploratory overview of the websites (web portals, web pages) of Mexico's municipal governments as they stood between 2020 and 2021. It initially faces the challenge of finding all (or as many as possible of) these websites, given that the few lists or directories of this information available from government sources are in an uncertain state of currency. The challenge is overcome by applying web data mining techniques and manual techniques. An important product of this research is a dataset that gathers the digital addresses of a large number of municipal websites, greater than that of the official sources. Two other datasets are produced: one of sociodemographic information about the inhabitants and another of the administrative and political characteristics of the municipal governments. With these three datasets (which the authors offer for free download on the web), the sociodemographic and governmental profiles of the municipalities that have an official website are analysed, discovered and represented using statistical and machine learning techniques, making it possible to differentiate them from those that do not. The results consist of a series of descriptive statistical analyses, maps and supervised machine learning models. The models are produced using algorithms that generate classification trees and rules (the resulting models can be downloaded free of charge). The results are complemented with a proposal to facilitate the continuous study of Mexican municipal websites in the medium and long term. The proposal consists of implementing a directory of these websites that can be continuously updated in a semi-automated way, together with an automated repository of replicas of these websites. This will facilitate their observation and analysis, both cross-sectional and longitudinal.
Conference Paper
Full-text available
In this paper, a diagnostic agent is defined that finds information about the constructional and technological preferences of customers for a reference technical object. Based on this agent, a system that handles uncertainty can be implemented and used to define customer preferences. Given imprecise information from users, this system would return their preferences and exact data that can be used to improve production.
Article
Full-text available
Developments in information technology have impacted on all areas of modern life and in particular facilitated the growth of globalisation in commerce and communication. Within the drugs area this means that both drugs discourse and drug markets have become increasingly digitally enabled. In response to this, new methods are being developed that attempt to research and monitor the digital environment. In this commentary we present three case studies of innovative approaches and related challenges to software-automated data mining of the digital environment: (i) an e-shop finder to detect e-shops offering new psychoactive substances, (ii) scraping of forum data from online discussion boards, (iii) automated sentiment analysis of discussions in online discussion boards. We conclude that the work presented brings opportunities in terms of leveraging data for developing a more timely and granular understanding of the various aspects of drug-use phenomena in the digital environment. In particular, combining the number of e-shops, discussion posts, and sentiments regarding particular substances could be used for ad hoc risk assessments as well as longitudinal drug monitoring and indicate “online popularity”. The main challenges of digital data mining involve data representativity and ethical considerations.
Chapter
In this paper, we address the importance of classification and social media mining of human emotions. We compare different theories of basic emotions and the application of emotion theory in practice. Based on Plutchik's classification, we suggest creating a specialized lexicon of terms and phrases to identify emotions for research into general attitudes towards mobile learning in social media. The approach can also be applied to other areas of scientific knowledge that aim to explore the emotional attitudes of users in social media. It is based on Natural Language Processing and, more specifically, uses text mining classification algorithms. For test purposes, we retrieved a number of tweets on users' attitudes towards mobile learning.
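As a rough illustration of how such a lexicon-based emotion tagger could be wired up (the lexicon entries and the tweet below are invented placeholders, not the authors' resources):

```python
# Minimal sketch of lexicon-based emotion tagging with a hand-built,
# Plutchik-style lexicon. All entries and example text are illustrative only.
from collections import Counter
import re

EMOTION_LEXICON = {
    "joy": {"love", "great", "excited", "fun"},
    "trust": {"reliable", "recommend", "helpful"},
    "fear": {"worried", "afraid", "risky"},
    "anger": {"hate", "annoying", "frustrating"},
}

def tag_emotions(text: str) -> Counter:
    """Count lexicon hits per emotion in a single post or tweet."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for emotion, words in EMOTION_LEXICON.items():
        counts[emotion] = sum(tok in words for tok in tokens)
    return counts

tweet = "Mobile learning is great but the app is so annoying sometimes"
print(tag_emotions(tweet))  # e.g. joy: 1, anger: 1, the rest: 0
```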
Chapter
Security issues in e-commerce web applications are still exploratory, and in spite of an increase in e-commerce application research and development, many security challenges remain unanswered. Botnets are among the most malicious threats to web applications, especially e-commerce applications. A botnet is a network of bots; it executes automated scripts to launch different types of attack on web applications. Botnets are typically controlled by one or more hackers known as bot masters and are exploited for different types of attack, including DoS (denial of service), DDoS (distributed denial of service), phishing, spreading of malware, adware, spyware, identity fraud, and logic bombs. The aim of this chapter is to scrutinize to what degree botnets can threaten e-commerce security. The first section presents an overview of botnets in the context of e-commerce security in order to give the reader an understanding of the background for the remaining sections.
Chapter
Data Mining (DM) is being applied with success in Business Intelligence (BI) environments, and several examples of applications can be found. BI and DM have different roots and, as a consequence, significantly different characteristics. DM emerged from scientific environments; thus, it is not business oriented, and DM tools still demand heavy work in order to obtain the intended results. By contrast, BI is rooted in industry and business, and as a result BI tools are user-friendly. This chapter reflects on this difference from a historical perspective. Starting with separate historical perspectives of BI and DM, the author then discusses how they converged into the current situation, in which DM is successfully used and integrated in BI environments.
Article
Full-text available
Market forecasting, such as stock market forecasting, involves high transaction volumes and attracts the attention of researchers and investors alike. Risk and turnover are two important factors in any investment decision. Understanding market momentum gives the ability to predict future movements, and the ability to predict in a market economy makes it possible to achieve higher turnover by reducing risk and avoiding financial losses. News plays an important role in evaluating the current stock price, and the development of data mining methods, computational intelligence, and machine learning algorithms has led to new prediction models. phpCrawler is a PHP-based content crawler built on the DomCrawler and Guzzle packages for collecting and storing web data. With this tool, news releases from 17 news agencies are stored and categorized; text mining and support vector machines with different kernels are then used to predict stock price direction. In this research, 948,990 news items from 17 news agencies were stored. More than 300,000 news items in the political and economic categories were used, and the stock prices of chemical companies between November 2017 and March 2018 (123 trading days) were studied. The results show that a Support Vector Machine with a linear kernel reached a prediction accuracy of 83% for average price movement; a nonlinear Support Vector Machine with a polynomial kernel increased accuracy by two percentage points to 85% on average, while other kernels performed worse.
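A minimal, assumption-laden sketch of this kind of pipeline (TF-IDF features plus SVMs with linear and polynomial kernels, using scikit-learn and an invented labeled_news.csv) might look like:

```python
# Sketch: predicting price direction from news text with TF-IDF + SVM.
# The CSV name, columns, and labels are assumptions for illustration only.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

news = pd.read_csv("labeled_news.csv")   # columns: text, direction (up/down)
X_train, X_test, y_train, y_test = train_test_split(
    news["text"], news["direction"], test_size=0.2, random_state=0)

for kernel in ("linear", "poly"):
    # Vectorize the raw text and train an SVM with the chosen kernel
    model = make_pipeline(TfidfVectorizer(max_features=20000), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    print(kernel, "accuracy:", model.score(X_test, y_test))
```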
Preprint
Full-text available
This article addresses a problem in the electronic government discipline of special interest in Mexico: the need for a concentrated and updated information source about municipal e-government websites. One reason for this is the lack of a complete and updated database containing the electronic addresses (web domain names) of the municipal governments that have a website. Due to diverse causes, not all Mexican municipalities have one, and a number of those that do present information corresponding not to the current government but to previous ones. The scarce official lists of municipal websites are not updated frequently enough, and manually determining which municipalities have an operating and valid website at a given moment is a time-consuming process. Besides, website contents do not always comply with legal requirements and are considerably heterogeneous. In turn, the development level of municipal websites is valuable information that can be harnessed for diverse theoretical and practical purposes in the public administration field. Obtaining all these pieces of information requires website content analysis. Therefore, this article investigates the need for, and the feasibility of, automating the implementation and updating of a digital repository for performing diverse analyses of these websites. Its technological feasibility is addressed by means of a literature review about web scraping and by proposing a preliminary manual methodology. This takes into account known, proven techniques and software tools for web crawling and scraping. No new techniques for crawling or scraping are proposed because the existing ones satisfy the current needs. Finally, software requirements are specified in order to automate the creation, updating, indexing, and analysis of the repository.
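A minimal sketch of the kind of repository-building step the article describes, assuming only the requests library and an invented municipal_urls.txt list of candidate addresses, could be:

```python
# Sketch: checking a list of candidate municipal domains and archiving their
# homepages for later longitudinal analysis. File and folder names are
# illustrative assumptions only.
import pathlib
import requests

candidates = pathlib.Path("municipal_urls.txt").read_text().split()
archive = pathlib.Path("snapshots")
archive.mkdir(exist_ok=True)

for url in candidates:
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        print(url, "unreachable")
        continue
    if resp.ok:
        # Store a snapshot of the homepage under a filesystem-safe name
        fname = url.replace("://", "_").replace("/", "_") + ".html"
        (archive / fname).write_text(resp.text, encoding="utf-8")
        print(url, "archived")
    else:
        print(url, "returned HTTP", resp.status_code)
```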
Article
Full-text available
The concept of the archive as an object of study in the humanities has been addressed by late twentieth-century approaches such as structuralism and post-structuralism. Distancing themselves from the grand modern theoretical constructions, these approaches understand communication and culture through the analysis of signs and their factual arrangement in records of all kinds (texts, images, symbols), avoiding any transcendental orientation. With the digital revolution, new technologies have arisen that give rise to new types of archive, and through data mining new correlations and textual structures are discovered. This article asks whether pre-digital problems persist in the new digital medium, and how structuralist and post-structuralist debates can rethink some of the controversies that affect the digital humanities today.
Article
Full-text available
As an interdisciplinary subject of study, data mining has become a new and intriguing field among researchers. As our capabilities for both generating and collecting data increase rapidly, it has become a dynamic and fast-expanding field of great strength. The demand for data stems from the computerization of business, scientific, and government transactions; informative searches on different topics; digital images; online purchasing of products; and more. In addition, the popular use of the World Wide Web as a global information system has flooded us with a tremendous amount of data and information. This explosive growth in stored and transient data has generated an urgent need for new techniques and automated tools that can intelligently assist us in transforming vast amounts of data into useful information and knowledge.
Chapter
This chapter focuses on predicting web user behaviors. When web users enter a website, every move they make on that website is stored in web log files. Unlike a focus group or questionnaire, the log files reflect real user behavior, and having actual user behavior is of great value to organizations. This chapter examines ways of extracting user patterns (user behavior) from the log files. In this context, the web usage mining process is explained and some web usage mining techniques are discussed.
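A small sketch of the first step of such a process, assuming Common Log Format lines and a conventional 30-minute inactivity timeout for splitting sessions, could look like:

```python
# Sketch: grouping web server log hits into per-visitor sessions.
# Assumes Common Log Format lines and a 30-minute inactivity timeout.
import re
from collections import defaultdict
from datetime import datetime, timedelta

LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST) (\S+)')
TIMEOUT = timedelta(minutes=30)

def sessionize(lines):
    sessions = defaultdict(list)   # ip -> list of sessions (each a list of pages)
    last_seen = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, ts_raw, page = m.groups()
        ts = datetime.strptime(ts_raw.split()[0], "%d/%b/%Y:%H:%M:%S")
        if ip not in last_seen or ts - last_seen[ip] > TIMEOUT:
            sessions[ip].append([])    # start a new session for this visitor
        sessions[ip][-1].append(page)
        last_seen[ip] = ts
    return sessions

sample = ['127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512']
print(dict(sessionize(sample)))
```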
Chapter
The amount of information on the web is increasing day by day, and managing such a vast amount of information is a difficult task. Users find it hard to capture the information they need, and much of their time is spent framing a proper query and filtering the resulting web pages. The search engine plays a major role in filtering information and ranking the desired results. Fully accurate retrieval remains an open problem, and in this regard this paper presents an approach that tries to optimize the ranking algorithm by employing document clustering and similarity measures. We present an outline of different ranking algorithms and propose an approach in which the PageRank algorithm is optimized using document clustering. The approach also employs content mining along with structure mining, which helps to reduce the computational complexity of the algorithm and thereby the time needed to rank the web pages.
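As a reminder of the baseline that such clustering-based optimizations start from, a minimal power-iteration PageRank over a toy link graph (the graph itself is invented for illustration) can be written as:

```python
# Minimal power-iteration PageRank over a toy link graph (illustrative only).
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                   # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(toy_graph))
```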
Conference Paper
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/ To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from 3 years ago. This paper provides an in-depth description of our large-scale web search engine - the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections, where anyone can publish anything they want.
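Although the abstract stays at the system level, the core data structure behind any such full-text engine is the inverted index; a toy, purely illustrative sketch (invented documents, not the prototype's internals) is shown below:

```python
# Illustrative sketch of an inverted index: term -> set of documents
# containing it. Documents and queries here are toy data only.
from collections import defaultdict
import re

docs = {
    "page1.html": "web data mining uncovers patterns in web usage",
    "page2.html": "hypertext structure helps rank web pages",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in re.findall(r"[a-z]+", text.lower()):
        index[term].add(doc_id)

def search(term):
    """Return the documents containing the query term."""
    return sorted(index.get(term.lower(), set()))

print(search("web"))        # ['page1.html', 'page2.html']
print(search("hypertext"))  # ['page2.html']
```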