Article

Importance of Web Scraping in E-Commerce and E-Marketing

... The method of accessing and retrieving unstructured information from the web to create organized information, which is stored in a central database and analyzed in spreadsheets, is known as web scraping [4]. Web scraping, which is also known as data mining according to [10], is the process of aggregating considerable amounts of data from the web and then storing it in a database for future analysis and subsequent use. The automatic collection and processing of information from the internet can be referred to as web scraping [11]. ...
... The former is more versatile: it supports the extraction of information from both online and offline sources, while the latter is specific to the extraction of information from websites. One common misconception is the interchangeable use of the terms web scraping and data mining, as reported by [10,16]. Conversely, data mining involves advanced pattern-matching algorithms and high-end statistical analyses to assist in identifying trends and patterns [18]. ...
Article
Data scraping is a concept that involves the extraction of relevant data from a pool of information stored in a computer. Data scraping is widely known as web scraping because the web contains a massive amount of information that is easily accessed and extracted. Web scraping is valuable to all fields of human endeavour. The paper gives a clear conceptualization of data scraping and addresses some minor misconceptions about the differences between data mining, web crawling and web scraping. Furthermore, the phases and procedure of data scraping are outlined. The merits of web scraping over API scraping are explained. Moreover, the numerous software packages and tools that support the scraping of websites are listed. Even though web scraping is widely used, there are also some technical issues and challenges associated with it. Finally, some of the legal and ethical issues related to information extraction are discussed, and the paper concludes that data scraping is permitted as long as users comply with the terms and conditions of the target site.
... Reference [26] used web crawling and scraping methods to collect HTML data from an e-commerce website in order to determine when a product has been updated. Also, the work presented in [27] studies the relevance of web scraping in online marketing and e-commerce. The authors outline the benefits of the scraping method and give a real-world example that e-commerce companies and internet marketers can use. ...
Article
Big Data (BD) scraping systems are among the recommended approaches for large-scale web data extraction. However, these systems for collecting large amounts of data face many challenges, including processing, storage, and the reliability of data extraction. Because of its potential, cloud computing is becoming a viable solution to support BD scraping systems. This paper proposes a cloud-based web scraping framework for weather BD extraction and analysis. The aim is to extract weather BD from web sources, analyze this data and use it for visualization and forecasting purposes, using elastic and on-demand resources. The framework is implemented using Selenium and Amazon Web Services and tested with Morocco weather data. Analysis of the performance and scalability of the proposed cloud-based scraper shows that it offers greater efficiency in terms of data collection and analysis and in terms of forecast quality, thanks to its ability to take advantage of cloud resources.
... As a result, the large number of freelance websites makes it difficult for freelancers to summarize all of the job vacancy information. This problem can be solved with the web scraping technique, which extracts the structure of web pages in order to collect data (Henrys, 2021; Thomas & Mathur, 2019) that can be processed into a summary of the information in a short time (Eminagaoglu, 2022; Ridwan & Hermawan, 2019). ...
Article
Abstract: In the era of digitalization, the internet is at the center of all kinds of community activity, including the field of work. Currently, many platforms provide job vacancies, especially for freelancers. To obtain this information, users usually need to open several websites to find suitable job vacancies. Web scraping offers a solution to this problem. Based on prior research, the BeautifulSoup and Selenium libraries are used to collect data. To search the data, the vector space model method is used to measure the similarity between the query and each document. In the data search, the average recall is near-perfect at 98%, while the average precision is 56%. This is because the search uses three parameters, so irrelevant data is more likely to be retrieved if a document contains a word from the user's query even though the context does not match. The Streamlit framework in Python is used to display the data processing results and to help users navigate the web scraping process, data processing, and data search. This study aims to implement the web scraping method to retrieve data from freelance websites: Freelance, Project, and Sribulancer. By applying the vector space model method, users can search data from several websites without opening the freelance websites one by one. With data visualization in the form of a web application built on the Streamlit framework, the web scraping results can also be processed and presented in a more helpful form that saves the user's time.
... The second stage is to collect data with web scraping techniques. Web scraping is the process of retrieving semi-structured document information from the internet, in the form of HTML or XHTML web pages (Kurniawati et al., 2017; Darmawiguna et al., 2019; Henrys, 2021). The data were collected from the Shopee, Tokopedia and Lazada e-commerce pages, based on e-commerce URL IDs for the food and beverage categories in the Yogyakarta area. ...
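The vector space model search and the recall and precision figures mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF weighting, the whitespace tokenizer, the function names and the sample job postings are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed weighting scheme) so no weight is zero.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(documents)
    # Query terms that never occur in the collection are ignored.
    q_vec = {term: tf * idf[term]
             for term, tf in Counter(tokenize(query)).items() if term in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical job postings, invented for this example.
DOCS = [
    "python web developer remote",
    "graphic designer logo",
    "python data scraping project",
]
RANKED = search("python scraping", DOCS)
# Retrieve every document that shares at least one word with the query.
RETRIEVED = [index for score, index in RANKED if score > 0]
```

Here `RETRIEVED` contains both postings that mention "python", although only the third one concerns scraping. If that posting is taken as the single relevant document, `precision_recall(RETRIEVED, [2])` gives full recall but only 50% precision, which is the same pattern of high recall and lower precision that the abstract attributes to matching on query words regardless of context.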
Article
As part of the National Economic Recovery (PEN) program, the government launched a strategy to increase business activity by maximizing the potential for product sales through e-commerce. E-commerce sites in Indonesia display product information with differing descriptions and prices. Data can be collected from e-commerce pages to form useful information. Collecting such varied data requires a method or system that can automatically gather the data provided on web pages. In addition to data collection, the knowledge that can be derived from the collected information also needs to be considered. For the data to be useful, a place for data management must be designed. Data must be selected and arranged properly so that it is easy to use and meaningful. This research requires both methods and tools. The method used is web scraping, and the tool used is Tableau. The research has two objectives. The first is to obtain product price data for creative economy entrepreneurs in the Special Region of Yogyakarta from e-commerce sites, especially Tokopedia and Bukalapak. The second is to present the product price data visually to facilitate price distribution analysis, so that it can inform local governments, especially those who manage products from creative economy businesses, and support strategic policies for setting the prices of these products in the Special Region of Yogyakarta in support of promotion and tourism.
... A parse tree is created for the parsed pages, and data is extracted from the HTML pages using this tree. Research carried out by Lunn et al. [28] extracts data from Indeed.com, a job search website, for specified keywords and locations using two Python libraries: Beautifulsoup4, to extract data from HTML and XML files, and lxml, to process XML and HTML. Kasereka [29], in his paper, suggested the use of Beautiful Soup to retrieve particular content from a web page, remove HTML tags, and save the information. ...
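The idea of retrieving particular content from a page and removing the HTML tags can be sketched as follows. The cited works use Beautiful Soup and lxml; this sketch instead uses Python's standard-library `html.parser`, which is event-driven and does not build a parse tree, so that it runs without third-party packages. The sample HTML, the `job-title` class name and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class JobTitleParser(HTMLParser):
    """Collect the text inside every <h2 class="job-title"> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "h2" and ("class", "job-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Text nodes inside the heading are kept; nested tags such as <b>
        # are dropped, which strips the markup from the extracted text.
        if self._in_title:
            self.titles[-1] += data


# Hypothetical page fragment standing in for a downloaded job listing.
SAMPLE_HTML = """
<html><body>
  <div class="job"><h2 class="job-title">Data <b>Engineer</b></h2><p>Remote</p></div>
  <div class="job"><h2 class="job-title">Web Developer</h2><p>On site</p></div>
</body></html>
"""


def extract_titles(html):
    """Return the tag-free text of each job title found in the HTML."""
    parser = JobTitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]
```

Calling `extract_titles(SAMPLE_HTML)` returns the two headings as plain strings. With Beautiful Soup the same selection would typically be written as a query over the parse tree rather than as event handlers.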
Chapter
Big data analytics gives organizations a way to analyze huge data sets and gather new information. It helps answer basic questions about business operations and business performance. It also helps discover unknown patterns in vast datasets or combinations thereof. In the current data-driven world, it is increasingly essential that big data techniques are applied and analyzed for organizational growth. More specifically, with the large availability of data on the Web, whether from social media, websites, online portals, or platforms, to name but a few, it is important for organizations to know how to mine that data in order to extract useful knowledge. Web scraping represents a fundamental approach in this regard. Therefore, this paper aims to provide an updated literature review of the most advanced Web scraping techniques to better equip scholars and managers with helpful knowledge on how to mine online data most effectively. The paper starts by presenting the basic design of a web scraper and the applications of web scraping in diverse sectors and areas. Next, the different Web scraping methods and Web scraping technologies are presented. Finally, a procedure to develop Web scraping with various tools is proposed before a conclusion wraps up the paper.
... This method has the advantage of retrieving thousands of photos "from the wild," and we can use the keywords in the query to automatically label the images [25]. We used the Selenium and Beautiful Soup libraries for scraping. ...
Article
Since the coronavirus pandemic unexpectedly and forcibly moved classroom activities to a fully remote format, there is a critical need for progress in the online educational system. Moreover, online education is the wave of the future and requires improved infrastructure for the learning and teaching process. A main issue with the present video call-based online classroom system is the analysis of student involvement. Teachers frequently worry about how well their pupils will understand new information. Such analysis was performed unintentionally in the offline mode, but it is challenging in an online setting. This study introduces an autonomous method for monitoring student emotions to gauge their level of participation in class. This is accomplished by taking a screenshot of the students' video stream and sending the detected faces to an emotion detection model. The emotion detection model in the suggested architecture was created by fine-tuning the pre-trained VGG16 image classifier. The average student engagement index is then determined. We also tested the model using a face recognition system. The system showed considerable performance, which supports its reliability for real-time use and gives this research scope for future work.
... One of the advantages of web scraping in e-commerce is market analysis. Kasereka (2021) It is anticipated that use by e-retail firms and marketing researchers will increase further, and the results obtained from this study are expected to shed light on future studies. ...
Conference Paper
In marketing research, online data collection tools and analyses have been developed in addition to traditional marketing research methods. Web scraping is the process of collecting large amounts of data from the Web and then using it in analyses carried out in line with a firm's objectives. Web scraping provides insight into price data, market variables, trends, and competitors. Web scraping applications, which many e-retail firms that conduct marketing research can use, produce useful results for them. This study presents information on web scraping techniques, explains their areas of use and advantages, and develops an application that may be useful to e-retail firms and online marketers. To this end, data were scraped for a specific product group from the website of an e-retail firm. As a result, it is shown that web scraping can be used to obtain price information for the online products of e-retail firms. It is anticipated that the use of these methods by e-retail firms and marketing researchers will increase further, and the study is expected to contribute to future work.
... We used Python to scrape the images through the Firefox web driver. The benefit of this approach is that (a) it retrieves thousands of images "from the wild" and (b) we can automatically label the images using the keywords in the query [25]. We used the Selenium and Beautiful Soup libraries for scraping purposes. ...
Preprint
There is a crucial need for advancement in the online educational system because of the unexpected, forced migration of classroom activities to a fully remote format during the coronavirus pandemic. Moreover, online education is the future, and its infrastructure needs to be improved for an effective teaching-learning process. One of the major concerns with the current video call-based online classroom system is student engagement analysis. Teachers are often concerned about whether the students can follow the teaching in a novel format. Such analysis was done involuntarily in the offline mode, but it is difficult in an online environment. This research presents an autonomous system for analyzing the students' engagement in the class by detecting the emotions exhibited by the students. This is done by capturing the video feed of the students and passing the detected faces to an emotion detection model. The emotion detection model in the proposed architecture was designed by fine-tuning the pre-trained VGG16 image classifier model. Lastly, the average student engagement index is calculated. The system achieved considerable performance, which supports the reliability of its use in real time and gives this research future scope.
Preprint
In the current Colombian automotive market, prices of second-hand vehicles have risen exorbitantly, and there are no structured tools for accessing market information quickly and reliably. It is therefore necessary to create a centralized dataset that reflects the inventory offered on the web, in order to facilitate comparison, data collection and decision making when buying and selling second-hand vehicles. Under this premise, this applied research article proposes a minimum viable product that extracts, prepares, processes, transforms and compares the information via web scraping, and that proposes decision alternatives using prescriptive analytics based on decision trees and benchmarking for users of the second-hand car and motorcycle market in Colombia. The results of the developed tool are satisfactory: it offered the user two options matching the sought characteristics of the motor vehicle, one with the minimum price and the other with the minimum mileage at the lowest price. Finally, as future work, it is recommended to explore more complete frameworks for a more personalized experience.
Chapter
There is a crucial need for advancement in the online educational system because of the unexpected, forced migration of classroom activities to a fully remote format during the coronavirus pandemic. Moreover, online education is the future, and its infrastructure needs to be improved for an effective teaching-learning process. One of the major concerns with the current video call-based online classroom system is student engagement analysis. Teachers are often concerned about whether the students can follow the teaching in a novel format. Such analysis was done involuntarily in the offline mode, but it is difficult in an online environment. This research presents an autonomous system for analyzing the students’ engagement in the class by detecting the emotions exhibited by the students. This is done by capturing the video feed of the students and passing the detected faces to an emotion detection model. The emotion detection model in the proposed architecture was designed by fine-tuning the pre-trained VGG16 image classifier model. Lastly, the average student engagement index is calculated. The system achieved considerable performance, which supports the reliability of its use in real time and gives this research future scope.
Keywords: Emotion detection, CNN, VGG16, Education, Transfer learning, Engagement