... PostgreSQL is an open source object-relational database system. It is fully ACID (atomicity, consistency, isolation, durability) compliant and runs on all major operating systems, including Linux [51]. In this section, we study the storage engine (SE) of PostgreSQL and apply the changes necessary to make it NVM-aware. ...
This paper explores the implications of employing non-volatile memory (NVM) as primary storage for a database management system (DBMS). We investigate the modifications necessary to a traditional relational DBMS to take advantage of NVM features. As a case study, we modify the storage engine (SE) of PostgreSQL to enable efficient use of NVM hardware. We detail the necessary changes and the challenges such modifications entail, and evaluate them using a comprehensive emulation platform. Results indicate that our modified SE reduces query execution time by up to 45% and 13% when compared to disk and NVM storage, with average reductions of 19% and 4%, respectively. Detailed analysis of these results shows that while our modified SE is able to access data more efficiently, data is not close to the processing units when needed for processing, incurring long-latency misses that hinder performance. To solve this, we develop a general purpose library that employs helper threads to prefetch data from NVM hardware via a simple API. Our library further improves query execution time for our modified SE when compared to disk and NVM storage by up to 54% and 17%, with average reductions of 23% and 8%, respectively.
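The paper's prefetching library runs inside the DBMS in C; purely as an illustration of the helper-thread pattern it describes, here is a minimal Python sketch in which a background thread reads file blocks ahead of a consumer behind a simple iterator API. The file name, block size, and read-ahead depth are arbitrary assumptions, and this sketch works at file-I/O granularity rather than the cache-line level the paper targets.

```python
import threading, queue

class PrefetchReader:
    """Minimal helper-thread prefetcher: a background thread reads blocks
    ahead of the consumer so data is already buffered when requested."""
    def __init__(self, path, block_size=8192, depth=8):
        self.path, self.block_size = path, block_size
        self.blocks = queue.Queue(maxsize=depth)   # bounded read-ahead window
        self.worker = threading.Thread(target=self._fill, daemon=True)
        self.worker.start()

    def _fill(self):
        with open(self.path, 'rb') as f:
            while True:
                block = f.read(self.block_size)
                self.blocks.put(block)             # blocks when the window is full
                if not block:                      # b'' doubles as EOF sentinel
                    return

    def __iter__(self):
        while True:
            block = self.blocks.get()
            if not block:
                return
            yield block

# Usage: iterate blocks while the helper thread keeps the window full.
# total = sum(len(b) for b in PrefetchReader('table.dat'))
```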
... The principal data storage of the web server is a PostgreSQL database [15]. ...
... Other features that provide additional power and flexibility are constraints, triggers, rules, and transactional integrity; it also offers some functionality not found in commercial databases, such as user-definable data types, inheritance, and rules. These characteristics qualify it as an excellent option for managing databases, as mentioned in (Momjian, 2001). ...
The Cuban Telecommunications Company depends to a large extent on the proper functioning of its organizational units. The role of the Operation and Maintenance telephone centers distributed throughout the country in all its territorial divisions is essential to guarantee compliance with the income and expenditure of the planned budget. This research proposes the design and development of a computer system that facilitates the processing of information and the established mechanisms for the control of material resources, which have a direct influence on the diagnosis of the outside plant network and the management of the workforce in the entity's operating centers. Particular attention is paid to the Subscriber Network segment, where the bulk of the productive actions are carried out, in order to improve the accuracy and reliability of the information and reduce the manual work time dedicated to the activity. A study was carried out on the existing systems related to the field of action. To develop the application, the agile Scrum methodology, the Python programming language, the Django framework, the PostgreSQL database manager, and JavaScript and HTML components were used. A web application was implemented that supports the operational functioning of the centers and reduces the time required to generate data and prepare documents. The positive results of the experts' evaluation of the software are presented, and its contribution to the company during its period in production is evidenced.
... Nevertheless, the obtained performance is often not sufficient for online queries and interactive applications. For example, the AS query takes 1.24 hours on the column database MonetDB [27], 8.20 hours on the row database Postgres [38], and 2.05 hours on the graph database Neo4j [55], even though we fully cached the data in main memory in all of them. In contrast, GQ-Fast executes the AS query in just 5.66 seconds, an improvement of many orders of magnitude. ...
We study a class of graph analytics SQL queries, which we call relationship queries. Relationship queries are a wide superset of fixed-length graph reachability queries and of tree pattern queries. Intuitively, a relationship query discovers target entities that are reachable from source entities specified by the query. It usually also computes aggregated scores for the target entities, calculated by applying aggregation functions on measure attributes found on the target entities, the source entities, and the paths from the sources to the targets. We present real-world OLAP scenarios where efficient relationship queries are needed; however, row stores, column stores, and graph databases are unacceptably slow in such scenarios. We briefly comment on the straightforward extension of relationship queries that allows accessing arbitrary schemas. The GQ-Fast in-memory analytics engine utilizes a bottom-up, fully pipelined query execution model running on a novel data organization that combines salient features of column-based organization, indexing, and compression. Furthermore, GQ-Fast compiles its query plans into executable C++ source code. Besides achieving runtime efficiency, GQ-Fast also reduces main memory requirements because, unlike column databases, GQ-Fast selectively allows denser forms of compression, including heavyweight compression schemes that do not support random access. We used GQ-Fast to accelerate queries for two OLAP dashboards in the biomedical field. It outperforms Postgres by 2-4 orders of magnitude and outperforms MonetDB and Neo4j by 1-3 orders of magnitude when all of them run in RAM. In addition, it generally saves space due to the appropriate use of compression methods.
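GQ-Fast itself compiles query plans to C++ over compressed indexes; the toy Python sketch below (with a made-up doc-to-term edge table) only illustrates the execution idea: follow a CSR-style index from each source entity and aggregate scores into a dense array, with no intermediate join materialization.

```python
import numpy as np

# Toy edge table doc_term(doc, term, freq) stored as a CSR-style index:
# doc i owns entries offsets[i]:offsets[i+1] in the terms/freqs arrays.
offsets = np.array([0, 2, 4, 5])
terms   = np.array([0, 2, 1, 2, 0])           # target entity ids
freqs   = np.array([3., 1., 2., 5., 4.])      # measure attribute

def relationship_query(source_docs, n_terms):
    """Aggregate a score per reachable term, pipelined over the sources
    (conceptually: SUM(freq) GROUP BY term over the reachable edges)."""
    scores = np.zeros(n_terms)
    for d in source_docs:
        lo, hi = offsets[d], offsets[d + 1]
        np.add.at(scores, terms[lo:hi], freqs[lo:hi])
    return scores

print(relationship_query([0, 2], n_terms=3))  # -> [7. 0. 1.]
```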
... Let us carry all of the reasoning over to the PostgreSQL database [6] described in article [1]. When creating the additional table, we add rows with the value … equal to 0, and build the segment tree on these 2 ⋅ … rows. ...
Purpose: This article is a continuation of article [1], which considered a method of optimizing aggregation queries over a continuous range of rows in a PostgreSQL database using a segment tree [2–4]. The disadvantage of that approach was that inserting new values into the table took too much time, as it led to a complete rebuild of the tree. The purpose of this work is to develop a method that significantly speeds up the insertion of new items into a table while keeping the information in the segment tree up to date, fully preserving the gain on aggregation queries. The method is based on a property of the segment tree that allows its elements to be updated in O(log_2 N) time. The problem of adding new values can then be reduced to updating existing elements, which can be solved more efficiently. For this purpose, additional neutral elements are allocated during the construction of the tree. When new values are added to the table, one of these additional elements is updated. When the free elements in the tree run out, the tree is rebuilt, again allocating additional memory. The result of this work is an algorithm for updating elements in the tree and its implementation as a PostgreSQL extension. The speed of the obtained solution has been measured. Conclusions on the results and plans for further optimization of operations are given.
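The article's implementation is a PostgreSQL extension in C; purely as a standalone illustration of the trick described (neutral padding elements turning inserts into O(log_2 N) point updates, with a rebuild only once the padding runs out), here is a minimal Python sum segment tree. The capacity, values, and doubling-on-rebuild policy are illustrative assumptions.

```python
class SegmentTree:
    """Sum segment tree over a fixed capacity; unused slots hold the
    neutral element (0) so appends become point updates, not rebuilds."""
    def __init__(self, values, capacity):
        self.n = capacity
        self.tree = [0] * (2 * capacity)       # neutral-element padding
        self.size = 0
        for v in values:
            self.append(v)

    def update(self, i, value):                # O(log2 N) point update
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def append(self, value):                   # insert = update a free slot
        if self.size == self.n:                # padding exhausted:
            values = [self.tree[self.n + i] for i in range(self.size)]
            self.__init__(values, 2 * self.n)  # rebuild with more padding
        self.update(self.size, value)
        self.size += 1

    def query(self, lo, hi):                   # sum over [lo, hi)
        s, lo, hi = 0, lo + self.n, hi + self.n
        while lo < hi:
            if lo & 1: s += self.tree[lo]; lo += 1
            if hi & 1: hi -= 1; s += self.tree[hi]
            lo //= 2; hi //= 2
        return s

t = SegmentTree([5, 3, 7], capacity=8)
t.append(2)                                    # no rebuild needed
print(t.query(1, 4))                           # 3 + 7 + 2 = 12
```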
... • We propose a novel VKG-based federation approach for the semantic integration of ESN data using Ontop [23] and PostgreSQL [24]. This approach utilizes the expanded CA ontology and virtual knowledge graph to seamlessly integrate various ESN data using shared semantics. ...
The rapid proliferation of Environmental Sensor Networks (ESNs) used for monitoring environmental systems such as meteorology and air quality, together with advances in database technologies (e.g., SQL), has led to significant progress in sensor data management. Notwithstanding the strength of these databases, they can inevitably lead to a data-heterogeneity problem as a result of isolated databases with distinct data schemas, which are expensive to access and pre-process when data is consumed across multiple databases. Recently, knowledge graphs have been used as one of the most popular integration frameworks to address this data-heterogeneity problem by establishing an interoperable semantic schema (a.k.a. ontology). However, the majority of knowledge graphs proposed in this domain are the product of an ETL (Extract-Transform-Load) approach, with all the data physically stored in a triplestore. In contrast, this paper examines an approach of virtualizing knowledge graphs on top of SQL databases as the means to provide federated data integration for enhanced access to heterogeneous Environmental Sensor Network data, bringing with it the promise of more cost-efficiency in terms of I/O, storage, etc. In addition, this work also considers some motivating application scenarios regarding the efficiency of time-series data access. Based on a performance comparison between the proposed integration approach and some popular triplestores, the proposed approach has a significant edge over triplestores in multiple time-series structuring and acquisition.
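As a flavour of what such virtualized access can look like, here is a hedged Python sketch that queries a hypothetical Ontop SPARQL endpoint for one sensor's time series. The endpoint URL and sensor IRI are invented, and the use of the W3C SOSA ontology is an assumption rather than the paper's actual schema.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical Ontop endpoint exposing the virtual knowledge graph.
endpoint = SPARQLWrapper("http://localhost:8080/sparql")
endpoint.setQuery("""
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?time ?value WHERE {
  ?obs a sosa:Observation ;
       sosa:madeBySensor <http://example.org/sensor/pm25-01> ;
       sosa:resultTime ?time ;
       sosa:hasSimpleResult ?value .
} ORDER BY ?time
""")
endpoint.setReturnFormat(JSON)

# Each SPARQL answer is rewritten by Ontop into SQL over PostgreSQL.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["time"]["value"], row["value"]["value"])
```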
... The index files have been integrated as tables into a PostgreSQL database for efficient data storage and quick response to user queries [44]. The main framework used for the front end of the Web server is Vue, with Vuetify used to assist in implementing user interface (UI) features such as displaying table data and simple icon graphics [45]. Flask is used as the server in the backend to respond to user requests. ...
... In this study, we employ the state-of-the-art spatial database, PostgreSQL [29], to integrate multi-source geospatial data and to enhance the efficiency of damage evaluation. PostgreSQL provides a spatial extension, PostGIS, which supports geospatial information services such as spatial indexing, spatial queries, and more. ...
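For illustration, a query of the kind such a platform might run: find all buildings within a damage radius around an explosion point using PostGIS's ST_DWithin, which can exploit a spatial index. The schema, coordinates, and connection string below are hypothetical, not taken from the paper.

```python
import psycopg2

# Hypothetical schema: buildings(id, height, geom) with a GiST index on geom;
# coordinates use a projected CRS (EPSG:3857), so distances are in metres.
x, y, radius = 1.27e7, 3.58e6, 500.0   # explosion point and damage radius

conn = psycopg2.connect("dbname=damage_eval")
with conn, conn.cursor() as cur:
    # ST_DWithin lets PostGIS use the spatial index instead of scanning.
    cur.execute("""
        SELECT id, height,
               ST_Distance(geom, ST_SetSRID(ST_Point(%s, %s), 3857)) AS dist
        FROM buildings
        WHERE ST_DWithin(geom, ST_SetSRID(ST_Point(%s, %s), 3857), %s)
        ORDER BY dist
    """, (x, y, x, y, radius))
    for building_id, height, dist in cur.fetchall():
        print(building_id, height, dist)
```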
The road transportation of explosives is of great concern due to its substantial impact on public safety. For the safety management of explosive transportation, e.g., transport route planning and emergency rescue, explosion consequence evaluation is of paramount importance. The consequence evaluation of explosion accidents is affected by many factors, especially spatial features, such as the location of transport vehicles, the distribution of buildings, and the presence of individuals around the road. However, there is still a lack of quantification methods for building damage evaluation, for human casualty evaluation that considers real-time population density, and for efficient interactive damage evaluation. In this paper, we formalize three typical scenarios of damage evaluation for explosive road transportation accidents, i.e., explosion point-based, road segment-based, and route-based damage evaluation. For each scenario, we propose a Height-aware Hierarchical Building Damage (HHBD) model and a Shelter-aware Human Casualty (SHC) model for building damage evaluation and human casualty evaluation, respectively. We also develop a GIS-based interactive visualization platform that integrates multi-source geospatial data and enables efficient geospatial computation. In addition, a case study of liquefied natural gas (LNG) transportation in Wuhan is demonstrated in order to verify the effectiveness and efficiency of the proposed system. The research results can support the decision-making process of explosive transportation safety warnings and emergency rescue.
... The counts of rNMPs are normalized by the total number of rNMPs in the chosen rNMP library and the total nucleotide frequency from the background reference genome, as previously described (Xu and Storici, 2021b). The rNMPID database is built using Rust (Matsakis et al., 2014), TypeScript (Bierman et al., 2014), PostgreSQL (Momjian et al., 2001), and additional libraries including Tokio (Tokio Team, 2023), SQLx (Launchbadge Team, 2023), Reactjs (Rawat and Mahajan, 2020), Plotly (Johnson et al., 2012), Ant Design (Ant Design Team, 2023), and JBrowse (Buels et al., 2016). It comprises four modules: Sample Analysis, Genome Browser, Download, and Resource (Fig. 1A). ...
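As a sketch of the normalization described (our reading of Xu and Storici, 2021b, not code from the database), the per-nucleotide embedment frequency can be computed as the library-normalized count divided by that nucleotide's background frequency in the reference genome:

```python
def normalized_rnmp_frequency(rnmp_counts, background_counts):
    """Normalize per-base rNMP counts by library size and by the
    background nucleotide frequency of the reference genome."""
    total_rnmps = sum(rnmp_counts.values())
    total_bg = sum(background_counts.values())
    return {
        base: (rnmp_counts[base] / total_rnmps)
              / (background_counts[base] / total_bg)
        for base in rnmp_counts
    }

# rA embedded ~2x more often than the background A frequency predicts:
print(normalized_rnmp_frequency(
    {'A': 120, 'C': 30, 'G': 25, 'T': 25},
    {'A': 3_000_000, 'C': 2_000_000, 'G': 2_000_000, 'T': 3_000_000}))
```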
Motivation: Ribonucleoside monophosphates (rNMPs) are the most abundant non-standard nucleotides embedded in genomic DNA. If the presence of rNMPs in DNA cannot be controlled, it can lead to genome instability. The actual positive functions of rNMPs in DNA remain mainly unknown. Considering the association between rNMP embedment and various diseases and cancer, the phenomenon of rNMP embedment in DNA has become a prominent area of research in recent years.
Results: We introduce the rNMPID database, which is the first database revealing rNMP-embedment characteristics, strand bias, and preferred incorporation patterns in the genomic DNA of samples from bacterial to human cells of different genetic backgrounds. The rNMPID database uses datasets generated by different rNMP-mapping techniques. It provides researchers with a solid foundation to explore the features of rNMPs embedded in the genomic DNA of multiple sources, and their association with cellular functions and, in the future, disease. It also significantly benefits researchers in the fields of genetics and genomics who aim to integrate their studies with rNMP-embedment data.
Availability: rNMPID is freely accessible on the web at https://www.rnmpid.org.
... Among all the relational database management systems (RDBMS) that use SQL to administer, define, manage, and manipulate information (Oracle, Microsoft SQL Server, MySQL, SQLite), one of the most widely used and best regarded today is PostgreSQL, as it is open source and offers great stability, power, robustness, and ease of administration and implementation. Additionally, it uses a client/server architecture with threads for the correct processing of queries to the database [31,32]. ...
Three-dimensional block models are the most widely used tool for the study and evaluation of ore deposits, the calculation and design of economical pits, mine production planning, and physical and numerical simulations of ore deposits. These algorithms and computational techniques are usually programmed through complex C++, C#, or Python libraries. Database programming languages such as SQL (Structured Query Language) have traditionally been restricted to operating on drillhole sample data. However, major advances in the management and processing of large databases have opened up the possibility of changing the way in which block model calculations are related to the database. Thanks to programming languages designed to manage databases, such as SQL, the traditional recursive traversal of database records is replaced by a system of database queries. In this way, with a simple SQL query, numerous lines of code are eliminated from the different loops, thus achieving greater calculation speed. In this paper, a floating cone optimization algorithm is adapted to SQL, describing how economical cones can be generated, related, and calculated, all in a simple way and with few lines of code. Finally, to test this methodology, a case study is developed and presented.
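As a hedged illustration of the idea (not the paper's actual code), a single SQL aggregate can value an upward-opening cone above a candidate base block. The blocks table, column names, and slope parameterization below are assumptions.

```python
import psycopg2

# Assumed schema: blocks(x, y, z, value) on a regular grid, z increasing
# upward; slope controls how fast the cone widens per unit of height.
CONE_VALUE_SQL = """
SELECT SUM(b.value) AS cone_value
FROM blocks b
WHERE b.z >= %(z0)s
  AND sqrt(power(b.x - %(x0)s, 2) + power(b.y - %(y0)s, 2))
      <= (b.z - %(z0)s) * %(slope)s
"""

conn = psycopg2.connect("dbname=blockmodel")
with conn, conn.cursor() as cur:
    cur.execute(CONE_VALUE_SQL, {'x0': 100, 'y0': 250, 'z0': 40, 'slope': 1.0})
    print(cur.fetchone()[0])   # positive -> the cone is worth mining
```

One query replaces the nested loops a recursive traversal would need: the cone membership test is pushed into the WHERE clause and the valuation into the aggregate.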
... Se elige como gestor de base de datos a PostgreSQL, un sistema de código abierto y gratuito, que funciona en varias plataformas, incluyendo Windows y Linux. Tiene soporte para BD extendido como procedimientos almacenados, una excelente documentación y muy buena seguridad, como se menciona en (Momjian, 2001); además de ser utilizado ampliamente en la Empresa. ...
This work shows the main aspects related to the development of a computing application to manage radio spectrum license usage in ETECSA issued by the UPTCER (for its acronym in Spanish). A relational database was designed linking operating stations, the equipment and the radio frequencies in use, the calculation of their expiration indicators, the status of requests, and access to documents, as well as data related to the installed equipment corresponding to the Indoor Plant Transportation network. The technologies selected for the implementation of the system are also explained. This software shows good results in terms of management and support for decision-making in the entity.
... Table 2 summarizes the pros and cons of these platforms. CKAN uses PostgreSQL [16] for its database, SQLAlchemy [17] for its Object Relational Mapping (ORM) [18], and Apache Solr [19] for its search engine. CKAN solves the problem of insufficient functionality by utilizing plug-ins, also referred to as extensions. ...
In this study, we proposed Smart Open Data as a Service (SODAS) as a new open data platform based on the international standards Data Catalog Vocabulary (DCAT) and Comprehensive Knowledge Archive Network (CKAN) to facilitate the release and sharing of data. We first analyze the five problems in the legacy CKAN and then draw up corresponding solutions through three core strategies: CKAN expansion, DCATv2 support, and extendable DataMap. We then define four components and nine function blocks of SODAS for each core strategy. As a result, SODAS drives Open Data Portal, Open Data Reference Model, DataMap Publisher, and Analytics and Development Environment (ADE) Provisioning for connecting the defined function blocks. We confirm that each function works correctly through the SODAS Web portal, and then we apply SODAS to actual data distribution sites to prove its efficiency and practical use. SODAS is the first open data platform that provides secure interoperability between heterogeneous platforms based on international standards, and it enables domain-free data management with flexible metadata.
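Since SODAS extends CKAN through its plug-in mechanism, a minimal CKAN extension skeleton of the general kind involved might look as follows; the plugin name and registered directories are illustrative, not taken from SODAS.

```python
# Minimal CKAN extension skeleton (plugin name and paths are hypothetical).
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit

class SodasLikePlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)

    # IConfigurer: register the extension's templates and static assets
    def update_config(self, config_):
        toolkit.add_template_directory(config_, 'templates')
        toolkit.add_public_directory(config_, 'public')
```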
... Django also provides an optional administrative interface for creating, reading, updating, and deleting dynamically generated data. PostgreSQL is used as the database in the Dohrnii system [31]. Django has built-in support for PostgreSQL, which facilitates the process of developing Django applications with PostgreSQL. ...
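For reference, Django's built-in PostgreSQL support is enabled with a settings fragment like the following; the database name and credentials are placeholders, not the actual Dohrnii configuration.

```python
# settings.py fragment: Django's built-in PostgreSQL backend.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'dohrnii',          # placeholder database name
        'USER': 'dohrnii_app',      # placeholder credentials
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```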
The reproducibility of code, and of the whole environment in which that code needs to be executed, takes up much of the time of software engineers, computer scientists, and anyone who writes code. There are several reasons why we may not get the same results even when the original data is available to us: the environment is not configured the same way as the source code author's, we do not have enough knowledge of the technology used, the documentation is poorly written or missing, library, tool, or framework versions differ, the person reproducing the code makes mistakes, or there are errors in the code or errors that appeared after a certain period of time. In this paper, we introduce a new system for reproducible coding environments called Dohrnii, as a cloud-based solution. The main purpose is to save the time lost when we want to reproduce the results of a project for the first time. Every Dohrnii environment contains a video, a description, an instance (virtual machine), resources, and an evaluation. This means that with this system it will be possible to reproduce the environments in which code needs to be executed: there will be a video showing how to reproduce the code, an instance (virtual machine), a description, additional resources, and an evaluation part to determine whether what is in the video corresponds to what is in the instance.
... The most popular examples of traditional RDBMS are: Oracle [5][6], MySQL [7][8], Microsoft SQL Server [9][10], PostgreSQL [11][12], and IBM Db2 [13][14]. ...
Databases are considered to be an integral part of modern information systems. Almost every web or mobile application uses some kind of database. Database management systems are considered a crucial element from both business and technological standpoints. This paper divides the different types of database management systems into two main categories (relational and non-relational) and several subcategories. Rankings of the various subcategories for July 2021 are presented in the form of popularity scores calculated and managed by DB-Engines. The popularity trend for each category is also presented to show the change in popularity since 2013. The complete ranking and trend of the top 20 systems show that relational models are still the most popular, with Oracle and MySQL being the two most popular systems. However, recent trends show DBMSs such as time series and document stores getting more and more popular with their wide use in IoT technology and Big Data, respectively.
... TaxMan was developed to facilitate phylogenetic studies by automating sequence acquisition, consensus building, alignment, and taxon selection. It was developed in Perl 5.8.6 and requires a set of prerequisites to be installed in the environment, such as BLAST (Tatusova and Madden, 1999), PostgreSQL (Momjian, 2001), Emboss (Rice et al., 2000), PHRAP (de la Bastide and McCombie, 2007), and POA (Lee et al., 2002). TaxMan accepts GenBank files of the taxa to be analyzed and a file with gene synonyms to be considered, which will be used to extract the gene information automatically from the GenBank files. ...
The reconstruction of phylogenomic trees containing multiple genes is best achieved by using a supermatrix. The advent of NGS technology has made it easier and cheaper to obtain multiple gene data in one sequencing run. When numerous genes and organisms are used in a phylogenomic analysis, it is difficult to organize all the information and manually align the gene sequences to further concatenate them. This study describes SPLACE, a tool to automatically SPLit, Align, and ConcatenatE the genes of all species of interest to generate a supermatrix file, and consequently a phylogenetic tree, while handling possible missing data. In our findings, SPLACE was the only tool that could automatically align gene sequences while also handling missing data, and it required only a few minutes to produce a supermatrix FASTA file containing 83 aligned and concatenated genes from the chloroplast genomes of 270 plant species. It is an open-source tool and is publicly available at https://github.com/reinator/splace.
... The PostgreSQL database manager, the PostGIS spatial extension, and the Hibernate and Hibernate Spatial data persistence frameworks were also employed (Hibernate, 2019; Hibernate Spatial, 2019; Momjian, 2001). In view of the complexity of working with data interpolation procedures, we opted to use the AgDataBox API (Application Programming Interface) (Bazzi et al., 2019) for geostatistical analysis and generation of thematic maps using ordinary kriging (KRI) and inverse of the distance raised to a power (IDW). ...
Due to the importance of increasing the quantity and quality of world agricultural production, the use of technologies to assist production processes is essential. Despite this, Brazilian fruit producers have been timid in adopting precision agriculture (PA) technologies, even though fruit growing is one of the segments that has stood out in recent years in the country's economy. In the PA context, yield maps are rich sources of information, especially for species harvested by machine, where measuring the volumes harvested at georeferenced points is easier, allowing the generation of yield maps. In orchards intended for the fresh fruit market, it is more difficult to generate yield data and maps, since yield is linked to the volume harvested manually and, more importantly, to the quality of the fruit. One factor that makes it difficult to measure yield is that the harvest is done at different times: to maintain quality, the fruits of an area are only harvested when they reach the stipulated maturity point. To contemplate the complexity of manual fruit-harvesting processes, this paper presents a system that allows the yield mapping of hand-harvested orchards. The system is comprised of hardware components (intended to obtain the location of the harvester as well as the unloading record of their harvesting device at the unloading site) and software that processes the data obtained by the hardware device and creates a map of the environment from which fruits were harvested, allowing the construction of yield maps. In addition to the yield maps, the system identifies the yield level of each worker performing the harvest by the number of discharges performed and the time spent. The system has been developed in partnership between the Federal Technological University of Paraná and Embrapa Grape & Wine and has been tested in apple orchards in southern Brazil. The system is expected to positively impact the sector by enabling monitoring of the quality and quantity of fruit from the orchards and providing more appropriate management aiming at the stability of field production. Although tested only in apple cultivation, the system is promising for other segments of fruit growing, such as the production of pears, oranges, and figs, among others.
... The server is a computer running in the cloud that receives the data from the client, as shown in Table 4, saves the results in a PostgreSQL database (Momjian, 2001), and applies the map-matching algorithm on the OSM road network to estimate traffic flow for different road links. In this study, we use an AWS EC2 instance t3a.large server, which has a dual-core 2.5 GHz AMD EPYC 7000 series CPU with 8 GB RAM and 5 Gbps network bandwidth. ...
Traffic flow estimation is required for road infrastructure management tasks such as road development planning, routing, and navigation. Determining traffic flow on a citywide scale is challenging because of the high cost and limited portability of current devices. Portable sensing devices such as drive cams and smartphones are an effective source for monitoring the road infrastructure environment because of their continuous interaction with the surroundings. However, the use of such devices to estimate real-time traffic flow has not been fully explored. In this study, we optimize the vehicle detection neural network for inference on lightweight edge devices and develop a client-server framework to reduce and share the computational load, enabling accurate real-time traffic flow processing from moving camera videos. We conduct extensive experiments on various combinations of input network size and frame rate for three widely used edge devices: Jetson Xavier AGX, Jetson Xavier NX, and Jetson Nano. We obtain a traffic flow reconstruction accuracy ranging from 73.1% to 80.8% evaluated using ground truth data. With the proliferation of moving cameras in vehicles (dash cams, stereo cams, etc.) and inexpensive edge devices, we expect our real-time traffic flow estimation algorithm to have a very promising future. Our findings may serve as a useful reference for several domains in the area of real-time artificial intelligence applications and emerging edge computing devices.
... Storage systems of this kind are called databases, which can be SQL or NoSQL: relational, document, key-value, or graph databases. Widely used implementations include MySQL Server [25], MongoDB [26], PostgreSQL [27], Redis [28], InfluxDB [29], TimescaleDB [30], and Apache Cassandra [31]. ...
The use of mature, reliable, and validated solutions can save significant time and cost when introducing new technologies to companies. Reference Architectures represent such best-practice techniques and have the potential to increase the speed and reliability of the development process in many application domains. One area where Reference Architectures are increasingly utilized is cloud-based systems. Exploiting the high-performance computing capability offered by clouds, while keeping sovereignty and governance of proprietary information assets can be challenging. This paper explores how Reference Architectures can be applied to overcome this challenge when developing cloud-based applications. The presented approach was developed within the DIGITbrain European project, which aims at supporting small and medium-sized enterprises (SMEs) and mid-caps in realizing smart business models called Manufacturing as a Service, via the efficient utilization of Digital Twins. In this paper, an overview of Reference Architecture concepts, as well as their classification, specialization, and particular application possibilities are presented. Various data management and potentially spatially detached data processing configurations are discussed, with special attention to machine learning techniques, which are of high interest within various sectors, including manufacturing. A framework that enables the deployment and orchestration of such overall data analytics Reference Architectures in clouds resources is also presented, followed by a demonstrative application example where the applicability of the introduced techniques and solutions are showcased in practice.
... The server is a computer running in the cloud that receives the data from the client, as shown in Table 4, saves the results in a PostgreSQL database (Momjian, 2001), and applies the map-matching algorithm on the OSM road network to estimate traffic flow for different road links. In this study, we use an AWS EC2 instance t3a.large server, which has a dual-core 2.5 GHz AMD EPYC 7000 series CPU with 8 GB RAM and 5 Gbps network bandwidth. ...
Analysis of traffic flow parameters is necessary for Intelligent Transportation Systems (ITS) and autonomous driving research. Deep learning-based vehicle detection techniques have been widely used in reconstructing traffic flow parameters from video images. This research proposes a novel cross-sectional traffic flow estimation algorithm to reconstruct traffic volume from moving camera videos. We develop a vehicle detection dataset with more than one million annotations of vehicles with orientation and train a YOLOv4-based object detection network. We leverage the accurate vehicle detection model in tracking and estimating the distance of detected vehicles using Simple Online and Realtime Tracking (SORT) and photogrammetry techniques. The estimated distances and forward bearing of the observing vehicle are then utilized to calculate the GPS position of detected vehicles and used in the algorithm to estimate cross-sectional traffic flow. We utilize the proposed algorithm to estimate the traffic flow of 580 OpenStreetMap (OSM) road links and achieve an average accuracy of 84.30% verified against data from 11 traffic police sensors in Susono City, Japan. The proposed large-scale dataset and cross-sectional traffic flow estimation algorithm open new avenues for ITS and autonomous driving research.
... Although MongoDB is ideal for storing semi-structured data such as JSON documents, many analytic tools such as Tableau and statistical methods still prefer tabular (i.e., relational) data. Therefore, the JSON format data is transformed into tabular data rows stored in relational tables in a PostgreSQL [14] database. ...
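A minimal sketch of this JSON-to-relational step (the table and field names are invented, not the study's schema): flatten the fields of interest out of each document and bulk-insert them as rows.

```python
import json, psycopg2

# Flatten a nested JSON document into flat rows for a relational table.
doc = json.loads(
    '{"meter": "m1", "readings": ['
    '{"t": "2020-05-01T00:00", "kwh": 1.2},'
    '{"t": "2020-05-01T00:15", "kwh": 0.9}]}')
rows = [(doc["meter"], r["t"], r["kwh"]) for r in doc["readings"]]

conn = psycopg2.connect("dbname=smarthomes")   # hypothetical database
with conn, conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO meter_readings (meter_id, read_at, kwh)"
        " VALUES (%s, %s, %s)",
        rows)
```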
The COVID-19 pandemic has significantly affected people’s behavioral patterns and schedules because of stay-at-home orders and a reduction of social interactions. Therefore, the shape of electrical loads associated with residential buildings has also changed. In this paper, we quantify the changes and perform a detailed analysis on how the load shapes have changed, and we make potential recommendations for utilities to handle peak load and demand response. Our analysis incorporates data from before and after the onset of the COVID-19 pandemic, from an Alabama Power Smart Neighborhood with energy-efficient/smart devices, using around 40 advanced metering infrastructure data points. This paper highlights the energy usage pattern changes between weekdays and weekends pre– and post–COVID-19 pandemic times. The weekend usage patterns look similar pre– and post–COVID-19 pandemic, but weekday patterns show significant changes. We also compare energy use of the Smart Neighborhood with a traditional neighborhood to better understand how energy-efficient/smart devices can provide energy savings, especially because of increased work-from-home situations. HVAC and water heating remain the largest consumers of electricity in residential homes, and our findings indicate an even further increase in energy use by these systems.
... Computer systems often have many parameters. For example, our case study looks at 20 parameters, while larger systems such as Spark [64] have 100 knobs and PostgreSQL over 300 [43]. ...
Current auto-tuning frameworks struggle to tune computer systems configurations due to their large parameter spaces, complex interdependencies, and high evaluation cost. Utilizing probabilistic models, Structured Bayesian Optimization (SBO) has recently overcome these difficulties. SBO decomposes the parameter space by utilizing contextual information provided by system experts, leading to fast convergence. However, the complexity of building probabilistic models has hindered its wider adoption. We propose BoAnon, an SBO framework that learns the system structure from its logs. BoAnon provides an API enabling experts to encode knowledge of the system as performance models or component dependencies. BoAnon takes in the learned structure and transforms it into a probabilistic graph model, then applies the expert-provided knowledge to the graph to further contextualize the system behavior. This probabilistic graph allows the optimizer to find efficient configurations faster than other methods. We evaluate BoAnon via a hardware architecture search problem, achieving x-factor improvements in energy-latency objectives over the default architecture. With its novel contextual structure learning pipeline, BoAnon makes SBO accessible for a wide range of other computer systems such as databases and stream processors.
... Java technology was the programming language adopted, through the JEE platform (Andrade, 2015; Deitel, 2016). The PostgreSQL database manager, the PostGIS spatial extension, and the Hibernate and Hibernate Spatial data persistence frameworks were also employed (Hibernate, 2020; Hibernate Spatial, 2020; Momjian, 2001). ...
Yield mapping technologies can help to increase the quantity and quality of agricultural production. Current systems only focus on the quantification of the harvest, but the quality has equal or greater importance in some perennial crops and impacts directly on the financial profitability. Therefore, a system was developed to quantify and relate the quality obtained in the classification line with the plants of the orchard and for decision-making. The system is comprised of hardware, which obtains the location of the harvester bag during harvesting and unloading at the unloading site, and software that processes the collected data. The cloud of real-time data contributed from the different collectors (bins) allows the construction of yield maps, considering the multi-stage harvesting system. Further, the system enables the creation of a detailed map of the plants and fruits harvested. As the harvest focuses on quality, it takes place in stages, depending on the ripening of the fruits. In addition to the yield maps, the system allows identification of the efficiency of each worker undertaking the harvest by the number of performed discharges and by the time spent. The system was developed in partnership with the Federal Technological University of Paraná and Embrapa Uva & Vinho and was tested in apple orchards in southern Brazil. Although the system was evaluated with only data from apple cultivation, monitoring the quality and quantifying other orchard fruits can positively impact the fruit sector.
... Postgres is a powerful, open-source object-relational database system that has earned a strong reputation for reliability, feature robustness, and performance (Momjian, 2001). It fits MLflow well, as it does not need to handle Big Data and is simple to use. ...
The number of sensors in the process industry is continuously increasing as they get faster, better, and cheaper. Due to the rising amount of available data, the processing of generated data has to be automated in a computationally efficient manner. Such a solution should also be easily implementable and reproducible independently of the details of the application domain. This paper provides a suitable and versatile infrastructure that deals with Big Data in the process industry on various platforms using efficient, fast, and modern technologies for data gathering, processing, storing, and visualization. Contrary to prior work, we provide an easy-to-use, easily reproducible, adaptable, and configurable Big Data management solution with a detailed implementation description that does not require expert or domain-specific knowledge. In addition to the infrastructure implementation, we focus on monitoring both infrastructure inputs and outputs, including incoming process data and model predictions and performances, thus allowing for early interventions and actions if problems occur.
Artificial Intelligence (AI) technologies have enabled researchers to develop tools to monitor real-world events and user behavior using social media platforms. Twitter is particularly useful for gathering invaluable information related to diseases and public health to build real-time disease surveillance systems. Such systems offer a cost-effective and efficient alternative to the passive, expensive, and time-consuming process of using data from healthcare organizations and hospitals. In this paper, we propose TepiSense, a novel system to automatically perform surveillance of epidemic-prone diseases. Our system classifies tweets related to diseases and further identifies 'indication' tweets that highlight the presence of patients. Our system consists of four distinct modules: pre-processing, feature extractor, classifier, and evaluator. TepiSense compares the performance of 3 feature extraction techniques, 9 machine learning models, and 3 Large Language Models (LLMs). To test the performance of our framework, we build the Twitter Epidemic Surveillance Corpus (TESC), containing 23.9K English and 13K labelled Urdu tweets related to six diseases: COVID19, Hepatitis, Malaria, Flu, Dengue, and HIV/AIDS. Our results show that the LLM MBERT achieves the highest F-measure values of 0.96 and 0.83 for topic and indication tweet classification, respectively. Furthermore, we compute the correlation of signals generated by our framework with real-world cases to test the efficacy on COVID19. We notice that real-world cases have a correlation of 0.58-0.63 with the indication-category tweets. Finally, we develop an interactive and user-friendly dashboard to disseminate the analytics of our system. Overall, our system offers a powerful tool for real-time disease surveillance using social media, with potential implications for public health policy and decision-making.
On an Amazon EC2 or Azure VM Linux host, the PostgreSQL software is installed with the package manager provided by the Red Hat OS, which applies the default configuration. For customization, you have to work with the configuration options. You can create multiple PostgreSQL clusters on a Linux host to segregate business requirements into separate maintenance windows, with different ports for access. Several configuration tasks follow the installation of the PostgreSQL cluster software: initializing the PostgreSQL cluster database, configuring database parameters to expand the default memory, installing and configuring extensions, setting up firewall and network rules, and creating user databases. As the default PostgreSQL port 5432 is well known and therefore not secure, you have to configure a non-default port for database access to protect the database from internal and external threats and vulnerabilities.
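A condensed sketch of these post-install tasks, assuming the standard PostgreSQL binaries are on the PATH and the script runs as the postgres user; the data directory, port, and memory setting are illustrative, not prescriptive values.

```python
import subprocess

DATADIR = '/var/lib/pgsql/clusterA'   # illustrative cluster data directory

# Initialize the cluster database.
subprocess.run(['initdb', '-D', DATADIR], check=True)

# Customize the default configuration: non-default port and expanded memory.
with open(f'{DATADIR}/postgresql.conf', 'a') as conf:
    conf.write('port = 5433\n')               # avoid the well-known 5432
    conf.write('shared_buffers = 2GB\n')      # expand the default memory

# Start the cluster and create a user database on the new port.
subprocess.run(['pg_ctl', '-D', DATADIR, 'start'], check=True)
subprocess.run(['createdb', '-p', '5433', 'appdb'], check=True)
```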
PostgreSQL is an open-source database that was originally created by academicians and has been subsequently enhanced by contributions from individuals and institutions, which provide cost-effective solutions for its software maintenance. PostgreSQL is an object-relational database system built to support simple to complex data types, and it is a preferred choice on the cloud for its portability and scalability. The PostgreSQL database can be installed on personal computers as well as large-scale computers. The open-source PostgreSQL database has no official commercial support from the development community, so corporations lean on several commercial products that provide additional wrapper components for high availability, distributed computing, and backup and recovery, among other extended features.
In the ever-advancing evolution of information technology, global change has been driven by the emergence of various innovations, from personal computers to the internet and websites. Each step in this development has contributed significantly to changing the way people interact, work, and do business. The importance of this change is increasingly clear in the logistics sector, which is the backbone of the global supply chain. However, this change does not happen instantly, especially for companies that have long operated in more conventional ways. One example is PT Galena Perkasa, a logistics company that has built a solid reputation for providing quality services. This research aims to design and build a web-based Order Management System (OMS) using the Laravel framework at PT Galena Perkasa. The OMS is designed to manage the entire order process, from receipt to delivery, improving the company's operational efficiency and responsiveness. In this research, the Scrum Agile method is used to ensure that system development is carried out iteratively and responsively to changing requirements. The Laravel framework and the PostgreSQL database were chosen to guarantee data security and optimal performance. The results of this research show that the developed OMS succeeded in improving the operational efficiency of PT Galena Perkasa by reducing human error and accelerating the product delivery cycle.
Purpose
Pakistan has disproportionately high maternal and neonatal morbidity and mortality. There is a lack of detailed, population-representative data to provide evidence for risk factors, morbidities and mortality among pregnant women and their newborns. The Pregnancy Risk, Infant Surveillance and Measurement Alliance (PRISMA) is a multicountry open cohort that aims to collect high-dimensional, standardised data across five South Asian and African countries for estimating risk and developing innovative strategies to optimise pregnancy outcomes for mothers and their newborns. This study presents the baseline maternal and neonatal characteristics of the Pakistan site, collected prior to the launch of a multisite, harmonised protocol.
Participants
The PRISMA Pakistan study is being conducted at two periurban field sites in Karachi, Pakistan. These sites have primary healthcare clinics where pregnant women and their newborns are followed during the antenatal, intrapartum and postnatal periods up to 1 year after delivery. All encounters are captured electronically through a custom-built Android application. A total of 3731 pregnant women (mean age 26.6±5.8 years at the time of pregnancy) with neonatal outcomes recorded between January 2021 and August 2022 serve as the baseline for the PRISMA Pakistan study.
Findings to date
In this cohort, live births accounted for the majority of pregnancy outcomes (92%, n=3478), followed by miscarriages/abortions (5.5%, n=205) and stillbirths (2.6%, n=98). Twenty-two per cent of women (n=786) delivered at home. One out of every four neonates was low birth weight (<2500 g), and one out of every five was preterm (gestational age <37 weeks). The maternal mortality rate was 172/100 000 pregnancies, the neonatal mortality rate was 52/1000 live births and the stillbirth rate was 27/1000 births. The three most common causes of neonatal deaths obtained through verbal autopsy were perinatal asphyxia (39.6%), preterm births (19.8%) and infections (12.6%).
Future plans
The PRISMA cohort will provide data-driven insights to prioritise and design interventions to improve maternal and neonatal outcomes in low-resource regions.
Trial registration number
NCT05904145.
Automatic knowledge graph construction aims to manufacture structured human knowledge. To this end, much effort has historically been spent extracting informative fact patterns from different data sources. More recently, however, research interest has shifted to acquiring conceptualized structured knowledge beyond informative data. In addition, researchers have also been exploring new ways of handling sophisticated construction tasks in diversified scenarios. Thus, there is a demand for a systematic review of paradigms for organizing knowledge structures beyond data-level mentions. To meet this demand, we comprehensively survey more than 300 methods to summarize the latest developments in knowledge graph construction. A knowledge graph is built in three steps: knowledge acquisition, knowledge refinement, and knowledge evolution. The processes of knowledge acquisition are reviewed in detail, including obtaining entities with fine-grained types and their conceptual linkages to knowledge graphs; resolving coreferences; and extracting entity relationships in complex scenarios. The survey covers models for knowledge refinement, including knowledge graph completion and knowledge fusion. Methods to handle knowledge evolution are also systematically presented, including conditional knowledge acquisition, conditional knowledge graph completion, and knowledge dynamics. We present paradigms to compare the distinctions among these methods along the axes of data environment, motivation, and architecture. Additionally, we provide brief overviews of accessible resources that can help readers develop practical knowledge graph systems. The survey concludes with discussions on the challenges and possible directions for future exploration.
Operational and performance characteristics of flash SSDs have long been associated with a set of unwritten contracts due to their hidden, complex internals and the lack of control from the host software stack. These unwritten contracts govern how data should be stored, accessed, and garbage collected. The emergence of Zoned Namespace (ZNS) flash devices with their open and standardized interface allows us to write these contracts for the storage stack. However, even with a standardized storage-host interface, the quantification and reasoning of such contracts remain a challenge due to the lack of appropriate end-to-end operational data collection tools. In this paper, we propose zns.tools, an open-source framework for end-to-end event and metadata collection, analysis, and visualization for ZNS SSD contract analysis. We showcase how zns.tools can be used to understand how the combination of RocksDB with the F2FS file system interacts with the underlying storage. Our tools are available openly at https://github.com/stonet-research/zns-tools.
Numerous irregular graph datasets, for example social networks or web graphs, may contain even trillions of edges. Often, their structure changes over time and they have domain-specific rich data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size and irregularity of such datasets, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., document stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). 51 graph database systems are presented and compared, including Neo4j, OrientDB, and Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we outline future research and engineering challenges related to graph databases.
Private blockchains, as replicated transactional systems, share many commonalities with distributed databases. However, the close relationship between private blockchains and deterministic databases has never been studied. In essence, private blockchains and deterministic databases both ensure replica consistency by determinism. In this paper, we present a comprehensive analysis to uncover the connections between private blockchains and deterministic databases. While private blockchains have only recently started to pursue deterministic transaction execution, deterministic databases have studied deterministic concurrency control protocols for almost a decade. This motivates us to propose Harmony, a novel deterministic concurrency control protocol designed for blockchain use. We use Harmony to build a new relational blockchain, namely HarmonyBC, which features low abort rates, hotspot resiliency, and inter-block parallelism, all of which are especially important to disk-oriented blockchains. Empirical results on Smallbank, YCSB, and TPC-C show that HarmonyBC offers 2.0x to 3.5x higher throughput than state-of-the-art private blockchains.
Data management and analysis are challenging with big Earth observation (EO) data. Expanding upon the rising promises of data cubes for analysis-ready big EO data, we propose a new geospatial infrastructure layered over a data cube to facilitate big EO data management and analysis. Compared to previous work on data cubes, the proposed infrastructure, GeoCube, extends the capacity of data cubes to multi-source big vector and raster data. GeoCube is developed in terms of three major efforts: formalize cube dimensions for multi-source geospatial data, process geospatial data query along these dimensions, and organize cube data for high-performance geoprocessing. This strategy improves EO data cube management and keeps connections with the business intelligence cube, which provides supplementary information for EO data cube processing. The paper highlights the major efforts and key research contributions to online analytical processing for dimension formalization, distributed cube objects for tiles, and artificial intelligence enabled prediction of computational intensity for data cube processing. Case studies with data from Landsat, Gaofen, and OpenStreetMap demonstrate the capabilities and applicability of the proposed infrastructure.
Web applications are exposed to many threats and, despite the best defensive efforts, are often successfully attacked. Reverting the effects of an attack on the state of such an application requires profound knowledge of the application, to understand what data the attack corrupted. Furthermore, it requires knowing what steps are needed to revert the effects without modifying legitimate data created by legitimate users. Existing intrusion recovery systems are capable of reverting the effects of an attack, but they require modifications to the source code of the application, which may be impractical. We present Sanare, a pluggable intrusion recovery system designed for web applications that use different data storage systems to keep their state. Sanare does not require any modification to the source code of the application or the web server. Instead, it uses a new deep learning scheme that we also introduce in the article, Matchare, which learns the matches between HTTP requests and the database statements, file system operations, and web service requests that the HTTP requests caused. We evaluated Sanare with three open source web applications: WordPress, GitLab and ownCloud. In our experiments, Matchare achieved precision and recall higher than 97.5% with a performance overhead of less than 18% for the application.