About
254
Publications
29,836
Reads
1,159
Citations
Citations since 2017
Introduction
Additional affiliations
April 1993 - present
July 1974 - present
Publications (254)
Data access and management for multi-cloud applications proves challenging where there is a need for efficient access to large pre-existing, legacy data-sets. To address this problem, we created an indexing subsystem incorporated into the Onedata data management system, achieving a global multi-cloud integration of legacy storage systems containing larg...
Many modern applications, both scientific and commercial, are deployed to cloud environments and often employ multiple types of resources. That allows them to efficiently allocate only the resources which are actually needed to achieve their goals. However, in many workloads the actual usage of the infrastructure varies over time, which results in...
Scientific and commercial endeavors could benefit from cross-organizational, decentralized collaboration, which becomes the key to innovation. This work addresses one of its challenges, namely efficient access control to assets for distributed data processing among autonomous data centers. We propose a group membership management framework dedicate...
Deep reinforcement learning has been recently a very active field of research. The policies generated with the use of this class of training algorithms are flexible and thus have many practical applications. In this article we present the results of our attempt to use the recent advancements in reinforcement learning to automate the management of r...
Deep Reinforcement Learning has been recently a very active field of research. The policies generated with use of that class of training algorithms are flexible and thus have many practical applications. In this paper we present the results of our attempt to use the recent advancements in Reinforcement Learning to automate the management of resourc...
Data access and management on a large, global scale is currently at the center of scientific interest. This follows from the need for data access from multi- and hybrid-cloud applications. In most cases existing solutions provide sufficient functionality to scale computing resources but scaling resources in terms of efficient data access e.g. for d...
Cloud bursting is an application deployment model wherein additional computing resources are provisioned from public clouds in cases where local resources are not sufficient, e.g. during peak demand periods. We propose and experimentally evaluate a cloud-bursting solution for scientific workflows. Our solution is portable thanks to using Kubernetes...
Click-Through Rate estimation is a crucial prediction task in Real-Time Bidding environments prevalent in display advertising. The estimation provides information on how to trade user visits in various systems. Logistic Regression is a popular choice as the model for this task. Due to the amount, dimensionality and sparsity of data, it is challengi...
Reinforcement learning has been recently a very active field of research. Thanks to combining it with Deep Learning, many newly designed algorithms improve the state of the art. In this paper we present the results of our attempt to use the recent advancements in Reinforcement Learning to automate the management of heterogeneous resources in an env...
Automatic personality recognition from source code is a scarcely explored problem. We propose personality recognition with handcrafted features, based on lexical, syntactic and semantic properties of source code. Out of 35 proposed features, 22 features are completely novel. We also show that n-gram features are simple but surprisingly good predict...
We propose a comprehensive solution for reproducibility of scientific workflows. We focus particularly on Kubernetes-managed container clouds, increasingly important in scientific computing. Our solution addresses conservation of the scientific procedure, scientific data, execution environment and experiment deployment, while using standard tools i...
Reinforcement learning is a very active field of research with many practical applications. Success in many cases is driven by combining it with Deep Learning. In this paper we present the results of our attempt to use modern advancements in this area for automated management of resources used to host distributed software. We describe the use of an...
Reinforcement learning is a very active field of research with many practical applications. Success in many cases is driven by combining it with Deep Learning. In this paper we present the results of our attempt to use modern advancements in this area for automated management of resources used to host distributed software. We describe the use of an a...
Automatic personality recognition from source code is a scarcely explored problem. We propose personality recognition with hand-crafted features, based on lexical, syntactic and semantic properties of source code. Out of 35 proposed features, 22 features are completely novel. We also show that n-gram features are simple but surprisingly good predic...
We propose a comprehensive solution for reproducibility of scientific workflows. We focus particularly on Kubernetes-managed container clouds, increasingly important in scientific computing. Our solution addresses conservation of the execution environment and the application deployment using standards-based approaches to avoid maintainability issue...
We present a scientific workflow data management solution that combines global data access with a block-level optimization of data transfer, wherein only the data blocks that are used by a remote job are transferred over the network, significantly reducing data movement for specific common data access patterns. We propose the implementation of the...
In affiliate marketing, an affiliate offers to handle the marketing effort selling products of other companies. Click-fraud is damaging to affiliate marketers as it increases the cost of internet traffic. There is a need for a solution that has an economic incentive to protect marketers while providing them with data they need to reason about the...
The paper presents the design and implementation of a computer system dedicated to the optimization of a hot strip rolling process. The software system proposed here involves the flexible integration of virtual models of various devices used in the process: furnace, descalers, rolling stands, accelerated cooling systems, and coiler. The user can co...
Computational science is rapidly developing, which pushes the boundaries in data management concerning the size and structure of datasets, data processing patterns, geographical distribution of data and performance expectations. In this paper we present a solution for harmonizing data access performance, i.e. finding a compromise between local and...
Many problems, like recommendation services, sensor networks, anti-crime protection, sophisticated AI services, need online data processing coming from the environment in the form of data streams consisting of events. The novelty of the approach in the field of stream processing lies in a synergistic effort toward optimization of such systems and a...
A hybrid HPC/Cloud architecture is a potential solution to the ever-increasing demand for high-availability on-demand resources for eScience applications. eScience applications are primarily compute-intensive, and thus require HPC resources. They usually also include pre- and post-processing steps, which can be moved into the Cloud in order to keep...
Developing and deploying a global and scalable data access service is a challenging task. We assume that the globalization is achieved by creating and maintaining appropriate metadata while the scalability is achieved by limiting the number of entities taking part in keeping the metadata consistency. In this paper, we present different consistency...
Open-data research is an important factor accelerating the production and analysis of scientific results as well as worldwide collaboration; still, very little data is being shared at scale. The aim of this article is to analyze existing data-access solutions along with their usage limitations. After analyzing the existing solutions and data-access...
Recent years have significantly changed the perception of web services and data storage, as clouds became a big part of the IT market. New challenges appear in the field of scalable web systems, which become bigger and more complex. One of them is designing load balancing algorithms that could allow for optimal utilization of servers' resources in...
There is high demand for storage related services supporting scientists in their research activities. Those services are expected to provide not only capacity but also features allowing for more flexible and cost efficient usage. Such features include easy multiplatform data access, long term data retention, support for performance and cost differe...
Nowadays, as large amounts of data are generated, either from experiments, satellite imagery or via simulations, access to this data becomes challenging for users who need to further process them, since existing data management makes it difficult to effectively access and share large data sets. In this paper we present an approach to enabling easy...
Advances in neural network models and deep learning have had a great impact on sentiment analysis, where models based on recursive or convolutional neural networks show state-of-the-art results, leaving behind non-neural models like SVM or traditional lexicon-based approaches. We present the Tree-Structured Gated Recurrent Unit network, which exhibits greater...
This paper approaches author profiling of mails and blogs in English with Classification Restricted Boltzmann Machines. We propose an author profiling framework with no need for hand-crafted features and only minor use of text preprocessing or feature engineering. The classifier evaluated on the PAN-AP-13 corpus achieves competitive results: 36.59%...
This paper discusses author profiling of English-language mails and blogs using Classification Restricted Boltzmann Machines. We propose an author profiling framework with no need for handcrafted features and only minor use of text preprocessing and feature engineering. The classifier achieves competitive results when evaluated with the PAN-AP-13 c...
Human societies appear in many types of simulations. In particular, many new computer games contain a virtual world that imitates the real world. Among the most important and most difficult elements of a society to model are the social context and cooperation between individuals. In this paper we show how the social context and cooperation abili...
The paper describes the material database, which was developed and included in the VirtRoll computer system dedicated to the design of optimal hot strip rolling technologies. The structure and functionalities of the database are described in the first part of the paper. The integration between the database and the system through the Scalarm platfor...
The concept of Open Science emerges as a powerful new trend, allowing researchers to exchange and reuse valuable knowledge, data and analyses. Innovative tools are needed to facilitate such global scientific collaboration, which is the main objective of the Onedata system. It aspires to provide an Open Science platform based on openness and decentral...
Sensitivity analysis is widely used in numerical simulations applied in industry. The robustness of such applications is crucial, which means they have to be fast and precise at the same time. However, the conventional approach to sensitivity analysis assumes realization of multiple execution of computationally intensive simulations to discover inp...
The main goal of this tutorial is to demonstrate the Scalarm platform as a tool supporting parameter studies on different computing infrastructures like clusters, grids and clouds. Parameter study (also called parameter sweep) is an approach where the same application is executed many times with different input parameter values. Afterwards, results...
This paper examines the machine learning approach to authorship attribution of articles in the Polish language. The focus is on the effect of the data volume, number of authors and thematic homogeneity on authorship attribution quality. We study the impact of feature selection under various feature selection criteria, mainly chi square and informa...
Keywords: Model-based simulations, parameter studies, high-performance computing, rolling technology design. Abstract: The paper describes a computer system for simulating metallurgical rolling processes that consist of multiple steps, each of which is performed by a different type of device. Both devices and processed materials are described with...
Data management is currently one of the predominant issues in both large scale as well as consumer computing systems. While most data is still stored in regular files and managed by various filesystems, current trends show that users no longer treat their data as files but rather objects, which is particularly evident on mobile devices and Cloud ba...
Sensitivity Analysis is widely used in numerical simulations applied in industry. The robustness of such applications is crucial, which means that they have to be fast and precise at the same time. However, the conventional approach to Sensitivity Analysis assumes realization of multiple execution of computationally intensive simulations to discover input/o...
The Big Data revolution means that large amounts of data have not only to be stored, but also to be processed to unlock the potential of access to information and knowledge for scientific research. As a result, scientific communities require simple and convenient global access to data which is effective, secure and shareable. In this article we ana...
For several years now, scientists in Poland have been able to use the resources of the distributed computing infrastructure – PLGrid. It is a flexible, large-scale e-infrastructure, which offers homogeneous, easy-to-use access to organizationally distributed, heterogeneous hardware and software resources. It is built in accordance with good organization...
In this paper we present the results of a two-year study aimed at developing a full-fledged computer environment supporting post-stroke rehabilitation. The system was designed by a team of computer scientists, psychologists and physiotherapists. It adopts a holistic approach to rehabilitation. In order to extend the rehabilitation process, the appl...
The objective of this work was the development of the computer system VirtRoll, which allows designing an arbitrary rolling line and performing numerical simulations using high-efficiency hardware architectures. Selection of the mechanical, thermal, microstructural and phase transformation models, which allow decreasing the computing costs while th...
The Statistically Similar Representative Volume Element (SSRVE) is used to simplify the computational domain for microstructure representation of a material in multiscale modelling. The procedure of SSRVE creation is based on an optimization loop which makes it possible to find the highest similarity between the SSRVE and the original material microstructure. The objective fun...
To satisfy requirements of data globalization and high performance access in particular, we introduce the originally created onedata system which virtualizes storage systems provided by storage resource providers distributed globally. onedata introduces new data organization concepts together with providers’ cooperation procedures that involve use...
With the continuously increasing amount of online resources and data such use cases as discovery, maintenance and inter-operation become more and more complex. In particular, data management is becoming one of the main issues with respect to both scientific (large scale simulations or data mining applications) as well as consumer use cases (accessi...
In this paper we present the optimization of the energy consumption for the multi-frontal solver algorithm executed over two-dimensional grids with point singularities. The multi-frontal solver algorithm is controlled by a so-called elimination tree, defining the order of elimination of rows from particular frontal matrices, as well as order of memor...
With the cloud paradigm and the concept of everything as a service (XaaS), our ability to leverage the potential of distributed computing resources seems greater than ever. On the other hand, data farming is a methodology based on the idea that by repeatedly running a simulation model on a vast parameter space, enough output data can be gathered to...
In the early days of computing, files were just a natural way of storing information which reflected the way one would file their punch cards in a cabinet drawer. Unfortunately, the requirement to fragment information into such chunks is a huge bottleneck for the evolution of the global information space that the Internet has become. The concept of f...
The growing importance of security operations in urban terrain has triggered many attempts to address the perceived gaps in the readiness of security forces for this type of combat. One way to tackle the problem is to employ simulation techniques. Simulations are widely used to support both mission rehearsal and mission analysis, but these two appl...
The paper describes the application of heterogeneous computational infrastructures to study complex metallurgical processes. This goal is achieved by integration of a domain-oriented system (VirtRoll) with a platform for massive parameter studies (Scalarm) on the basis of Service Oriented Architecture (SOA). In particular, technological and securit...