Ciprian-Octavian Truică

Aarhus University | AU · Department of Computer Science

PhD

About

64
Publications
42,637
Reads
267
Citations
Introduction
Researcher and Lecturer. My research topics include Natural Language Processing, Machine Learning, Deep Learning, Big Data, Text Mining, Data Mining, Relational and NoSQL Database Management Systems, Information Retrieval, Business Intelligence, High-Performance Computing, and Cloud Computing.
Additional affiliations
February 2019 - present
Aarhus University
Position
  • PostDoc Position
October 2013 - present
Polytechnic University of Bucharest
Position
  • Teaching Assistant
October 2013 - present
Polytechnic University of Bucharest
Position
  • Researcher
Education
February 2015 - August 2015
Université Lumière Lyon 2
Field of study
  • Computer Science
October 2013 - July 2018
Polytechnic University of Bucharest
Field of study
  • Computer Science
October 2011 - July 2013
Polytechnic University of Bucharest
Field of study
  • Computer Science

Publications (64)
Conference Paper
Full-text available
In this paper we describe the use of multiple types of database management systems in the same application. We will present three web applications: one that uses a PostgreSQL database management system, one that uses a MongoDB NoSQL management system and one that uses both. These applications have been developed using the MVC design pattern and wil...
Conference Paper
Full-text available
In this paper we will examine the key features of the database management system MongoDB. We will focus on the basic CRUD operations and indexes. For our example we will create two databases, one using MySQL and one using MongoDB. We will also compare the way that data will be created, selected, inserted and deleted in both databases. For the index...
Conference Paper
Full-text available
In this paper we will examine the key feature of asynchronous replication in three of the most used relational database management systems. First we will test CRUD operations on single instances of these RDBMS and then we will compare the synchronization time of CRUD operations for asynchronous replication.
Conference Paper
Full-text available
One of the most rapidly evolving and dynamic business sectors is the IT domain, where there is a problem finding experienced, skilled and qualified employees. Specialists are essential for developing and implementing new ideas into products. The human resources (HR) department plays a major role in the recruitment of qualified employees by assessing their s...
Article
Full-text available
Space Surveillance and Tracking is a task that requires the development of systems that can accurately discriminate between natural and man-made objects that orbit around Earth. To manage the discrimination between these objects, it is required to analyze a large amount of partially annotated astronomical images collected using a network of on-grou...
Article
Full-text available
Water resource management represents a fundamental aspect of a modern society. Urban areas present multiple challenges requiring complex solutions, which include multidomain approaches related to the integration of advanced technologies. Water consumption monitoring applications play a significant role in increasing awareness, while machine learnin...
Conference Paper
Discovering the habits of consumers is essential for effective decision support in smart water networks. While smart water meters can provide detailed consumption data for individual households, additional information can be extracted based on the geographical coordinates to highlight the distribution of consumer behaviors within a given area. In t...
Article
Full-text available
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant im...
Article
Full-text available
This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paper’s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST A...
Article
Full-text available
Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, the advances in deep learning over the past couple of year...
Article
Full-text available
Misinformation is considered a threat to our democratic values and principles. The spread of such content on social media polarizes society and undermines public discourse by distorting public perceptions and generating social unrest while lacking the rigor of traditional journalism. Transformers and transfer learning proved to be state-of-the-art...
Conference Paper
Full-text available
Improving consumer profile evaluation is essential for effective water resource management, by creating a more accurate overview of the water distribution network. When compared to the classification by households, a data-driven approach can deliver more accurate results about the consumer types according to their behavior. Clustering methods are c...
Article
Full-text available
New mass media paradigms for information distribution have emerged with the digital age. With new digital-enabled mass media, the communication process is centered around the user, while multimedia content is the new identity of news. Thus, the media landscape has shifted from mass media to personalized social media. While this progress brings adva...
Article
Full-text available
Document-level Sentiment Analysis is a complex task that implies the analysis of large textual content that can incorporate multiple contradictory polarities at the phrase and word levels. Most of the current approaches either represent textual data using pre-trained word embeddings without considering the local context that can be extracted from t...
Preprint
Full-text available
Extracting top-k keywords and documents using weighting schemes is a popular technique employed in text mining and machine learning for different analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, as they are costly to update and keep track of all the modifications performed on the dataset. Furthermore, c...
Conference Paper
Full-text available
Profiling consumers in a water distribution system is essential for achieving sustainability in terms of resource management and urban development. Unsupervised learning can provide data-driven decision support for evaluating the water demand patterns in a large network, while various pre-processing methods can be added to expand the level of detai...
Article
Full-text available
Due to the exponential growth of the Internet of Things networks and the massive amount of time series data collected from these networks, it is essential to apply efficient methods for Big Data analysis in order to extract meaningful information and statistics. Anomaly detection is an important part of time series analysis, improving the quality o...
Chapter
Full-text available
This paper introduces DenLAC (Density Levels Aggregation Clustering), an adaptable clustering algorithm which achieves high accuracy independent of the input’s shape and distribution. While most clustering algorithms are specialized on particular input types, DenLAC obtains correct results for spherical, elongated and different density clusters. We...
Article
Full-text available
Topic modeling is a probabilistic graphical model for discovering latent topics in text corpora by using multinomial distributions of topics over words. Topic labeling is used to assign meaningful labels for the discovered topics. In this paper, we present a new topic labeling method that uses automatic term recognition to discover and assign relev...
Conference Paper
Full-text available
Our society is undergoing a data explosion. To deal with these Big Data Sets, both scientists and experts from industry are creating models, methods, techniques, and algorithms for efficient analysis, sometimes in real-time and with different constraints. The main objective of this paper is to present several examples of datasets used in different res...
Article
Full-text available
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to those in other high-throughput studies; the quantitative analyses need to address...
Preprint
Full-text available
In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible...
Article
Full-text available
Extracting top-k keywords and documents using weighting schemes is a popular technique employed in text mining and machine learning for different analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, as they are costly to update and keep track of all the modifications performed on the dataset. Furthermore, c...
Article
Full-text available
In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to relieve the issue of rigid data structures present in relational databases, by introducing semi-structured and flexible...
Conference Paper
Full-text available
Community detection is the process of extracting community structured subgraphs from community networks. Most research regarding community detection has focused on the network structure without taking the content associated with the nodes into account. In this paper, we propose a new method for enhancing a co-authorship network's structure using cl...
Conference Paper
Full-text available
Density-based clustering algorithms can accurately identify arbitrarily shaped clusters, a characteristic which makes them advantageous for many real-life datasets. However, most density-based clustering algorithms are affected by the curse of dimensionality, since they rely on distance metrics and range queries. In this paper, we demonstrate how densi...
Conference Paper
Full-text available
Sentiment analysis plays an important role in automatically finding the polarity and insights of users with regard to a specific subject, event, or entity. In this article, we propose a new topic-document embedding (TOPICDOC2VEC) for detecting the polarity of a text. The TOPICDOC2VEC is constructed as the concatenation between a document embeddi...
Chapter
Full-text available
A large number of real-world optimization and search problems are too computationally intensive to be solved due to their large state space. Therefore, a mechanism for generating approximate solutions must be adopted. Genetic Algorithms, a subclass of Evolutionary Algorithms, represent one of the widely used methods of finding and approximating use...
Article
Full-text available
Water distribution is fundamental to modern society, and there are many associated challenges in the context of large metropolitan areas. A multi-domain approach is required for designing modern solutions for the existing infrastructure, including control and monitoring systems, data science and Machine Learning. Considering the large scale water dist...
Conference Paper
Full-text available
The TF-IDF model is the most common way of representing documents in the vector space. However, its results are high-dimensional, posing problems to the classic clustering algorithms due to the curse of dimensionality. Recent word-embedding-based techniques can reduce the dimensionality of document representations while also preserving the semanti...
Conference Paper
Full-text available
With the high volume of sensitive data generated daily, the need for constructing, analysing, and benchmarking protocols that maintain the confidentiality and data integrity of user information has increased. Thus, in this paper we present a benchmark for testing the runtime performance of encrypting and decrypting files and strings using symmetric...
Conference Paper
Full-text available
Clustering is an important Data Mining operation that groups objects into clusters based on their similarity. The similarity join is a primitive operation used in clustering which retrieves the most similar pairs from two input data-sets based on a dissimilarity function (also named metric). In this article, we transform DBSCAN's (Density-Based Alg...
Preprint
Full-text available
One of the most rapidly evolving and dynamic business sectors is the IT domain, where there is a problem finding experienced, skilled and qualified employees. Specialists are essential for developing and implementing new ideas into products. The human resources (HR) department plays a major role in the recruitment of qualified employees by assessing the...
Chapter
Full-text available
When coupled with spatio-temporal context, location-based data collected in mobile cellular networks provide insights into patterns of human activity, interactions, and mobility. Whilst uncovered patterns have immense potential for improving services of telecom providers as well as for external applications related to social wellbeing, its inherent...
Conference Paper
Full-text available
Mobile phone service providers collect large volumes of data all over the globe. Taking into account that significant information is recorded in these datasets, there is a great potential for knowledge discovery. Since the processing pipeline contains several important steps, like data preparation, transformation, and knowledge discovery, a holistic ap...
Article
Full-text available
Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark...
Article
Full-text available
Getting information from large volumes of data is very expensive in terms of resources like CPU and memory, as well as computation time. The analysis of a small data set extracted from the original set is preferred. From this small set, called a sample, approximate results can be obtained. The errors are acceptable given the reduced cost necessary fo...
Article
Full-text available
Machine learning algorithms have recently become very popular for different tasks involving data analysis, classification or prediction. They can provide valuable knowledge for very large sets of data and can reach very good accuracy. However, most algorithms are sensitive to the nature of the data sets, as well as different calibrations which can...
Conference Paper
Full-text available
This paper proposes a new solution for topic modeling using contextual cues by applying Automatic Term Recognition (ATR) to extract domain-specific terms in the text preprocessing step. The vocabularies used for topic modeling are constructed using linguistic patterns to determine the inner structure of each document analyzed and, by only taking in...
Conference Paper
Full-text available
Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are casually used for various purposes, are often computed on-the-fly,...
Article
Full-text available
In order to make accurate and fast keyword and full-text searches, it is recommended to index the words in the corpus. One way to do this is to use an inverted index to maintain, in a structured form, the word occurrences in a set of documents. A stemming algorithm can be used to minimize the number of indexed words, so only the root word is kept for...
Conference Paper
Full-text available
Analyzing textual data is a very challenging task because of the huge volume of data generated daily. Fundamental issues in text analysis include the lack of structure in document datasets, the need for various preprocessing steps (e.g., stem or lemma extraction, part-of-speech tagging, named entities recognition...), and performance and scaling i...
Conference Paper
Full-text available
Many software applications store confidential data in databases. This inevitably creates a security risk: once the access measures have been overcome, the content will be fully exposed. For this reason, different algorithms are sometimes used to encrypt the data. This paper weighs the compromises that have to be made for deciding t...
Conference Paper
Full-text available
To increase data security and information privacy, different algorithms are sometimes used to encrypt the data. However, as the algorithms are studied in detail, their effectiveness begins to decline. In this paper a new solution is proposed to strengthen the data security: the New Scytale algorithm. This approach is based on an old technique improved...
Article
Full-text available
Image segmentation is a branch of the image processing domain which involves partitioning the image into multiple segments referred to as sets of pixels. The purpose of segmentation is to change the representation of an image into a form that is easier to analyze and requires less processing in order to extract information based on the colors in the image. This p...
Conference Paper
Full-text available
Topic Modeling is a type of statistical model that tries to determine the topics present in a corpus of documents. The accuracy measures applied to clustering algorithms can also be used to assess the accuracy of topic modeling algorithms because determining topics for documents is similar to clustering them. This paper presents an experimental va...
Article
Full-text available
In a rapidly growing digital world there is the possibility to query and discover data, but the most important issue is what resources are needed and how quickly data can be accessed. For several years now, grid systems, cloud systems and distributed database systems have been replacing independent databases, because their computing power is much hig...
Conference Paper
Full-text available
Twitter presents an unparalleled opportunity for researchers from various fields to gather valuable and genuine textual data from millions of people. However, the collection process, as well as the analysis of these data, requires different kinds of skills (e.g. programming, data mining) which can be an obstacle for people who do not have this backgr...
Conference Paper
Full-text available
Each year the EGC conference gathers researchers and practitioners from the knowledge discovery and management domain to present their latest advances. This year's edition features an open challenge that encourages participants to leverage the EGC rich anthology which spans from 2004 to 2015. The ultimate goal is to highlight the dynamics of the co...