
Gordana M Pavlovic-Lazetic
University of Belgrade · Faculty of Mathematics
Ph.D in Mathematics
51
Publications
6,337
Reads
279
Citations
Publications (51)
Email signature is considered imperative for effective business email communication. Despite the growth of social media, it is still a powerful tool that can be used as a business card in the online world which presents all business information including name, contact number and address to recipients. Signatures can vary a lot in their structure an...
Next Generation Sequencing (NGS) analysis has become a widely used method for studying the structure of DNA and RNA, but the complexity of the procedure leads to error-prone datasets which need to be cleansed in order to avoid misinterpretation of data. We address the usage and proper interpretations of characteristic metrics for RNA sequenci...
The correlation of molecular function and protein intrinsic disorder is an important aspect of understanding the relationship between function, sequence and structure. This research was inspired by the statistical correlation evaluation method described by Xie et al. (J Proteome Res 6 (2007) 1882–1898, reference study), where the authors analyzed the r...
Purpose
The purpose of this paper is to provide a methodology for automatic annotation of a multimedia collection of intangible cultural heritage mostly in the form of interviews. Assigned annotations provide a way to search the collection.
Design/methodology/approach
Annotation is based on automatic extraction of metadata and is conducted by name...
Background:
In the last decade and a half it has been firmly established that a large number of proteins do not adopt a well-defined (ordered) structure under physiological conditions. Such intrinsically disordered proteins (IDPs) and intrinsically disordered (protein) regions (IDRs) are involved in essential cell processes through two basic mecha...
Hierarchical text categorization (HTC) refers to assigning a text document to one or more most suitable categories from a hierarchical category space. In this paper we present two HTC techniques that use kNN and SVM machine learning for the categorization process and a byte n-gram based document representation. They are fully language independ...
According to the last census (2015) Serbia has a population of 7.1 million people, of which 19.7% are over 65 years old. It is estimated that around 80,000 people live with dementia, although epidemiological studies have not been conducted yet. Many studies have shown that there is a need for more reliable diagnosis, as well as the education of pro...
We introduce a new language independent text categorization technique based on byte-level n-gram profiles, an n-gram weighting factors scheme, and a simple algorithm for comparing profiles. The technique does not require any morphological analysis of texts, any preprocessing steps, or any prior information about document content or language. We app...
This paper presents a software system called WebMonitoring. The system is designed for solving certain problems in the process of information search on the web. The first problem is improving entering of queries at search engines and enabling more complex searches than keyword-based ones. The second problem is providing access to web page content t...
META-NET is a Network of Excellence partially funded by the European Commission. The network currently consists of 54 research centres in 33 European countries. META-NET forges META, the Multilingual Europe Technology Alliance, a growing community of language technology professionals and organisations in Europe.
Standard Serbian is the standard national language of Serbs and the official language in the Republic of Serbia. It was formed on the basis of Ekavian and Ijekavian Neo-Štokavian South Slavic dialects and its form was determined by the reformer of the written language of the Serbs Vuk Karadžić (1787–1864), who at the same time reformed both the Cyr...
Standard Serbian is the national standard language of the Serbs and the official language of the Republic of Serbia. It was formed on the basis of the younger Ekavian and Ijekavian Štokavian South Slavic dialects, in the form determined by the reformer of the written language of the Serbs, Vuk Karadžić (1787–1864), who at the same time reformed both the Cyrillic alphabet and the orthography.
Over the last 60 years, Europe has become a unified political and economic structure, although it remains culturally and linguistically very diverse. This means that, from Portuguese to Polish and from Italian to Icelandic, the everyday communication of Europe's inhabitants, as well as communication in the spheres of business and politics, inevitably faces language barriers. Institu...
During the last 60 years, Europe has become a distinct political and economic structure, yet culturally and linguistically it is still very diverse. From Portuguese to Polish and Italian to Icelandic, everyday communication between Europe’s citizens as well as communication in the spheres of business and politics is inevitably confronted by languag...
We are witnesses to a digital revolution that is dramatically impacting communication and society. Recent developments in information and communication technology are sometimes compared to Gutenberg’s invention of the printing press. What can this analogy tell us about the future of the European information society and our languages in particular?
Language technologies are software systems designed to work with natural languages.
Language technology is used to develop software systems designed to handle human language and is therefore often called “human language technology”. Human language comes in spoken and written forms. While speech is the oldest and in terms of human evolution the most natural form of language communication, complex information and most human knowled...
We are witnesses to a digital revolution that is dramatically affecting communication and society.
META-NET is a network of excellence funded by the European Union. It currently consists of 54 members representing 33 European countries. META-NET fosters the Multilingual Europe Technology Alliance (META), a community of language technology professionals and organisations from Europe.
The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first phase - pre-processing phase - strongly relies upon the analysis of the document structure and it is used for locating records of data in the text. The second phase...
A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3D structure or contain regions that do not possess a well-defined 3D structure. It has also been proven that a protein's disorder content is related to its function. We have performed an exhaustive analysis and comparison of the disorde...
Document classification based on the lexical-semantic network, wordnet, is presented. Two types of document classification in Serbian have been experimented with - classification based on chosen concepts from Serbian WordNet (SWN) and proper names-based classification. Conceptual document classification criteria are constructed from hierarchies roo...
The paper presents a novel, n-gram-based method for analysis of bacterial genome segments known as genomic islands (GIs). Identification of GIs in bacterial genomes is an important task since many of them represent inserts that may contribute to bacterial evolution and pathogenesis. In order to characterize and distinguish GIs from the rest of the geno...
There are two approaches to identifying genomic and pathogenesis islands (GI/PAIs) in bacterial genomes: the compositional and the functional, based on DNA or protein level composition and gene function, respectively. We applied n-gram analysis in addition to other compositional features, combined them by union and intersection and defined two meas...
When dealing with data-centric XML documents, it is possible to convert XML documents into a relational database, which can then be queried using SQL. Such relational databases are called XML-enabled databases. On the other hand, the best choice for storing, updating and retrieving document-centric XML documents is usually a native XML database (NX...
Text elements can be counted in several ways. Depending on the counting unit, different views of the structure of a text as well as of the structure of its parts such as words, may be obtained. In this paper, we present different distributions in counting words in Serbian, applied to samples chosen from a corpus developed by the Natural Language Pr...
A dataset of 103 SARS-CoV isolates (101 human patients and 2 palm civets) was investigated on different aspects of genome polymorphism and isolate classification. The number and the distribution of single nucleotide variations (SNVs) and insertions and deletions, with respect to a "profile", were determined and discussed ("profile" being a sequence...
Figures S1-S10, Tables S1-S6
Text processing in Serbian is based on the Intex format system of electronic dictionaries. Although lexical recognition is successful for 75% to 90% of word forms (depending on the type of text), some categories of words remain unrecognized. In this paper we present two aspects of e-dictionary enhancement that provide for additional recognition of...
We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphism (SNP), insertions and deletions, and second, to group them according to sequence similarity, eventually pointing to phylogeny of SARS-CoV isolates. Th...
Positions of SNPs, insertions and deletions in C1 group. Positions of SNPs, insertions and deletions on both TWH and ZJ01 scales are given. The total number of SNPs is given. SNPs are in red bold. A minus sign (-) denotes deletion (insertion).
Positions of SNPs in A2 group. Positions are given on the TWH scale. The same notation is applied as in the additional file 1.
Positions of SNPs and insertions in B group. The exact positions on all four scales (TWH, GD01, SZ3 and SZ16) are given. ID of the only annotated isolate (GD01) is in grey box; SNPs in ORFs (or corresponding to those in ORFs, for non-annotated isolates) are in red bold. The total number of SNPs per isolate is given at the bottom, as well as the num...
Positions of SNPs in A1 group. Positions are given on the relative and TWH scales. IDs of annotated isolates are in grey boxes; SNPs in ORFs (or corresponding to those in ORFs, for non-annotated isolate) are in red bold and SNPs in IGRs in blue bold. The total number of SNPs per isolate is given at the bottom, as well as number of SNPs in ORFs and...
Positions of SNPs, insertions and deletions in C2 group. Positions of SNPs, insertions and deletions on both TWH and ZMY 1 scales are given. The total number of SNPs is given. SNPs are in red bold. A minus sign (-) denotes deletion (insertion).
In this paper we present two techniques for using textual and lexical resources, such as corpora and dictionaries, in validation and refinement of Serbian wordnet. We first describe how the existing monolingual Serbian corpus, the bilingual Serbian/English (S/E) and Serbian/French (S/F) aligned corpora, and the appropriate morphological e-diction...
In this paper we define a set of frequency parameters to be used in synset validation based on corpora. These parameters indicate the coverage of the corpus by wordnet literals, the importance of one sense of a literal in comparison to the others, as well as the importance of one literal in a synset in comparison to other literals in the same synse...
In this paper we describe the resources and tools for the processing of texts written in Serbian. Most of the resources have been developed within the University of Belgrade NLP group located at the Faculty of Mathematics. The main features of these resources, namely available monolingual and multilingual corpora and various e-dictionaries are brie...
In this paper we describe how the existing monolingual Serbian corpus, the bilingual Serbian/English (S/E) and Serbian/French (S/F) aligned corpora, and the appropriate morphological e-dictionaries, have been used in validation, development, and refinement of Serbian WordNet. The influence of different derivational processes, e.g. derivation of aug...
The recent results of the research in the construction of the electronic dictionary of Serbo-Croatian are presented. This research involves the development of methodological and theoretical principles for the construction of the lexicon for a highly inflective language. The problems that emerge on different levels of language standardization such a...