Article

Abstract

The paper presents the legal, organisational and technical perspectives on the implementation of the Slovenian national open access infrastructure for electronic theses and dissertations and for research publications. The infrastructure consists of four institutional repositories and a national portal that aggregates content from the university repositories and other Slovenian archives in order to provide a common search engine, recommendation of similar publications, and similar text detection. We have developed software that is integrated with the universities' information and authentication systems and with COBISS.SI. During the project, the necessary legal background was defined, and processes for the mandatory submission of electronic theses and dissertations and of research publications were designed. Processes for data exchange between the institutional repositories and the national portal, and for similar text detection and the recommendation system, were established. Bilingual web and mobile applications, a recommendation system and an interface suitable for persons with disabilities are available to users around the world. The repositories are an effective promotion tool for universities and their researchers. It is expected that they will improve the recognition of Slovenian universities in the world. The complex national open access infrastructure, with similar text detection support and integration with other systems, will enable the storage of almost eighty percent of the peer-reviewed scientific papers published annually by Slovenian researchers. The majority of electronic theses and dissertations produced each year at Slovenian higher education institutions will also be accessible.

Chapter
This paper presents a dataset and supervised learning experiments for term extraction from Slovene academic texts. Term candidates in the dataset were extracted via morphosyntactic patterns and annotated for their termness by four annotators. Experiments on the dataset show that most co-occurrence statistics, applied after morphosyntactic patterns and a frequency threshold, perform close to random and that the results can be significantly improved by combining, with supervised machine learning, all seven statistical measures included in the dataset. On multi-word terms the model using all statistics obtains an AUC of 0.736, while the best single statistic produces an AUC of only 0.590. Among many additional candidate features, only adding multi-word morphosyntactic pattern information and the length of the single-word term candidates achieves further improvements of the results.
Keywords: terminology extraction, supervised machine learning, Slovene language
Book
A concise introduction to the basics of open access, describing what it is (and isn't) and showing that it is easy, fast, inexpensive, legal, and beneficial. The Internet lets us share perfect copies of our work with a worldwide audience at virtually no cost. We take advantage of this revolutionary opportunity when we make our work “open access”: digital, online, free of charge, and free of most copyright and licensing restrictions. Open access is made possible by the Internet and copyright-holder consent, and many authors, musicians, filmmakers, and other creators who depend on royalties are understandably unwilling to give their consent. But for 350 years, scholars have written peer-reviewed journal articles for impact, not for money, and are free to consent to open access without losing revenue. In this concise introduction, Peter Suber tells us what open access is and isn't, how it benefits authors and readers of research, how we pay for it, how it avoids copyright problems, how it has moved from the periphery to the mainstream, and what its future may hold. Distilling a decade of Suber's influential writing and thinking about open access, this is the indispensable book on the subject for researchers, librarians, administrators, funders, publishers, and policy makers.
Article
This article describes some common problems faced in natural language processing. The main problem consists of a user-given sentence that has to be matched against an existing knowledge base of semantically described words or phrases. Some main problems in this process are outlined, and the most common solutions used in natural language processing are reviewed. A sequence matching algorithm is introduced as an alternative solution, and its advantages over the existing approaches are explained. The algorithm is explained in detail, beginning with the algorithm for discovering the longest subsequences. Then the major components of the similarity measure are defined, and the computation of the concurrence and dispersion measures is presented. Results of the algorithm's performance on a test set are then shown, and different implementations of algorithm usage are discussed. The work concludes with some ideas for the future and some examples where the approach can be used in practice.
Article
The HUNOR (HUNgarian Open Access Repositories) consortium was established in 2008 by the libraries of Hungarian higher education institutions and the Library of the Hungarian Academy of Sciences to advance national open access practices. The members of HUNOR are dedicated to promoting Hungarian research both nationally and internationally and to achieving effective dissemination of scientific outputs through the implementation of a national infrastructure of open access repositories. Other proposed activities include the organization of a methodology centre, adopting international know-how and standards, the establishment of complementary scientific communication channels, and international relations. As coordinator of HUNOR, the author presents an overview of the Hungarian research repository infrastructure and the achievements, difficulties and goals of the HUNOR Collaboration.
Article
The development of science is accompanied by growth of scholarly publications, primarily in the form of articles in peer-reviewed journals. Scientific work is often evaluated through the number of scientific publications in international journals and their citations. This article discusses the impact of open access (OA) on the number of citations for an institution from the field of civil engineering. We analyzed articles published in 2007 in 14 international journals with an impact factor that are included in the Journal Citation Reports subject category “Civil Engineering”. The aim of our research was to determine whether open access articles from the field of civil engineering receive more citations than non-open access articles. Based on the value of the impact factor and ranking in quartiles, we also looked at the influence of journal rank on the number of citations, separately for OA and non-OA articles, in the databases Web of Science (WOS), Scopus and Google Scholar. Of the 2,026 articles studied, 22 % were published as OA articles. They received 29 % of all citations in the observed period. We can conclude, at a significance level of 5 % or less, that in the databases WOS and Scopus the OA articles from top-ranked journals (first quartile) achieved more citations than non-OA articles. This could be confirmed only partly for journals from the second quartile, and could not be confirmed for journals ranked in the third quartile. This shows that open access is not a sufficient condition for citation, but that it increases the number of citations for articles published in journals with high impact.
Article
Purpose – DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues. Design/methodology/approach – Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of articles that were recently updated. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back to DBpedia. DBpedia-Live publishes the newly added/deleted triples in files, in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors. Findings – During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that there are plenty of opportunities arising from the emerging Web of Data for librarians. Practical implications – DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely manner, which is relevant for many use cases requiring up-to-date information. Originality/value – The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changesets publication.
Article
Purpose – To describe the issues involved in the introduction of mandatory submission of electronic theses at Cranfield University. Design/methodology/approach – Background information on how the availability of e-theses has developed at Cranfield University is provided along with discussions and advice on issues such as the choice of software, thesis submission workflow and timeframes, particularly in relation to the publication of thesis-related articles. It also looks at metadata issues as well as both retrieval and usage of electronic theses. Finally it describes how the service has expanded from e-theses to other types of material and to the development and expansion of an institutional repository for Cranfield. Findings – It is shown that there are a number of issues that will need to be addressed from the points of view of librarians, academic staff and registry staff and that one effective method of managing the process is to set up a working group with all stakeholders in the process. There is a clear need for administrative procedures to be discussed in detail and a recognition that the time involved in changing regulations may be significant. Practical implications – It is clear that most of the issues that have arisen at Cranfield as outlined in the paper will be mirrored at other institutions that are considering the same changes, and so those institutions looking at the area of e-thesis submission may gain some useful insights. Originality/value – This paper provides useful advice on the issues that will arise as institutions go through the process of introducing the mandatory submission of electronic theses.
Conference Paper
This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents. In the past it has been common to compute scores for the individual fields (e.g. title and body) independently and then combine these scores (typically linearly) to arrive at a final score for the document. We highlight how this approach can lead to poor performance by breaking the carefully constructed non-linear saturation of term frequency in the BM25 function. We propose a much more intuitive alternative which weights term frequencies before the non-linear term frequency saturation function is applied. In this scheme, a structured document with a title weight of two is mapped to an unstructured document with the title content repeated twice. This more verbose unstructured document is then ranked in the usual way. We demonstrate the advantages of this method with experiments on Reuters Vol1 and the TREC dotGov collection.
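The contrast drawn in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: document length normalization is left out, the `idf` table is supplied by the caller, and every function name and the default `k1 = 1.2` are assumptions made for illustration. `score_weighted_first` weights term frequencies per field and then applies the saturation once, while `score_linear_combination` saturates each field separately and sums the weighted results.

```python
from collections import Counter


def saturate(tf, k1=1.2):
    """BM25-style term frequency saturation (length normalization omitted)."""
    return tf * (k1 + 1) / (tf + k1)


def weighted_tf(fields, weights):
    """Merge per-field term counts into one weighted pseudo-frequency per term."""
    tf = Counter()
    for name, tokens in fields.items():
        w = weights.get(name, 1.0)  # unlisted fields default to weight 1
        for token in tokens:
            tf[token] += w
    return tf


def score_weighted_first(query, fields, weights, idf, k1=1.2):
    """Weight term frequencies first, then saturate once per term."""
    tf = weighted_tf(fields, weights)
    return sum(idf.get(t, 0.0) * saturate(tf[t], k1) for t in query if tf[t] > 0)


def score_linear_combination(query, fields, weights, idf, k1=1.2):
    """Score every field on its own, then combine the field scores linearly."""
    total = 0.0
    for name, tokens in fields.items():
        counts = Counter(tokens)
        w = weights.get(name, 1.0)
        total += w * sum(
            idf.get(t, 0.0) * saturate(counts[t], k1) for t in query if counts[t] > 0
        )
    return total
```

With a title weight of two, a document whose title contains a term once receives the same weighted-first score as an unstructured document that contains the term twice. The linear combination instead lets a term that occurs once in each field accumulate score without saturating.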
Conference Paper
Hash-based similarity search reduces a continuous similarity relation to the binary concept "similar or not similar": two feature vectors are considered similar if they are mapped to the same hash key. In terms of runtime performance this principle is unequaled, while being unaffected by dimensionality concerns at the same time. Similarity hashing is applied with great success for near-similarity search in large document collections, and it is considered a key technology for near-duplicate detection and plagiarism analysis. This paper reveals the design principles behind hash-based search methods and presents them in a unified way. We introduce new stress statistics that are suited to analyzing the performance of hash-based search methods, and we explain the rationale of their effectiveness. Based on these insights, we show how optimum hash functions for similarity search can be derived. We also present new results of a comparative study between different hash-based search methods.
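The principle that two vectors count as similar exactly when they share a hash key can be sketched as follows. The sketch uses random-hyperplane signatures as a stand-in hash family; the abstract does not say which hash functions the paper studies, so this choice, the function names, and the parameters `bits` and `seed` are all assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random hyperplanes; each one contributes one bit of the key."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def hash_key(vector, planes):
    """Map a feature vector to an integer key from the signs of its projections."""
    key = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def build_index(vectors, planes):
    """Group document ids by hash key; a shared bucket means 'similar'."""
    index = defaultdict(list)
    for doc_id, vector in vectors.items():
        index[hash_key(vector, planes)].append(doc_id)
    return index


def candidates(query, index, planes):
    """Return the ids stored under the query's key, without scanning other buckets."""
    return index.get(hash_key(query, planes), [])
```

A lookup costs one hash computation plus one dictionary access, regardless of collection size. The price is the binary decision itself: vectors that are close but fall on different sides of a single hyperplane land in different buckets and are missed.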
Article
As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (that combine CF with other recommendation techniques), with examples for representative algorithms of each category, and analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.
Article
Purpose – This paper aims to present the experiences of SEAFDEC/AQD library staff in digitizing institutional publications and developing an institutional repository (IR). Design/methodology/approach – SEAFDEC/AQD IR or SAIR provides a reliable means for its researchers to store, preserve, share their research outputs, enable easy access to and increase the visibility of its scientific publications. The repository uses DSpace customized with some add-ons. Details on the digitization hardware and software, layout, delivery format, and persistent identifier used are provided. Findings – As of March 2012, the repository contains 771 items with 541 downloadable PDFs. SAIR had 88,287 item views, 69,249 PDF downloads and 271,978 searches. SAIR is registered to and indexed by OpenDOAR, ROAR, Google Scholar and WorldCat. It is harvested by AVANO Ifremer, BASE, Sciencegate.ch and OAIster. Initial impact based on indicators in webometrics ranking web of world repositories and research centers was presented. Reluctance to contribute to IR has been observed by the library staff among SEAFDEC/AQD researchers. Research limitations/implications – The IR can be an effective tool to promote institutional publications and those written by researchers in peer-reviewed journals and to generate higher citations through increased visibility. IR submission policy and procedures are being drafted by the library staff. Practical implications – SAIR provides free access to all in-house publications of SEAFDEC/AQD. Full-text digitized copies of fish farmer-friendly materials like books, handbooks, policy guidebooks, extension manuals, institutional reports, and newsletters can be downloaded. Originality/value – SAIR is one of only three open access institutional repositories registered in the Philippines. The paper discusses the lessons learned and issues to be addressed in developing an IR of value to other institutions considering similar projects. 
Future plans and further development are also presented.
Article
Recommender systems have developed in parallel with the web. They were initially based on demographic, content-based and collaborative filtering. Currently, these systems are incorporating social information. In the future, they will use implicit, local and personal information from the Internet of things. This article provides an overview of recommender systems as well as collaborative filtering methods and algorithms; it also explains their evolution, provides an original classification for these systems, identifies areas of future implementation and develops certain areas selected for past, present or future importance.
Article
Professors contribute to Institutional Repositories (IRs) to make their materials widely accessible in keeping with the benefits of Open Access. However, universities' commitment to IRs depends on building trust with faculty and solving copyright concerns. Digital preservation and copyright management in IRs should be strengthened to increase faculty participation.
Article
Various surveys are being conducted to rank business schools in India. They give importance to parameters like placements, brand value and intellectual capital. The intellectual capital of a business school is the sum of its human capital, structural capital and customer capital. The structural capital consists of the published scholarly material of its faculty and students. The use of technologies like institutional repositories for capturing the structural component of the intellectual capital and enabling knowledge sharing in academic institutions is emerging, especially in developing countries like India. This paper explores the creation of a pilot institutional repository at the Icfai Business School, Ahmedabad, and discusses a survey conducted to ascertain different considerations for implementing an institutional repository, the current status and future scope.
Article
Purpose – This paper proposes indicators for measuring the success of institutional repositories based on their demonstrated integration with other research initiatives and provides a snapshot of the current state of selected institutional repositories in Canada through a review of their web presence and their integration with university library and research pages. Design/methodology/approach – Using the proposed indicators, an examination of the web sites of selected Canadian universities that are participating in the Canadian Association of Research Libraries Institutional Repository project was undertaken. Findings – Institutional repositories are growing in Canada, and the Canadian IR community is on the way to the proposed model future – integration with existing university research practices. Originality/value – Indicators such as those proposed in the paper can provide a basic framework for evaluating IR projects and highlight areas where the library can generate additional support for these worthwhile projects.
Article
This paper presents an overview of recent developments of the Co-operative Online Bibliographic System and Services (COBISS) system in Slovenia. The COBISS system interconnects over 250 of the largest Slovenian libraries into a uniform Slovenian library information system. Also, COBISS is used by other independent library systems in some of the countries on the territory of the former Yugoslavia. The development of the COBISS shared cataloguing system and services runs parallel with the second software generation (COBISS2) and the new, object-oriented technological platform (COBISS3). On COBISS2 the authority control and other services were introduced, whereas on COBISS3 the development of the interlibrary loan applications, acquisitions and other services is being carried out.
Article
Plagiarism can be of many different natures, ranging from copying texts to adopting ideas, without giving credit to its originator. This paper presents a new taxonomy of plagiarism that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view. The taxonomy supports deep understanding of different linguistic patterns in committing plagiarism, for example, changing texts into semantically equivalent but with different words and organization, shortening texts with concept generalization and specification, and adopting ideas and important contributions of others. Different textual features that characterize different plagiarism types are discussed. Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with plagiarism types, which are listed in the taxonomy. We conduct extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS). Our study corroborates that existing systems for plagiarism detection focus on copying text but fail to detect intelligent plagiarism when ideas are presented in different words.
Article
Purpose – The purpose of this paper is to provide a summary of the experiences of setting up an institutional repository at Loughborough University, focusing on some of the key issues that it was necessary to consider, the choices made and the challenges overcome. Design/methodology/approach – The paper outlines the various decision processes involved during the 12‐month pilot phase. These include: choosing appropriate software; customising DSpace; implementing licences; and gathering content for the repository. Findings – The experiences highlight some of the challenges involved in setting up an institutional repository. Originality/value – This paper gives a direct insight into the different types of work involved in the setting up of an institutional repository and is an example of a system set up outside the boundaries of project funding.
Article
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
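Online searching under edit distance, as mentioned in the abstract above, can be illustrated with the classic column-wise dynamic programming scheme. This is a minimal sketch of that well-known baseline, not a reproduction of any particular algorithm from the survey; the function name `approx_find` and its return convention (end positions in the text) are assumptions.

```python
def approx_find(pattern, text, k):
    """Return every index j where some substring of `text` ending at j
    matches `pattern` with at most k edits (insert, delete, substitute)."""
    m = len(pattern)
    # col[i] holds the edit distance between pattern[:i] and the best-matching
    # suffix of the text read so far; col[0] stays 0 so a match may start anywhere.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(
                prev_diag + cost,  # match or substitute
                col[i] + 1,        # extra character in the text
                col[i - 1] + 1,    # pattern character missing from the text
            )
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends
```

The sketch reads the text once and keeps a single column of `len(pattern) + 1` integers, so it needs no preprocessing of the text, at a cost proportional to the product of pattern length and text length.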