Wikipedia - Science topic
Publications related to Wikipedia (10,000)
Research on Retrieval-Augmented Generation for low-resource languages has been sparse because of limited resources. To address this, we focus on Bangla, a low-resource language, and have created a dataset of 200 question-answer pairs as a basis for our study from Bangla Wikipedia dump data. This paper introduces the TraSe architecture, which enhanc...
To say it up front: the reviewer highly recommends reading this substantial tome, ideally in one go, without long interruptions. Immerse yourself for two days and read nothing else on the side. A complex picture then emerges of the last 200 years, in which people in Germany sought new orientation and, in doing so, ...
Introduction: Libraries hold large amounts of bibliographic data, with great potential for enrichment with linked open data. The New Zealand Thesis Project explored this potential by uploading thesis metadata records from New Zealand institutional repositories to Wikidata, a collaborative linked data knowledge base. Description of Project: Nine New...
To cope with the large number of publications, more and more researchers are automatically extracting data of interest using natural language processing methods based on supervised learning. Much data, especially in the natural and engineering sciences, is quantitative, but there is a lack of datasets for identifying quantities and their context in...
In an era of rampant misinformation, the need for robust, scalable, and adaptable fact-checking systems has never been more pressing. Traditional fact-checking approaches often rely on supervised machine learning models trained on domain-specific datasets, requiring extensive labeled data for each area of interest, be it politics, health, or science...
This paper presents UniBERT, a compact multilingual language model that leverages an innovative training framework integrating three components: masked language modeling, adversarial training, and knowledge distillation. Pre-trained on a meticulously curated Wikipedia corpus spanning 107 languages, UniBERT is designed to reduce the computational de...
The importance of greenery acquires a new dimension when the quality of the environment is one of the most pressing problems of city life. Excessive construction, increased population density, heavy traffic, the presence of many pollutants, and the lack of natural surroundings are just some of...
Automated content moderation for collaborative knowledge hubs like Wikipedia or Wikidata is an important yet challenging task due to multiple factors. In this paper, we construct a database of discussions happening around articles marked for deletion in several Wikis and in three languages, which we then use to evaluate a range of LMs on different...
The public release of ChatGPT in late 2022 has resulted in considerable publicity and has led to widespread discussion of the usefulness and capabilities of generative artificial intelligence (AI) language models. Its ability to extract and summarise data from textual sources and present them as human-like contextual responses makes it an eminently...
The great French sociologist Émile Durkheim (1858-1917) is best known today for his research on suicide ("Le Suicide: Étude de sociologie", 1897) and on religion ("Les Formes élémentaires de la vie religieuse : le système totémique en Australie", 1912). As the latter title indicates, Durkheim employed his "genetic" method ...
This is a tutorial review of the paper “Analysis of Collatz Conjecture Rules” by Kirk O. Hahn. The reviewed paper has been published in a peer-reviewed journal, and as such Hahn tried to get it listed on a Wikipedia page as a Proof of Collatz. Such listing was not allowed.
See what you think.
Voice assistants have become an integral part of modern human-computer interaction, enabling hands-free operation through natural language processing (NLP) and speech recognition technologies. This paper presents the development of a Python-based voice assistant, Cyrus, designed to perform various tasks, including web browsing, email automation, an...
Physicist Eugene Guth was fortunate to have one-on-one meetings with Albert Einstein once in Europe and once in the USA. The meeting in Europe was thought to have focused on Einstein's Ph.D. thesis and subsequent work on the viscosity theory of suspensions and rheology. The meeting in the USA was thought to have focused more on general topics in ph...
Terrorist attacks are not merely still omnipresent but increasingly and recurrently so, and they are a central threat to social order. Most recently, the Hamas terror attack on Israel on 7 October 2023 made clear that peace is a fragile good and that the boundaries with war are increasingly fading. Terror thus has a disrupti...
This study, conducted as part of Group A01 of the KAKENHI Transformative Research Areas, applies network analysis to 20th-century French philosophers by drawing on Wikipedia’s “Influences/Influenced by” data. By calculating metrics such as degree, betweenness, closeness, and eigenvector centrality, it illuminates the relationships and influences am...
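As a rough illustration of two of the centrality metrics named in this abstract, the sketch below computes degree and closeness centrality on a tiny undirected toy graph. The node labels and edges are hypothetical placeholders, not data from the study, and the code assumes a connected graph; it is a minimal sketch rather than the authors' pipeline.

```python
from collections import deque

# Hypothetical toy edges; "A".."E" stand in for philosophers.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path distances (connected graph assumed)."""
    n = len(adj)
    result = {}
    for start in adj:
        # Breadth-first search gives shortest hop counts from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[start] = (n - 1) / total if total else 0.0
    return result


adjacency = build_adjacency(EDGES)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
```

In this toy graph, node "D" has the most neighbours and the shortest total distance to the others, so it scores highest on both measures.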
The increasing prevalence of online misinformation has heightened the demand for automated fact-checking solutions. Large Language Models (LLMs) have emerged as potential tools for assisting in this task, but their effectiveness remains uncertain. This study evaluates the fact-checking capabilities of various open-source LLMs, focusing on their abi...
Researchers and practitioners in natural language processing and computational linguistics frequently observe and analyze real language usage in large-scale corpora. For that purpose, they often employ off-the-shelf pattern-matching tools, such as grep, and keyword-in-context concordancers, which are widely used in corpus linguistics for gatheri...
In this paper, we present a thorough analysis of the impact of Large Language Models (LLMs) on Wikipedia, examining the evolution of Wikipedia through existing data and using simulations to explore potential risks. We begin by analyzing page views and article content to study Wikipedia's recent changes and assess the impact of LLMs. Subsequently, w...
Multi-entity question answering (MEQA) poses significant challenges for large language models (LLMs), which often struggle to consolidate scattered information across multiple documents. An example question might be "What is the distribution of IEEE Fellows among various fields of study?", which requires retrieving information from diverse sources...
Background
Conducting a monitoring study using infoveillance and notified cases might facilitate a proactive and data-driven approach to public health surveillance, risk assessment, and outbreak response. The purpose of this study was to evaluate the potential correlation and association between the annual epidemiological trend of reported cases of...
How has Wikipedia activity changed for articles with content similar to ChatGPT following its introduction? We estimate the impact using differences-in-differences models, with dissimilar Wikipedia articles as a baseline for comparison, to examine how changes in voluntary knowledge contributions and information-seeking behavior differ by article co...
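The core arithmetic behind a difference-in-differences comparison can be sketched in a few lines. The function below is the simplest two-group, two-period version, not the regression models the authors estimate, and every number is invented purely for illustration.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences: change in treated minus change in control."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical activity counts per article (made-up values).
similar_pre = [100, 110, 90]      # articles similar to ChatGPT output, before
similar_post = [80, 85, 75]       # same articles, after
dissimilar_pre = [50, 60, 40]     # baseline articles, before
dissimilar_post = [45, 55, 35]    # baseline articles, after

effect = did_estimate(similar_pre, similar_post, dissimilar_pre, dissimilar_post)
# (80 - 100) - (45 - 50) = -15
```

Subtracting the baseline group's change removes trends that affect all articles alike, under the usual assumption that both groups would otherwise have moved in parallel.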
Aims: In Tunisia, during the coronavirus disease 2019 (COVID-19) pandemic, the transition to e-learning was abrupt. The aim of this study was to assess undergraduate medical students' (UMSs) perception of the e-learning experience at the Faculty of Medicine of Sousse, and to derive some determinants of its implementation. Methods: Eligible participants were all...
Human communication has long relied on visual media for interaction, and is facilitated by electronic devices that access visual data. Traditionally, this exchange was unidirectional, constrained to text-based queries. However, advancements in human–computer interaction have introduced technologies like reverse image search and large language model...
To achieve equitable performance across languages, multilingual large language models (LLMs) must be able to abstract knowledge beyond the language in which it was acquired. However, the current literature lacks reliable ways to measure LLMs' capability of cross-lingual knowledge transfer. To that end, we present ECLeKTic, a multilingual closed-boo...
Large Language Models (LLMs) have demonstrated substantial potential in addressing complex reasoning tasks, yet their general-purpose nature often limits their effectiveness in specialized domains such as maritime navigation. To bridge this gap, we introduce Llamarine, the first open-source LLM designed specifically for maritime navigation. Llamari...
The research project consists of two parts (hardware and software) and aims to demonstrate that materials decommissioned by schools, such as fixed and movable classroom boards, can be reused to create cutting-edge technological products (a multimedia shelving unit), which has the shape of a tree...
Knowledge is fundamental to societal development and individual wellbeing, with digital technologies becoming crucial for its production and consumption. This study explores the use of Wikipedia log files as a big data source to measure knowledge divides, focusing on knowledge consumption. Using the Wikipedia API to extract billions of pageviews, t...
Multi-entity question answering (MEQA) poses significant challenges for large language models (LLMs) and retrieval-augmented generation (RAG) systems, which frequently struggle to consolidate scattered information across diverse documents. While existing methods excel at single-document comprehension, they often struggle with cross-document agg...
Visual Question Answering requires models to generate accurate answers by integrating visual and textual understanding. However, VQA models still struggle with hallucinations, producing convincing but incorrect answers, particularly in knowledge-driven and Out-of-Distribution scenarios. We introduce FilterRAG, a retrieval-augmented framework that c...
Lexical simplification improves text accessibility by replacing complex words with simpler alternatives, making texts easier to read for second-language learners and individuals with reading difficulties. This study presents a new automated method for simplifying Persian text, addressing the unique linguistic challenges of the language. A specializ...
Text-to-Image models, including Stable Diffusion, have significantly improved in generating images that are highly semantically aligned with the given prompts. However, existing models may fail to produce appropriate images for cultural concepts or objects that are not well known or are underrepresented in Western cultures, such as 'hangari' (Korea...
As neural language models achieve human-comparable performance on Machine Reading Comprehension (MRC) and see widespread adoption, ensuring their robustness in real-world scenarios has become increasingly important. Current robustness evaluation research, though, primarily develops synthetic perturbation methods, leaving unclear how well they refle...
The ability to automatically identify whether an entity is referenced in a future context can have multiple applications including decision making, planning and trend forecasting. This paper focuses on detecting implicit future references in entity-centric texts, addressing the growing need for automated temporal analysis in information processing....
This research paper presents HILDEGARD, an application conceived to guide a semi-expert user in the domain of cultural heritage data management toward the creation of a lightweight knowledge graph tailored for supporting Automatic Story Generation (ASG). For this purpose, a subset of CIDOC-CRM classes and properties is preliminarily selected to fit...
Despite decades-long efforts to increase diversity, underrepresented social groups remain small minorities in many fields. Here, we ask whether disparities in global recognition exist for traditionally underrepresented demographic groups. We investigate whether a notable person’s demographic attributes are associated with their global recognition,...
Compliance with the Constitution of the Republic of Poland is the foundation of a democratic state ruled by law, guaranteeing systemic stability, the protection of civil rights, and the balance between the branches of government. In 2020–2024, Poland experienced intense debates and controversies related to the functioning of the constitutional system. During this period, particular ...
Abstract: Collaborative Web environments such as Wikipedia are spaces of participatory culture that reflect social dimensions. Through their themes and gaps concerning gender, the confluence with the agendas of women's museums is emphasized; these museums enable the sharing of, access to, and preservation of women's memories. ...
As large language models (LLMs) converge towards similar capabilities, the key to advancing their performance lies in identifying and incorporating valuable new information sources. However, evaluating which text collections are worth the substantial investment required for digitization, preprocessing, and integration into LLM systems remains a sig...
Dubbed “the world’s first and oldest academic social network” by a grant reviewer at the National Science Foundation, HASTAC (Humanities, Arts, Science, and Technology Alliance and Collaboratory or “Haystack”) built its first interactive website in 2002. Now, 22 years later, HASTAC has some 18,000 network members, over 400 institutional members, an...
This paper presents a novel methodology, called Word Co-occurrence SVN topic model (WCSVNtm), for document clustering and topic modeling in textual datasets. This method represents the corpus as a bipartite network of words and documents to rigorously assess the statistical significance of word co-occurrences within documents and document overlap b...
As a collaboratively edited and open-access knowledge archive, Wikipedia offers a vast dataset for training artificial intelligence (AI) applications and models, enhancing data accessibility and access to information. However, reliance on the crowd-sourced encyclopedia raises ethical issues related to data provenance, knowledge production, curation...
In the age of misinformation, hallucination (the tendency of Large Language Models (LLMs) to generate non-factual or unfaithful responses) represents the main risk for their global utility. Despite LLMs becoming increasingly multilingual, the vast majority of research on detecting and quantifying LLM hallucination is (a) English-centric and (b...
Large Language Models (LLMs) have gained significant popularity in recent years. Differentiating between a text written by a human and a text generated by an LLM has become almost impossible. Information hiding techniques such as digital watermarking or steganography can help by embedding information inside text without being noticed. However, exis...
Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real...
Currently, the world is facing challenges in accessing information on the web. Each second, millions of bytes of data are generated. Easy internet access has shifted users’ tendency towards online information retrieval systems. Notably, web search engines can retrieve relevant information from immense piles of available data. However, accessing web...
In this paper, as a first step toward summarizing and visualizing the content of discussions in the National Diet, we examine the effectiveness of a method that classifies statements from the National Diet minutes according to their roles using a BERT-based classifier. We constructed a dataset by assigning role tags—“Introduction,” “Basis,” “Opinio...
The goal of relation extraction is to recognize head and tail entities in a document and determine a relation between them. While a lot of progress was made in solving automated relation extraction in widely used languages such as English, the use of these methods for under-resourced languages and domains is limited due to the lack of training data...
Large Language Models (LLMs) are trained on Web data that might contain spelling errors made by humans. But do they become robust to similar real-world noise? In this paper, we investigate the effect of real-world spelling mistakes on the performance of 9 language models, with parameters ranging from 0.2B to 13B, in 3 different NLP tasks, namely Na...
In this article, we present a model for analyzing the co-occurrence count data derived from practical fields such as user–item or item–item data from online shopping platforms and co-occurring word–word pairs in sequences of texts. Such data contain important information for developing recommender systems or studying the relevance of items or words...
Although the ruler's wife played no actual role in 19th-century Hungarian and imperial constitutional law, from the end of the 1860s, beginning with the episode of her political involvement, Queen Elisabeth appeared ever more frequently on the walls of public institutions. The most common visual source of the portraits was Emil Rabending's 1866 photo series, which ... Hungarian ceremonial dress...
In this paper I present the turning points in the development of the bourgeois women's movement in Hungary in the age of Queen Elisabeth, more precisely from the late 1840s to the mid-1890s. To present the topic, it is essential to outline at the beginning of the study the political, economic, and social changes that, with regard to the history of the 19th-century women's movement, fundamentally...
We present FoQA, a Faroese extractive question-answering (QA) dataset with 2,000 samples, created using a semi-automated approach combining Large Language Models (LLMs) and human validation. The dataset was generated from Faroese Wikipedia articles using GPT-4-turbo for initial QA generation, followed by question rephrasing to increase complexity a...
Background
ChatGPT has quickly gained popularity as a source of online health information (OHI). However, it is unclear how having a usual source of primary care (USPC) is related to OHI-seeking.
Objective
Explore how having a USPC and other characteristics thought to affect access-to-care influence the use of ChatGPT and other OHI forms.
Design...
This paper provides an overview of how virtual reality (VR) and augmented reality (AR) technologies are being employed to enhance transportation infrastructure design and to foster public engagement. It discusses definitions and background information, reviews practical applications in design and simulation, presents examples from recent academic s...
We revisit the reference determinacy (RD) assumption in the task of natural language inference (NLI), i.e., the premise and hypothesis are assumed to refer to the same context when human raters annotate a label. While RD is a practical assumption for constructing a new NLI dataset, we observe that current NLI models, which are typically trained sol...
In this paper, we explore the problem of Claim Extraction using one-to-many text generation methods, comparing LLMs, small summarization models finetuned for the task, and a previous NER-centric baseline QACG. As the current publications on Claim Extraction, Fact Extraction, Claim Generation and Check-worthy Claim Detection are quite scattered in t...
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and ev...
In this study, ten language models are explored and compared in an English-Latvian semantic information retrieval setting, where the indexed collection of documents is written in English while the query documents are written in Latvian. Currently, no similar research has been done regarding the Latvian language. A dataset of 77736 pairs of articles...
One Librarian, One Reference (#1lib1ref) is a Wikipedia campaign aimed at getting librarians, who share values on open information, to collaborate and improve the verifiability of information in the open encyclopedia. Paired with an overview of this campaign and of the relationship between libraries and Wikipedia, this study looks at the editing pa...
The increasing use of artificial intelligence (hereafter AI) in education, particularly through large-scale language models such as ChatGPT and Bing, offers both challenges and opportunities. These models facilitate interaction in conversations and can perform tasks that require natural language processing, from answering questions to solving probl...
Citation Worthiness Detection (CWD) consists in determining which sentences, within an article or collection, should be backed up with a citation to validate the information they provide. This study introduces ALPET, a framework combining Active Learning (AL) and Pattern-Exploiting Training (PET), to enhance CWD for languages with limited data reso...
This article analyzes the coverage of Wikipedia in the news of Spanish-language digital media. Framing Theory is applied to examine how the media present Wikipedia in the headlines of their articles. A total of 652 news items retrieved from the Factiva database between 2013 and 2023 are analyzed. ...
A revival of “the Mie” problem: a revival of a book and a contemporary research view on the optical properties of nanoparticles
Keywords: Mie Scattering, metal nanoparticles, optical tweezers
The present paper is intended as a historical revival of an “old book” that is hardly available, especially to younger generations of researchers and students. A b...
The article examines Wikipedia and its automation processes. This non-profit encyclopedia stands out for its governance model, which involves organizations and local communities. Since its beginnings, Wikipedia has deployed bots to ensure the project's sustainability and encourage volunteer participation. In the last...
Counterfeiting has been a persistent issue throughout history, often referred to as "the world's second-oldest profession" due to its widespread occurrence alongside the development of currency itself (PayComplete). The practice dates to ancient times when coins made from precious metals were first introduced. In regions like Lydia in Asia Minor, a...
Khorasan Razavi Province shares a border of about 531.6 kilometers with the Republic of Turkmenistan to the north and northeast, and a border of about 302 kilometers with Afghanistan to the east. In terms of internal borders, it is bordered by North Khorasan Province to the northwest, South Khora...
Objective: To explore online preferences of Pakistani adolescents with reference to their gender and age group. Study Design: Qualitative study (Interpretive Paradigm) Place and Duration of Study: National Institute of Psychology, Quaid-i-Azam University, Islamabad Pakistan, from Nov 2020 to Jan 2021. Methodology: To explore online preferences of P...
The Brazil Macaúba Coconut 2024 Miracle: 7 times more oil than soybean per hectare-year, all for new sustainable aviation fuel (SAF). The Brazilian macaúba coconut (7 times more oil than soybeans) will recover millions of hectares, help depollute the world's air (drop-in SAF), and replace up to 60% of eucalyptus. The macaúba coconut tree in Brazil: socioenvironmental and...
The survival of a language is important for several reasons, including maintaining cultural identity, tradition, and wisdom. People therefore try to protect their cultural identity, tradition, and wisdom, thus preserving and promoting their languages. Similarly, the indigenous languages Setswana and Punjabi face challenges in preservati...
Montenegro is a country in southeastern Europe, on the Adriatic coast of the Balkans. It is bordered by Bosnia and Herzegovina to the north, Serbia to the northeast, Kosovo to the east, Albania to the southeast, and Croatia and the Adriatic Sea to the west. Its capital and largest city is Podgorica (Wikipedia, 2023). Montenegro, partly during the reign of Mehmed the Conqueror (Fatih Sultan Mehmet), in the early 16th century...
Long short-term memory (LSTM) networks have shown great promise in sequential data analysis, especially in time-series and natural language processing. However, their potential for multi-view clustering has been largely underexplored. In this paper, we introduce a novel approach called deep multi-view clustering optimized by long short-term memory...
The Wikipedia editors' community has been actively pursuing the intent of achieving gender equality. To that end, it is important to explore the historical evolution of underlying gender disparities in Wikipedia articles. This paper presents the Wikipedia Gender Dashboard (WGD), a tool designed to enable the interaction with gender distribution dat...
Ten years ago, Sociologie et sociétés (vol. 45, no. 2, autumn 2013) published the first texts of its new "Feuilleton" section, a previously unpublished translation of three reports written in 1926 by Joseph Roth. The creation of the section followed in the wake of the work of a research group at the Université de Montréal that was examining...
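Wednesday, 22 January 2025, No. 1327
The first-ever drawn diagram of the zero vector; AI taught me its importance:
Contradiction (Japanese: 矛盾, mujun)
Wednesday, 8 January 2025
Theme: Education
A vector is a word meaning "a quantity that has magnitude and direction." It is used to indicate, for example, the direction in which things or ways of thinking are pointed, and it also carries an abstract meaning. (2021/03/12)
In mathematics and physics
In mathematics and physics, a vector combines the two quantities of magnitude and direction. It is represented by a figure called a directed line segment, which shows direction and magnitude; a related term is "scalar," which denotes magnitude.
What does "vector" mean? Usage in business and everyday life, with example sentences | Mynavi News
Zero vector (零ベクトル; ゼロベクトル, れいベク...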
This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata by performing "question-to-question" matching and retrieval from a dense vector embedding store. Instead of embedding document content, we generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM. Th...
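The retrieval step described here (match the user's question against stored questions, then return the content unit behind the best match) can be sketched as follows. This is only an illustration under loud assumptions: a bag-of-words cosine similarity stands in for the dense embedding model, the stored questions are hand-written rather than LLM-generated, and the store contents and unit identifiers are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a dense embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical store: (pre-generated question, id of the content unit it came from).
STORE = [
    ("When was the encyclopedia launched?", "unit-1"),
    ("Who can edit articles?", "unit-2"),
    ("How are articles deleted?", "unit-3"),
]


def retrieve(user_question):
    """Return the content-unit id whose stored question best matches the query."""
    query_vec = embed(user_question)
    best_question, best_unit = max(
        STORE, key=lambda item: cosine(query_vec, embed(item[0]))
    )
    return best_unit


answer_unit = retrieve("Who is allowed to edit articles?")
```

Matching question against question keeps both sides of the comparison in the same form, which is the intuition the abstract points to; a real system would swap `embed` for an actual embedding model and a vector index.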
The Encyclopédie des communautés et pratiques communautaires offers an unprecedented corpus of knowledge on the sources, forms, and modes of collaborative and collective organization of yesterday, today, and tomorrow. Bringing together international specialists from various disciplines, including law, sociology, economics, or the sciences of ...
INTRODUCTION: Wikipedia is a major source of information, particularly for medical and health content, citing over 4 million scholarly publications. However, the representation of research-based knowledge across different languages on Wikipedia has been underexplored. This study analyses the largest database of Wikipedia citations collected to dat...
Microalgae’s adaptability and resilience to Earth’s diverse environments have evolved these photosynthetic microorganisms into a biotechnological source of industrially relevant physiological functions and biometabolites. Despite this, microalgae-based industries only exploit a handful of species. This lack of biodiversity hinders the expansion of...
The article gives a short biography of each of the ten editors who were responsible for Socialmedicinsk tidskrift (SMT) during the period 1924–2024. The compilation is based on a number of sources. In addition to texts in SMT, we used Svensk läkarehistoria (fourth series), newspaper articles (Dagens Nyheter and Svenska Dagbladet), Wikipedia, and texts that we found ...
Have you ever heard of a chimera? In Greek mythology, it is a creature depicted with a body composed of several animals, such as the lion, the goat, and the serpent or a dragon (Priberam, 2024). In botany, the term is used when a plant is formed of a mixture of genetically different types of cells/tissues/organs (Wikipedia, 2024; B...
Wikipedia, an online encyclopedia conceived as a "place" of collaboration open to all, does not rest on a political authority but relies notably on deliberation, which is meant to lead to a consensus shared among participants. We seek to show how Wikipedia, as a collaborative space, contributes to the construction of a deli...
With a rapidly evolving knowledge landscape and the increasing adoption of large language models, a need has emerged to keep these models continuously updated with current events. While existing benchmarks evaluate general factual recall, they often overlook two critical aspects: the ability of models to integrate evolving knowledge through continual...
This study presents an ML approach for classifying digital radio operating modes evaluated on real-world transmissions. We generated 98 different parameterized radio signals from 17 digital operating modes, transmitted each of them on the 70 cm (UHF) amateur radio band, and recorded our transmissions with two different architectures of SDR receiver...
Today, one of the most important tasks in natural language processing is answering user questions. In particular, users' questions have moved from simple to complex. In recent years, several question answering datasets have been produced for the Persian language, but none of them supports complex open-domain and explainable question...
The Web and its main tools (Google, Wikipedia, Facebook, Twitter) raise and renew fundamental questions that everyone asks almost every day: Is this information or content true? Can I trust this author or source? These questions are not new; the same ones arose with books, newspapers, broadcasting and television, and, more fundamentally...
Is the era of the reference book coming to an end? Publishers evidently don't think so. Not only have we seen an explosion in the number of handbooks, companions and dictionaries produced by the mainstream academic press in recent years, but even older, well-established ‘brands’ continue to flourish. Here is the fourth edition of Oxford University...
This study develops a question-answering system based on Retrieval-Augmented Generation (RAG) using Chinese Wikipedia and Lawbank as retrieval sources. Using TTQA and TMMLU+ as evaluation datasets, the system employs BGE-M3 for dense vector retrieval to obtain highly relevant search results and BGE-reranker to reorder these results based on query r...
We analyze the Google matrix of directed networks of Wikipedia articles related to eight recent Wikipedia language editions representing different cultures (English, Arabic, German, Spanish, French, Italian, Russian, Chinese). Using the reduced Google matrix algorithm, we determine relations and interactions of 23 society concepts and 17 religions...
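For orientation, the sketch below builds the standard Google matrix for a tiny directed link graph and ranks its nodes by power iteration (PageRank). It does not implement the reduced Google matrix algorithm used in the paper; the three-node graph, the damping factor of 0.85, and the function names are illustrative assumptions.

```python
def google_matrix(links, n, alpha=0.85):
    """Return G where G[i][j] is the probability of moving from node j to node i.

    links maps a node index to the list of node indices it links to.
    Nodes without outgoing links (dangling nodes) get a uniform column.
    """
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = links.get(j, [])
        for i in range(n):
            if targets:
                s = targets.count(i) / len(targets)
            else:
                s = 1.0 / n
            # Damped link-following plus uniform random jump.
            G[i][j] = alpha * s + (1 - alpha) / n
    return G


def pagerank(G, iterations=100):
    """Approximate the leading eigenvector of G by repeated multiplication."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


# Hypothetical toy graph: article 0 links to 1 and 2, 1 links to 2, 2 links to 0.
LINKS = {0: [1, 2], 1: [2], 2: [0]}
G = google_matrix(LINKS, 3)
rank = pagerank(G)
```

Each column of `G` sums to one, so the rank vector stays a probability distribution; in this toy graph node 2 ends up ranked highest because it collects links from both other nodes.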
The sixth wave of Ukrainian scientific emigration, which began at the end of February 2022, and the formation of new structures in the Ukrainian scientific diaspora have stimulated research on the study and preservation of Ukrainian scientific heritage abroad. However, such research and measures to preserve this scientific heritage will not be effe...
Honorifics serve as powerful linguistic markers that reflect social hierarchies and cultural values. This paper presents a large-scale, cross-linguistic exploration of the use of honorific pronouns in Bengali and Hindi Wikipedia articles, shedding light on how socio-cultural factors shape language. Using an LLM (GPT-4o), we annotated 10,000 articles of...
Every second, smartphone technologies create the potential for an explosion of data, streamed and collected from heterogeneous devices. Analyzing these enormous datasets can reveal previously unexplored habits and help optimise methods for city-wide applications or societal use cases. However, acquiring and handling these huge datasets raises issues in how to d...
This is a screenshot of Machine Understanding. Not only does the user interact with, in this case, Wikipedia; the Enguage app will also tell you how this is done.
Sequential collaboration describes the incremental process of contributing to online collaborative projects such as Wikipedia and OpenStreetMap. After a first contributor creates an initial entry, subsequent contributors create a sequential chain by deciding whether to adjust or maintain the latest entry which is updated if they decide to make chan...
Queries to large language models (LLMs) can be divided into two parts: the instruction/question and the accompanying context. The context for retrieval-augmented generation (RAG) systems in most benchmarks comes from Wikipedia or Wikipedia-like texts which are written in a neutral and factual tone. However, when RAG systems retrieve internet-based...
The study analyzed here was conducted by the famous psychologist H.J. Eysenck, and its author "was the living psychologist most frequently cited in the peer-reviewed scientific journal literature" (Wikipedia). The study is titled The Effects of Psychotherapy: An Evaluation and was published in 1952. The purpose of the evaluation stud...
Background: Millets (nutri-cereals) are becoming more important in the contemporary global context. Diets prepared from millets are rich in nutrients such as vitamins, minerals, and dietary fiber, have a low glycemic index, and are gluten-free. The present review was undertaken to compile the millets, their properti...