Article

IS THERE A REPRODUCIBILITY CRISIS?

... Yes, we would expect occasional failed attempts to replicate published studies, but here we are dealing with a larger problem. According to a survey for the journal Nature, as many as 90% of the scientists polled answered that we are currently going through such a crisis, and 52% of them described it as a truly serious problem (Baker, 2016). If our knowledge is not reproducible, can it be considered scientific at all? ...
... How can it happen that even studies published in the most prestigious scientific journals overestimate effect sizes or publish non-replicable results? Some insight into this problem comes from the answers that 1,576 researchers from various disciplines gave to the questions of the aforementioned Nature survey (Baker, 2016; Rosenthal, 1979), since the file drawer (or, today, the computer disk) is probably where unpublished negative results end up. The mismatch between reality and published results is also referred to as publication bias (Easterbrook et al., 1991). ...
... Besides positivity, novelty matters. As we saw in the results of the Nature survey (Baker, 2016), only a minority of the researchers involved had ever tried to publish a failed direct replication, and only 13% of them reported that they had managed to publish the replication. Even though replications are essential for science, it is disadvantageous for an individual scientist to devote time and energy to trying to publish a failed replication, because the same energy and resources could go into publishing new results, which would carry more prestige. ...
... Research reproducibility, the extent to which consistent results are obtained when a scientific experiment or research workflow is repeated (Curating for Reproducibility Consortium 2017), is a key aspect of the advancement of science, as it constitutes a minimum standard that allows understanding research products, that is, methods, data, analysis, results, etc. (Piwowar 2013), to determine their reliability and generality, and eventually build up scientific knowledge and applications based on those products (King 1995; Peng 2011; Powers and Hampton 2019). In the natural sciences, rates of reproducibility are low (Ioannidis 2005; Prinz, Schlange, and Asadullah 2011), which has elicited concerns about a crisis in the field (Baker 2016). ...
... OpenTree's API services have been wrapped by the rotl R package (Michonneau, Brown, and Winter 2016) and the opentree Python module (McTavish, Sánchez Reyes, and Holder 2021). R and Python programming languages are open source and free of cost and represent two of the most widely used programming languages in the sciences today (Eglen 2009; Baker 2017). As such, the rotl and opentree software packages are contributing to the approachability of OpenTree's resources to R and Python users, increasing availability to a wider user base. ...
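As a rough illustration of the kind of call these wrappers make, the sketch below queries the Open Tree of Life name-resolution service directly over HTTP with Python's requests; the endpoint URL and the response field names are assumptions based on OpenTree's public v3 API documentation, not code taken from rotl or opentree.

```python
# Minimal sketch (assumptions noted above): resolve taxon names against the
# Open Tree of Life TNRS service, the kind of call wrapped by rotl/opentree.
import requests

OT_TNRS_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"  # assumed v3 endpoint

def match_names(names):
    """POST a list of taxon names and return the parsed JSON response."""
    response = requests.post(OT_TNRS_URL, json={"names": names}, timeout=30)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = match_names(["Canis lupus", "Felis catus"])
    # Field names below ("results", "matches", "taxon", "ott_id") are assumed
    # from the public API documentation.
    for entry in result.get("results", []):
        ids = [m["taxon"]["ott_id"] for m in entry.get("matches", [])]
        print(entry.get("name"), "->", ids)
```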
Article
Full-text available
Research reproducibility is essential for scientific development. Yet, rates of reproducibility are low. As increasingly more research relies on computers and software, efforts for improving reproducibility rates have focused on making research products digitally available, such as publishing analysis workflows as computer code, and raw and processed data in computer readable form. However, research products that are digitally available are not necessarily friendly for learners and interested parties with little to no experience in the field. This renders research products unapproachable, counteracts their availability, and hinders scientific reproducibility. To improve both short and long term adoption of reproducible scientific practices, research products need to be made approachable for learners, the researchers of the future. Using a case study within evolutionary biology, we identify aspects of research workflows that make them unapproachable to the general audience: use of highly specialized language; unclear goals and high cognitive load; and lack of trouble-shooting examples. We propose principles to improve the unapproachable aspects of research workflows, and illustrate their application using an online teaching resource. We elaborate on the general application of these principles for documenting research products and teaching materials, to provide present learners and future researchers with tools for successful scientific reproducibility.
... True scientific discovery depends on the ability to replicate findings. Some believe we have entered an era of 'replication crisis' based on too few studies attempting to replicate results, and too few results being robust enough to remain significant when replication studies are attempted (Baker, 2016; Ioannidis, 2005; Peng, 2015; Schooler, 2014). The need for replication is particularly relevant when studying schizophrenia because it is a heterogeneous disorder, both in terms of symptoms and neuropathology (van Kesteren et al., 2017; Weickert et al., 2013). ...
... The findings of the present study demonstrate the existence of a robust subgroup of schizophrenia cases with heightened inflammation and altered markers for glia, immune cells and neurogenesis in the SEZ in a second independent cohort. In light of the controversy surrounding limited replication in science (Baker, 2016; Ioannidis, 2005; Peng, 2015; Schooler, 2014), our findings support that strong replication can be achieved, albeit by the same laboratory applying the same techniques. In this study, the high-inflammation schizophrenia subgroup comprised 52 % of schizophrenia cases, which was a greater proportion than the 37 % identified within the SMRI cohort (North et al., 2021b). ...
Article
We previously identified a subgroup of schizophrenia cases (~40 %) with heightened inflammation in the neurogenic subependymal zone (SEZ) (North et al., 2021b). This schizophrenia subgroup had changes indicating reduced microglial activity, increased peripheral immune cells, increased stem cell dormancy/quiescence and reduced neuronal precursor cells. The present follow-up study aimed to replicate and extend those novel findings in an independent post-mortem cohort of schizophrenia cases and controls from Australia. RNA was extracted from SEZ tissue from 20 controls and 22 schizophrenia cases from the New South Wales Brain Tissue Resource Centre, and gene expression analysis was performed. Cluster analysis of inflammation markers (IL1B, IL1R1, SERPINA3 and CXCL8) revealed a high-inflammation schizophrenia subgroup comprising 52 % of cases, which was a significantly greater proportion than the 17 % of high-inflammation controls. Consistent with our previous report (North et al., 2021b), those with high-inflammation and schizophrenia had unchanged mRNA expression of markers for steady-state and activated microglia (IBA1, HEXB, CD68), decreased expression of phagocytic microglia markers (P2RY12, P2RY13), but increased expression of markers for macrophages (CD163), monocytes (CD14), natural killer cells (FCGR3A), and the adhesion molecule ICAM1. Similarly, the high-inflammation schizophrenia subgroup emulated increased quiescent stem cell marker (GFAPD) and decreased neuronal progenitor (DLX6-AS1) and immature neuron marker (DCX) mRNA expression; but also revealed a novel increase in a marker of immature astrocytes (VIM). Replicating primary results in an independent cohort demonstrates that inflammatory subgroups in the SEZ in schizophrenia are reliable, robust and enhance understanding of neuropathological heterogeneity when studying schizophrenia.
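The subgrouping described in this abstract hinges on clustering a small panel of inflammation markers. The following sketch is only a generic illustration of that kind of step (two-group k-means on z-scored marker expression with placeholder data); it is not the clustering procedure used in the study.

```python
# Generic illustration only (not the study's procedure): split subjects into
# high- vs low-inflammation groups by k-means clustering of z-scored expression
# of the four inflammation markers named above. The data are random placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
markers = ["IL1B", "IL1R1", "SERPINA3", "CXCL8"]
expression = rng.normal(size=(42, len(markers)))        # subjects x markers

z = (expression - expression.mean(axis=0)) / expression.std(axis=0)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(z)

# Label the cluster whose centroid has the higher mean marker level "high inflammation".
high = int(np.argmax(kmeans.cluster_centers_.mean(axis=1)))
n_high = int((kmeans.labels_ == high).sum())
print(f"high-inflammation subgroup: {n_high} of {len(z)} cases")
```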
... Experimental studies in biology require rigorous experimental design coupled with sufficiently detailed reporting of methods to allow other scientists to replicate and extend the results. Rigor and reproducibility have become a key initiative at the US National Institutes of Health (NIH) to improve the biomedical scientific enterprise (e.g., NIH guide notice NOT-OD- [2,3]). Training in rigor and transparency to increase reproducibility is now mandated for NIH-funded graduate and postdoctoral trainees [4]. ...
Article
Full-text available
Background Circadian rhythms are important for all aspects of biology; virtually every aspect of biological function varies according to time of day. Although this is well known, variation across the day is also often ignored in the design and reporting of research. For this review, we analyzed the top 50 cited papers across 10 major domains of the biological sciences in the calendar year 2015. We repeated this analysis for the year 2019, hypothesizing that the awarding of a Nobel Prize in 2017 for achievements in the field of circadian biology would highlight the importance of circadian rhythms for scientists across many disciplines, and improve time-of-day reporting. Results Our analyses of these 1000 empirical papers, however, revealed that most failed to include sufficient temporal details when describing experimental methods and that few systematic differences in time-of-day reporting existed between 2015 and 2019. Overall, only 6.1% of reports included time-of-day information about experimental measures and manipulations sufficient to permit replication. Conclusions Circadian rhythms are a defining feature of biological systems, and knowing when in the circadian day these systems are evaluated is fundamentally important information. Failing to account for time of day hampers reproducibility across laboratories, complicates interpretation of results, and reduces the value of data based predominantly on nocturnal animals when extrapolating to diurnal humans.
... While long recognized as a key feature of scientific discoveries, reproducibility has been increasingly characterized as a crisis recently [174], [175], [176]. It is becoming a primary concern in computer and information science, evidenced by the recently developed ACM policy on Artifact Review and Badging and emerging efforts including seminars [178], workshops [179], a reproducibility checklist [180], and focused tracks at major conferences, such as ECIR [181], ACM MM [182], SIGIR, and ISWC [183]. ...
Preprint
Full-text available
Recently, one critical issue looms large in the field of recommender systems -- there are no effective benchmarks for rigorous evaluation -- which consequently leads to unreproducible evaluation and unfair comparison. We, therefore, conduct studies from the perspectives of practical theory and experiments, aiming at benchmarking recommendation for rigorous evaluation. Regarding the theoretical study, a series of hyper-factors affecting recommendation performance throughout the whole evaluation chain are systematically summarized and analyzed via an exhaustive review on 141 papers published at eight top-tier conferences within 2017-2020. We then classify them into model-independent and model-dependent hyper-factors, and different modes of rigorous evaluation are defined and discussed in-depth accordingly. For the experimental study, we release DaisyRec 2.0 library by integrating these hyper-factors to perform rigorous evaluation, whereby a holistic empirical study is conducted to unveil the impacts of different hyper-factors on recommendation performance. Supported by the theoretical and experimental studies, we finally create benchmarks for rigorous evaluation by proposing standardized procedures and providing performance of ten state-of-the-arts across six evaluation metrics on six datasets as a reference for later study. Overall, our work sheds light on the issues in recommendation evaluation, provides potential solutions for rigorous evaluation, and lays foundation for further investigation.
... Req. 6 (Reproducibility). Any evaluation must be supported with information that allows its reproducibility [91], [92]. Motivation: aside from obvious reasons, it serves to approximate ε. ...
Preprint
Full-text available
Machine learning (ML) has become an important paradigm for cyberthreat detection (CTD) in the recent years. A substantial research effort has been invested in the development of specialized algorithms for CTD tasks. From the operational perspective, however, the progress of ML-based CTD is hindered by the difficulty in obtaining the large sets of labelled data to train ML detectors. A potential solution to this problem are semisupervised learning (SsL) methods, which combine small labelled datasets with large amounts of unlabelled data. This paper is aimed at systematization of existing work on SsL for CTD and, in particular, on understanding the utility of unlabelled data in such systems. To this end, we analyze the cost of labelling in various CTD tasks and develop a formal cost model for SsL in this context. Building on this foundation, we formalize a set of requirements for evaluation of SsL methods, which elucidates the contribution of unlabelled data. We review the state-of-the-art and observe that no previous work meets such requirements. To address this problem, we propose a framework for assessing the benefits of unlabelled data in SsL. We showcase an application of this framework by performing the first benchmark evaluation that highlights the tradeoffs of 9 existing SsL methods on 9 public datasets. Our findings verify that, in some cases, unlabelled data provides a small, but statistically significant, performance gain. This paper highlights that SsL in CTD has a lot of room for improvement, which should stimulate future research in this field.
... Klump et al. 2021) should be considered. The problem of ensuring the reproducible analysis of physical material is not unique to archaeology, and the presence of a 'reproducibility crisis' has been discussed over a wide variety of fields (Baker 2016) and with a healthy degree of scepticism (Fanelli 2018). Much focus has been placed on data and the infrastructure to preserve it (Kansa et al. 2020), but more recently on ensuring the use of statistics and computational methods are thoroughly documented (Marwick 2017), something which even the field of artificial intelligence research has not successfully achieved (Hutson 2018). ...
... Reproducibility of scientific results has been discussed widely in recent years, also including its perception as a 'crisis' 20. In general, reproducibility means the ability to reproduce another scientist's results, or eventually to reproduce one's own results. ...
Article
Full-text available
Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ-files has recently been used in the context of other NGS analyses and can be used to generate artificial technical replicates. Bootstrapping samples from the columns of the expression matrix has already been used for DNA microarray data and generates a new artificial replicate of the whole experiment. Mixing data of individual samples has been used for data augmentation in machine learning. The aim of this comparison is to evaluate which of these strategies are best suited to study the reproducibility of differential expression and gene-set enrichment analysis in an RNA-seq experiment. To study the approaches under controlled conditions, we performed a new RNA-seq experiment on gene expression changes upon virus infection compared to untreated control samples. In order to compare the approaches for artificial replicates, each of the samples was sequenced twice, i.e. as true technical replicates, and differential expression analysis and GO term enrichment analysis was conducted separately for the two resulting data sets. Although we observed a high correlation between the results from the two replicates, there are still many genes and GO terms that would be selected from one replicate but not from the other. Cluster analyses showed that artificial replicates generated by bootstrapping reads produce p values and fold changes that are close to those obtained from the true data sets. Results generated from artificial replicates with the approaches of column bootstrap or mixing observations were less similar to the results from the true replicates. Furthermore, the overlap of results among replicates generated by column bootstrap or mixing observations was much stronger than among the true replicates. Artificial technical replicates generated by bootstrapping sequencing reads from FASTQ-files are better suited to study the reproducibility of results from differential expression and GO term enrichment analysis in RNA-seq experiments than column bootstrap or mixing observations. However, FASTQ-bootstrapping is computationally more expensive than the other two approaches. The FASTQ-bootstrapping may be applicable to other applications of high-throughput sequencing.
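For readers unfamiliar with the two expression-matrix-level strategies compared in this abstract, the following is a schematic sketch under simplifying assumptions; read-level FASTQ bootstrapping is omitted because it operates on raw sequencing reads, and the count matrix here is a random placeholder.

```python
# Schematic sketch of the two expression-matrix-level strategies compared above.
# Column bootstrap resamples samples with replacement; "mixing" averages
# randomly paired samples. The count matrix is a random placeholder.
import numpy as np

rng = np.random.default_rng(1)
counts = rng.poisson(lam=50, size=(1000, 6))            # 1000 genes x 6 samples

def column_bootstrap(matrix, rng):
    """Resample columns (samples) with replacement to form an artificial replicate."""
    idx = rng.integers(0, matrix.shape[1], size=matrix.shape[1])
    return matrix[:, idx]

def mix_samples(matrix, rng, weight=0.5):
    """Build artificial samples as weighted mixtures of randomly paired samples."""
    a = rng.permutation(matrix.shape[1])
    b = rng.permutation(matrix.shape[1])
    return weight * matrix[:, a] + (1 - weight) * matrix[:, b]

print(column_bootstrap(counts, rng).shape, mix_samples(counts, rng).shape)
```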
... More generally, reproducibility is, or at least should be, a central tenet of the scientific process. However, many recent studies across a range of scientific disciplines have been found to be irreproducible, which poses a major challenge to the credibility of science [83][84][85]. The chances of a given workflow or analysis being reproducible are greatly enhanced if the precise datasets employed are clearly stated, are findable and interoperable [86], full codes and/or algorithmic details are shared, and said algorithms are implemented in open source software (again with details such as version numbers indicated). ...
Article
Full-text available
Changing climate and human demographics in the world's mountains will have increasingly profound environmental and societal consequences across all elevations. Quantifying current human populations in and near mountains is crucial to ensure that any interventions in these complex social-ecological systems are appropriately resourced, and that valuable ecosystems are effectively protected. However, comprehensive and reproducible analyses on this subject are lacking. Here, we develop and implement an open workflow to quantify the sensitivity of mountain population estimates over recent decades, both globally and for several sets of relevant reporting regions, to alternative input dataset combinations. Relationships between mean population density and several potential environmental covariates are also explored across elevational bands within individual mountain regions (i.e. "sub-mountain range scale"). Globally, mountain population estimates vary greatly, from 0.344 billion (<5% of the corresponding global total) to 2.289 billion (>31%) in 2015. A more detailed analysis using one of the population datasets (GHS-POP) revealed that in ~35% of mountain sub-regions, population increased at least twofold over the 40-year period 1975-2015. The urban proportion of the total mountain population in 2015 ranged from 6% to 39%, depending on the combination of population and urban extent datasets used. At sub-mountain range scale, population density was found to be more strongly associated with climatic than with topographic and protected-area variables, and these relationships appear to have strengthened slightly over time. Such insights may contribute to improved predictions of future mountain population distributions under scenarios of future climatic and demographic change. Overall, our work emphasizes that irrespective of data choices, substantial human populations are likely to be directly affected by, and themselves affect, mountainous environmental and ecological change. It thereby further underlines the urgency with which the multitudinous challenges concerning the interactions between mountain climate and human societies under change must be tackled.
... Past efforts were stymied by a range of problems that resulted in a lack of both reproducibility (the inability to regenerate previously issued forecasts, predictions, or test results) and replicability (the inability to reach the same conclusion about a model's predictive skill from different data; Stodden et al., 2018; National Academies of Sciences, Engineering, and Medicine and others, 2019). The peer-review process was frequently insufficient to ensure these necessary standards, an experience mirrored in other empirical research fields (Baker, 2016). Meaningful prospective evaluations require sufficient data, which may take several decades or more to collect in certain regions, especially for large earthquakes. ...
Article
The Collaboratory for the Study of Earthquake Predictability (CSEP) is an open and global community whose mission is to accelerate earthquake predictability research through rigorous testing of probabilistic earthquake forecast models and prediction algorithms. pyCSEP supports this mission by providing open-source implementations of useful tools for evaluating earthquake forecasts. pyCSEP is a Python package that contains the following modules: (1) earthquake catalog access and processing, (2) representations of probabilistic earthquake forecasts, (3) statistical tests for evaluating earthquake forecasts, and (4) visualization routines and various other utilities. Most significantly, pyCSEP contains several statistical tests needed to evaluate earthquake forecasts, which can be forecasts expressed as expected earthquake rates in space–magnitude bins or specified as large sets of simulated catalogs (which includes candidate models for governmental operational earthquake forecasting). To showcase how pyCSEP can be used to evaluate earthquake forecasts, we have provided a reproducibility package that contains all the components required to re-create the figures published in this article. We recommend that interested readers work through the reproducibility package alongside this article. By providing useful tools to earthquake forecast modelers and facilitating an open-source software community, we hope to broaden the impact of the CSEP and further promote earthquake forecasting research.
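As a standalone illustration of the kind of consistency test mentioned in this abstract (and without using the pyCSEP API itself), a minimal Poisson number test might look as follows; the forecast rates and observed count are invented.

```python
# Standalone illustration of one consistency test of the kind pyCSEP provides
# (this is not the pyCSEP API): a Poisson "number test" (N-test) comparing the
# observed earthquake count with the total rate of a gridded forecast.
from scipy.stats import poisson

def poisson_n_test(forecast_rates, observed_count):
    """Return the forecast mean and the two one-sided tail probabilities of the
    observed count under a Poisson model with that mean."""
    mu = float(sum(forecast_rates))
    p_ge = 1.0 - poisson.cdf(observed_count - 1, mu)    # P(N >= observed)
    p_le = poisson.cdf(observed_count, mu)              # P(N <= observed)
    return mu, p_ge, p_le

# Toy example: a forecast totalling 12.4 expected events versus 18 observed events.
mu, p_ge, p_le = poisson_n_test([3.1, 4.8, 2.0, 2.5], observed_count=18)
print(f"expected {mu:.1f}; P(N>=obs)={p_ge:.3f}, P(N<=obs)={p_le:.3f}")
```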
... Indeed, Begley and Ellis [24] report that only 6 of 53 published findings in cancer biology could be confirmed, which Wen et al. [25] note is "a rate approaching an alarmingly low 10% of reproducibility". Moreover, a 2016 Nature survey [26], of over 1500 scientists, found that 70% of researchers have tried but failed to reproduce another scientist's experiments, and 52% thought there was a significant 'crisis' of reproducibility. ...
Article
Full-text available
Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible – subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
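The following is a simplified, illustrative proxy for the idea described above, not the over-bound and under-bound algorithms from the paper: it estimates how stable a univariate biomarker selection is by repeatedly splitting subjects in half and measuring the overlap of the features selected in each half.

```python
# Simplified, illustrative proxy (not the paper's algorithms): repeatedly split
# subjects in half, select "significant" features in each half with t-tests,
# and report the average Jaccard overlap of the two selections.
import numpy as np
from scipy.stats import ttest_ind

def selection_overlap(X, y, n_splits=20, alpha=0.05, rng=None):
    rng = np.random.default_rng(rng)
    overlaps = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        half = len(y) // 2
        picks = []
        for part in (idx[:half], idx[half:]):
            p = ttest_ind(X[part][y[part] == 0], X[part][y[part] == 1], axis=0).pvalue
            picks.append(set(np.flatnonzero(p < alpha)))
        union = picks[0] | picks[1]
        overlaps.append(len(picks[0] & picks[1]) / len(union) if union else 1.0)
    return float(np.mean(overlaps))

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 500))          # 80 subjects, 500 candidate biomarkers
y = rng.integers(0, 2, size=80)         # dichotomous outcome (placeholder labels)
X[:, :10] += y[:, None] * 0.8           # make the first 10 features weakly informative
print("mean selection overlap:", round(selection_overlap(X, y, rng=3), 3))
```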
... Reproducibility of results is coming under increasing scrutiny in the machine learning (ML) and natural language processing (NLP) fields, against the background of a perceived reproducibility crisis in science more widely (Baker 2016), and NLP/ML specifically (Mieskes et al. 2019). There have been workshops and checklist initiatives, conferences promoting reproducibility via calls, chairs' blogs and special themes, and the first shared tasks, including REPROLANG'20 (Branco et al. 2020) and ReproGen'21 (Belz et al. 2021b). ...
Article
Full-text available
Reproducibility has become an increasingly debated topic in NLP and ML over recent years, but so far, no commonly accepted definitions of even basic terms or concepts have emerged. The range of different definitions proposed within NLP/ML not only do not agree with each other, they are also not aligned with standard scientific definitions. This article examines the standard definitions of repeatability and reproducibility provided by the meta-science of metrology, and explores what they imply in terms of how to assess reproducibility, and what adopting them would mean for reproducibility assessment in NLP/ML. It turns out the standard definitions lead directly to a method for assessing reproducibility in quantified terms that renders results from reproduction studies comparable across multiple reproductions of the same original study, as well as reproductions of different original studies. The article considers where this method sits in relation to other aspects of NLP work one might wish to assess in the context of reproducibility.
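One simple way to quantify the reproducibility of a numeric result, in the spirit of (though not necessarily identical to) the measure discussed in this article, is the coefficient of variation of the same evaluation score across an original study and its reproductions.

```python
# Precision-style illustration (the article's exact quantified measure may
# differ): coefficient of variation of one evaluation score across repetitions.
import statistics

def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(scores) / statistics.mean(scores)

# e.g. the same BLEU score from an original study and three reproductions
print(round(coefficient_of_variation([27.3, 26.8, 27.9, 25.6]), 2), "%")
```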
... Failure to reproduce many results in the published literature is causing discussions among scientists about poor research practices (Baker 2016, Fanelli 2018). A lack of reproducibility (Glossary) hinders our ability to corroborate or falsify results, and is often associated with incomplete reporting of experimental protocols and pipelines (Munafò et al. 2017), limited data and code sharing (Tenopir et al. 2011, Culina et al. 2020) and misuse of statistics and analyses (e.g. ...
Article
Full-text available
The widespread use of species traits in basic and applied ecology, conservation and biogeography has led to an exponential increase in functional diversity analyses, with > 10 000 papers published in 2010–2020, and > 1800 papers only in 2021. This interest is reflected in the development of a multitude of theoretical and methodological frameworks for calculating functional diversity, making it challenging to navigate the myriads of options and to report detailed accounts of trait‐based analyses. Therefore, the discipline of trait‐based ecology would benefit from the existence of a general guideline for standard reporting and good practices for analyses. We devise an eight‐step protocol to guide researchers in conducting and reporting functional diversity analyses, with the overarching goal of increasing reproducibility, transparency and comparability across studies. The protocol is based on: 1) identification of a research question; 2) a sampling scheme and a study design; 3–4) assemblage of data matrices; 5) data exploration and preprocessing; 6) functional diversity computation; 7) model fitting, evaluation and interpretation; and 8) data, metadata and code provision. Throughout the protocol, we provide information on how to best select research questions, study designs, trait data, compute functional diversity, interpret results and discuss ways to ensure reproducibility in reporting results. To facilitate the implementation of this template, we further develop an interactive web‐based application (stepFD) in the form of a checklist workflow, detailing all the steps of the protocol and allowing the user to produce a final ‘reproducibility report' to upload alongside the published paper. A thorough and transparent reporting of functional diversity analyses ensures that ecologists can incorporate others' findings into meta‐analyses, the shared data can be integrated into larger databases for consensus analyses, and available code can be reused by other researchers. All these elements are key to pushing forward this vibrant and fast‐growing field of research.
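As a compact illustration of step 6 (functional diversity computation), the sketch below computes one of the many available indices, Rao's quadratic entropy Q = sum_ij d_ij p_i p_j, from placeholder trait and abundance data; it is not the stepFD workflow itself.

```python
# Rao's quadratic entropy: d_ij are pairwise trait distances between species,
# p_i their relative abundances. Trait values and abundances are placeholders.
import numpy as np
from scipy.spatial.distance import pdist, squareform

traits = np.array([[1.2, 0.3],      # species x traits (placeholder trait matrix)
                   [0.9, 0.8],
                   [2.1, 0.1],
                   [1.7, 0.9]])
abundance = np.array([10, 5, 1, 4], dtype=float)

p = abundance / abundance.sum()                         # relative abundances
d = squareform(pdist(traits, metric="euclidean"))       # pairwise trait distances
rao_q = float(p @ d @ p)                                # Rao's quadratic entropy
print("Rao's Q:", round(rao_q, 4))
```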
... More than 70% of scientists have tried to reproduce their own or other scientists' experiments and failed. According to the survey, 52% of respondents agreed that there is a significant replication crisis in science [2]. ...
Article
The article reveals the essence of the current replication crisis in the social sciences, to which criminology also belongs. Globalization and its consequences, namely the enormous informatization of society and the mass reproduction of scientific works, are aggravating the crisis in scientific research. Despite the large number of works devoted to this problem, it must be stated that scientists have not reached a common view on how to resolve the methodological crisis. The current state of development of criminological research is analyzed, and it is established that contemporary studies require methodological renewal. Attention is focused on the main problems of research reproducibility, which include the object of criminological research, which cannot be reproduced repeatedly, calling into question the content and scope of knowledge about society. The analysis showed that the replication crisis has affected criminological research and criminological science as a whole. Ways of addressing this problem in future domestic criminological research are proposed. The author identifies the methodology of reproducible research as one of the directions for renewing the methodology of the social sciences and of criminology in general. Ukraine's progress in applying this methodology is analyzed, and examples of already existing open studies are given. It is concluded that, under rapidly changing technological conditions, it is precisely the use of such elements as openly available program code, source data sets, their processing, and the visualization of research results that will ensure the integration of modern scientific concepts of crime based on the openness and transparency of criminological research.
... Additionally, many authors sacrifice generality for scalability, addressing only the static case, in which most objects are at rest, or the dynamic case, in which most of them are in motion. Moreover, the lack of generality of state-of-the-art solutions in academia and industry, combined with the absence of a standard methodology, makes it difficult for reported results to be reproduced by third parties, a concern of particular relevance given the current reproducibility crisis [Baker 2016]. In the master's thesis [Serpa 2019] we addressed the following research questions: (1) the possibility of creating a new (previously non-existent) standard and open methodology for the development and analysis of broad phase collision detection algorithms [Serpa and Rodrigues 2019a]; (2) the creation of a novel, generic, and scalable solution for broad phase collision detection [Serpa and Rodrigues 2017]; and (3) the feasibility of releasing an open-source tool containing the implemented development platform, aiming at transferring knowledge to academia, industry, and society. ...
Conference Paper
Collision detection is a computational problem focused on identifying geometric intersections between objects and, more generally, proximity relations between them. Despite its well-known relevance to several areas of knowledge, few authors have proposed solutions that are simultaneously general and scalable. Additionally, there was no standard methodology, either in academia or in industry: only ad hoc scenario models and comparative analyses had been developed, making it difficult to reproduce and compare results. In this context, we present a new generic and scalable solution for broad phase collision detection and a new methodology for the comparative analysis of algorithms, named Broadmark, whose open-source code is publicly available, aiming at transferring knowledge to academia, industry, and society. In this way, we hope to contribute to the creation of robust, multi-faceted solutions applicable to diverse scenarios and, therefore, to greater transparency, ease of modification/extension, and reproducibility of results.
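For readers outside the area, the sketch below shows a classic broad-phase baseline (one-axis sweep and prune over axis-aligned bounding boxes); it is only meant to illustrate what "broad phase" refers to and is not the solution or the Broadmark methodology proposed in this work.

```python
# Classic broad-phase baseline: sweep and prune on the x-axis, with a y-overlap
# check. Boxes are axis-aligned bounding boxes given as (min_x, max_x, min_y, max_y).
def sweep_and_prune(boxes):
    """Return index pairs of boxes whose x-intervals overlap and that also overlap in y."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])  # sort by min_x
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if boxes[j][0] > boxes[i][1]:      # j starts after i ends on x: stop early
                break
            if boxes[i][2] <= boxes[j][3] and boxes[j][2] <= boxes[i][3]:
                pairs.append(tuple(sorted((i, j))))
    return pairs

boxes = [(0, 2, 0, 2), (1, 3, 1, 3), (5, 6, 0, 1), (2.5, 5.5, 0, 4)]
print(sweep_and_prune(boxes))   # candidate colliding pairs for the narrow phase
```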
... A prerequisite for allowing replication is that the original researchers provide the full information to allow others to replicate their work. A comprehensive study by Baker [66] found that this requirement is far from being satisfied in the natural sciences. In addition to merely publishing the relevant information for the purpose of replication studies, it has been argued that it is essential that data should be shared on online platforms. ...
Article
Full-text available
This paper presents a philosophical examination of classical rock engineering problems as the basis to move from traditional knowledge to radical (innovative) knowledge. While this paper may appear abstract to engineers and geoscientists more accustomed to case studies and practical design methods, the aim is to demonstrate how the analysis of what constitutes engineering knowledge (what rock engineers know and how they know it) should always precede the integration of new technologies into empirical disciplines such as rock engineering. We propose a new conceptual model of engineering knowledge that combines experience (practical knowledge) and a priori knowledge (knowledge that is not based on experience). Our arguments are not a critique of actual engineering systems, but rather a critique of the (subjective) reasons that are invoked when using those systems, or to defend conclusions achieved using those systems. Our analysis identifies that rock engineering knowledge is shaped by cognitive biases, which over the years have created a sort of dogmatic barrier to innovation. It therefore becomes vital to initiate a discussion on the subject of engineering knowledge that can explain the challenges we face in rock engineering design at a time when digitalisation includes the introduction of machine algorithms that are supposed to learn from conditions of limited information.
... Those empirical criteria are thus subject to a careful understanding of those external factors (i.e. the data at hand and the implementation in use). To that end, making data and information readily available in the form of repositories enables increased reproducibility (Baker, 2016; Pineau et al., 2021). Even then, implementations may differ due to choices made when designing them, such as the programming language that was used or the data structures. ...
Thesis
A graph is a mathematical object that makes it possible to represent relationships (called edges) between entities (called nodes). Graphs have long been a focal point in a number of problems ranging from work by Euler to PageRank and shortest-path problems. In more recent times, graphs have been used for machine learning. With the advent of social networks and the world-wide web, more and more datasets can be represented using graphs. Those graphs are ever bigger, sometimes with billions of edges and billions of nodes. Designing efficient algorithms for analyzing those datasets has thus proven necessary. This thesis reviews the state of the art and introduces new algorithms for the clustering and the embedding of the nodes of massive graphs. Furthermore, in order to facilitate the handling of large graphs and to apply the techniques under study, we introduce Scikit-network, a free and open-source Python library which was developed during the thesis. Many tasks, such as the classification or the ranking of the nodes using centrality measures, can be carried out thanks to Scikit-network. We also tackle the problem of labeling data. Supervised machine learning techniques require labeled data to be trained. The quality of this labeled data has a heavy influence on the quality of the predictions of those techniques once trained. However, building this data cannot be achieved through the sole use of machines and requires human intervention. We study the data labeling problem in a graph-based setting, and we aim at describing the solutions that require as little human intervention as possible. We characterize those solutions and illustrate how they can be applied in real use-cases.
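The generic embed-then-cluster idea behind such algorithms can be illustrated on a toy graph as follows; this sketch deliberately avoids the Scikit-network API and uses only a spectral embedding plus k-means.

```python
# Toy illustration of embedding then clustering graph nodes (not the
# Scikit-network API): embed a small graph with leading eigenvectors of its
# symmetrically normalized adjacency matrix, then cluster the embedding.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]   # two triangles + a bridge
n = 6
rows, cols = zip(*(edges + [(j, i) for i, j in edges]))
adjacency = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

d_inv_sqrt = 1.0 / np.sqrt(np.asarray(adjacency.sum(axis=1)).ravel())
normalized = diags(d_inv_sqrt) @ adjacency @ diags(d_inv_sqrt)

_, embedding = eigsh(normalized, k=2, which="LA")       # 2-dimensional node embedding
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print(labels)                                           # nodes 0-2 vs 3-5 should separate
```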
... In a wider perspective and in order to provide feedback that allows for comparison between different practices and experiences not only horizontally but also longitudinally and even internationally and across disciplines, one needs reliable and validated instruments that enable qualitative and summative assessment of practice. Such instruments can not only identify the strengths and weaknesses of an individual mentorship that enable interventions to improve practice but also overcome the deep-rooted problem of reproducibility and replicability of studies in the social sciences (e.g., Baker, 2016; Laraway et al., 2019; LeBeau et al., 2021). ...
Article
Full-text available
In the context of improving the quality of teacher education, the focus of the present work was to adapt the Mentoring for Effective Primary Science Teaching instrument to become more universal and have the potential to be used beyond the elementary science mentoring context. The adapted instrument was renamed the Mentoring for Effective Teaching Practicum Instrument. The new, validated instrument enables the assessment of trainee teachers’ perceived experiences with their mentors during their two-week annual teaching practicum at elementary and high schools. In the first phase, the original 34-item Mentoring for Effective Primary Science Teaching instrument was expanded to 62 items with the addition of new items and items from the previous works. All items were rephrased to refer to contexts beyond primary science teaching. Based on responses on an expanded instrument received from 105 pre-service teachers, of whom 94 were females in their fourth year of study (approx. age 22–23 years), the instrument was reviewed and shortened to 36 items classified into six dimensions: personal attributes, system requirements, pedagogical knowledge, modelling, feedback, and Information and Communication Technology due to outcomes of Principal Component and Confirmatory Factor analyses. All six dimensions of the revised instrument are unidimensional, with Cronbach alphas above 0.8 and factor loadings of items above 0.6. Such an instrument could be used in follow-up studies and to improve learning outcomes of teaching practice. As such, specific and general recommendations for the mentee, mentors, university lecturers, and other stakeholders could be derived from the findings to encourage reflection and offer suggestions for the future.
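The internal-consistency figures reported above (Cronbach alphas above 0.8) come from a standard computation that can be sketched as follows; the response data here are simulated placeholders, not the study's responses.

```python
# Cronbach's alpha for one instrument dimension, computed from a
# respondents-by-items response matrix. Placeholder data, not study responses.
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = respondents, columns = items of one dimension."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

rng = np.random.default_rng(4)
latent = rng.normal(size=(105, 1))                       # one underlying construct
items = latent + 0.6 * rng.normal(size=(105, 6))         # six correlated Likert-like items
print("Cronbach's alpha:", round(cronbach_alpha(items), 3))
```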
... Moreover, a further advantage of this perspective lies in the fact that it can help to address the major crisis that science has witnessed as a consequence of the replication failure in our field, after the findings that only 30% of all psychological experiments, although deriving from very influential theories, have been replicated. Similarly, more than half of researchers have failed to reproduce their own studies (Baker, 2016). The answer of researchers to this replicability crisis is the open science movement; this movement parallels the search for unifying broader theories and both may increase the predictive value of each micro-theory and contribute to overcome compartmentation. ...
Article
The compartmentalization of psychological science and of the profession prevents the progress of the discipline. Compartmentalization is a collateral effect of the impressive scientific, methodological, and technical development of psychology, which has led to the emergence of specialized segments of knowledge and practice that unavoidably tend to progress separately from each other and weaken their reciprocal linkage. The work highlights the limits of compartmentalization and discusses motives that call for the unity of psychology. Three approaches to unification are outlined: I) the identification of the ultimate causal explanation; II) the progressive extension of the explicative capacity of specific theories; III) the building of a metatheoretical framework. Finally, the paper proposes the intervention as the criterion to compare the capacity of the three approaches to unity. According to this criterion, approaches can be validated by reason of their ability to enable professional psychology to address the current challenges that people and society have to face.
... However, there are concerns that discourage researchers from sharing data. A key barrier in plant phenotyping is data accessibility due to the lack of established domain-specific repositories, which is limiting research reproducibility and sharing (Baker, 2016). Other challenges include a lack of requisite infrastructure for data management and storage (Lynch, 2008) and incentive systems for publishing research data (Borgman et al., 2007), and privacy and confidentiality concerns (Bizer et al., 2011; Figueiredo, 2017). ...
Article
Full-text available
The application of new technologies in scientific research, particularly automated sensing of plant phenotypic performance, has resulted in a deluge of data and raised the question of how these data can be efficiently managed and shared. Many studies have examined the benefits and constraints of data sharing in different disciplines. We focus on plant phenotyping due to the increasing volume of digital data generated in multi-disciplinary plant phenotyping research. Data sharing and reuse practices in plant phenotyping research have not been widely explored. Study results show that data sharing in plant phenotyping research occurs mostly through direct personal requests based on trust relationships and technical supplements (appendices) to publications, and researchers are willing to share data if incentives and policies are aligned to overcome the barriers. This paper provides empirical evidence to guide the establishment of incentive systems and policy frameworks that support FAIR (findability, accessibility, interoperability, and reusability) data, promote behavioral change, and enhance data sharing for the advancement of science and innovation by research communities, institutions, policymakers, and funders.
... This misapplication of inferential techniques allowed editors to favor studies with splashy results instead of those that followed sound scientific approaches but produced banal findings that failed to overturn existing paradigms. As a result, some fields are facing a severe credibility crisis because the most impactful research published in top journals cannot be replicated (Baker, 2016). Data science offers a similarly seductive siren's call, inviting scholars to deploy new tools that promise insight, but only if they can navigate the perilous obstacles that accompany the journey. ...
Article
Full-text available
This paper charts the rapid rise of data science methodologies in manuscripts published in top journals for third sector scholarship, indicating their growing importance to research in the field. We draw on critical quantitative theory (QuantCrit) to challenge the assumed neutrality of data science insights that are especially prone to misrepresentation and unbalanced treatment of sub-groups (i.e., those marginalized and minoritized because of their race, gender, etc.). We summarize a set of challenges that result in biases within machine learning methods that are increasingly deployed in scientific inquiry. As a means of proactively addressing these concerns, we introduce the “Wells-Du Bois Protocol,” a tool that scholars can use to determine if their research achieves a baseline level of bias mitigation. Ultimately, this work aims to facilitate the diffusion of key insights from the field of QuantCrit by showing how new computational methodologies can be improved by coupling quantitative work with humanistic and reflexive approaches to inquiry. The protocol ultimately aims to help safeguard third sector scholarship from systematic biases that can be introduced through the adoption of machine learning methods.
... For example, Boutard-Hunt et al. (2009) found limited evidence that natural enemy numbers (not specified, but including Coccinellidae, Neuroptera, Diptera and Anthocoridae) were higher on pepper plants treated with B. amyloliquefaciens. Saravanakumar et al. (2008) found that field applications of Pseudomonas to rice reduced leaffolder, Cnaphalocrocis medinalis (Guenée) incidence and lured its natural enemies, whereas Gadhave, Finch, et al. (2016a) found no effect of the aforementioned ... It is a fundamental principle of science that experiments should be repeatable (Baker, 2016), something which is particularly important in the soil environment (Bond-Lamberty et al., 2016). The work reported here was designed to be similar to our previous study (Gadhave, Finch, et al., 2016a) on a Bacillus-calabrese-cabbage aphid-natural enemy model system which was conducted in temperate conditions (UK). ...
Article
Plant growth‐promoting rhizobacteria in the genus Bacillus have been shown to reduce growth and increase parasitism of some aphids, but the generality of these interactions is unknown. All previous studies have taken place in temperate conditions. We studied the effects of seed application of three Bacillus species, singly and in mixture, on three aphid pests and their natural enemies, on field‐grown calabrese (green sprouting broccoli) in the subtropical climate of South West India. All three bacteria reduced populations of Brevicoryne brassicae and Myzus persicae, but had no effect on numbers of Lipaphis erysimi. Chewing insects (flea beetles and diamondback moth larvae) were also unaffected by the treatments. However, parasitism rates of aphids were higher on plants treated with the bacteria. We conclude that Bacillus spp. shape above ground interactions in a context‐specific manner, directly via altered field infestations of some pests and indirectly via natural enemy responses.
... As a result, this may create a path forward towards reliable data on subjective tasks, where a high IRR is difficult to obtain, such as emotions (Wong et al., 2021) and toxicity (Wulczyn et al., 2017). With a reproducibility crisis looming in the background (Baker, 2016; Hutson, 2018), more frequent and accurate reporting of reliability is our primary safeguard (Paritosh, 2012). ...
Preprint
Full-text available
Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many natural language processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, the data reliability is under-reported, and a proposed k-rater reliability (kRR) should be used as the correct data reliability for aggregated datasets. It is a multi-rater generalization of inter-rater reliability (IRR). We conducted two replications of the WordSim-353 benchmark, and present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353. These methods produce very similar results. We hope this discussion will nudge researchers to report kRR in addition to IRR.
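One analytical route from single-rater reliability (IRR) to a k-rater reliability is the Spearman-Brown prophecy formula; treating this as the paper's exact analytical method is an assumption on my part, and its empirical and bootstrap estimators are not reproduced here.

```python
# Spearman-Brown prophecy formula: reliability expected for ratings averaged
# over k raters, given single-rater reliability (IRR). Whether this matches the
# paper's analytical kRR estimator exactly is an assumption.
def k_rater_reliability(irr, k):
    return k * irr / (1.0 + (k - 1) * irr)

for k in (1, 3, 5, 10):
    print(k, round(k_rater_reliability(0.60, k), 3))
```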
Article
Enzyme reactions are highly dependent on reaction conditions. To ensure reproducibility of enzyme reaction parameters, experiments need to be carefully designed and kinetic modeling meticulously executed. Furthermore, to enable quality control of enzyme reaction parameters, the experimental conditions, the modeling process as well as the raw data need to be reported comprehensively. By taking these steps, enzyme reaction parameters can be open and FAIR (findable, accessible, interoperable, re-usable) as well as repeatable, replicable and reproducible. This review discusses these requirements and provides a practical guide to designing initial rate experiments for the determination of enzyme reaction parameters and gives an open, FAIR and re-editable example of the kinetic modeling of an enzyme reaction. Both the guide and example are scripted with Python in Jupyter Notebooks and are publicly available (https://fairdomhub.org/investigations/483/snapshots/1). Finally, the prerequisites of automated data analysis and machine learning algorithms are briefly discussed to provide further motivation for the comprehensive, open and FAIR reporting of enzyme reaction parameters.
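In the spirit of the cited guide (but not taken from its notebooks), a minimal initial-rate analysis in Python might fit the Michaelis-Menten equation to invented placeholder data as follows.

```python
# Minimal sketch: estimate Michaelis-Menten parameters Vmax and Km from
# initial-rate data with nonlinear least squares. All values are placeholders.
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    return vmax * s / (km + s)

substrate = np.array([0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0])       # mM
rate = np.array([0.9, 1.6, 2.9, 3.9, 4.8, 5.4, 5.8])              # µmol/min (placeholder)

(vmax, km), cov = curve_fit(michaelis_menten, substrate, rate, p0=[6.0, 0.5])
vmax_err, km_err = np.sqrt(np.diag(cov))
print(f"Vmax = {vmax:.2f} ± {vmax_err:.2f}, Km = {km:.2f} ± {km_err:.2f}")
```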
Article
Full-text available
In the modern period, Bacon and Descartes dedicated some time to talking about errors. However, by the end of the last century and the beginning of the 21st century, the topic did not receive enough attention. Thus, we debate three different authors on error, representing two epistemic views that we are calling progressivist and traditionalist. The first author is Rescher (2007), whom we take as a supporter of a more traditionalist approach, with a great contribution to the research on the topic. The second author, a progressivist, is Allchin. He stresses the need to build a catalog of errors, so we can gradually avoid them. Our third author, Feyerabend, sees errors in terms of two categories: small and comprehensive. Different from Allchin, Feyerabend puts more weight on issues like pluralism and the relation between error and all the theoretical structures of which it is part. Our aim, beyond exploring the views of these authors, is to see how in the 21st century they present different but contributive research on errors, although we hope to convince the reader that some of these views, such as how Feyerabend contributes to this subject matter, seem to be in better harmony with science as we see it nowadays.
Article
A recent study published in Science of the Total Environment conducted a systematic review of persistent, bioaccumulative, and toxic chemicals (PBTs) in insects using Web of Science Core Collection. Interestingly, a remarkable increase of human, animal, and vertebrate publications related to PBTs appeared in the early 1990s. Despite the authors' attempts to illustrate the anomalies from different perspectives, no rational explanation has been found yet. Quite interested in this abnormal phenomenon, we intend to join the academic discussion by pointing out some problems in the data retrieval and processing process in this review study and giving a more reasonable explanation for the surge of research publications in the early 1990s. Our new interpretations based on large-scale empirical data will help scholars make better use of this well-known and widely used database.
Article
Might traditional Chinese thought regarding creativity not just influence, but also enrich, contemporary European thought about the same? Moreover, is it possible that traditional Chinese thought regarding creativity might enrich contemporary thought both in a more broad, holistic sense, and more specifically regarding the nature and role of creativity as it pertains to scientific inquiry? In this paper, I elucidate why the answer to these questions is: yes. I explain in detail a classical Chinese conception of creativity rooted in Zhuangist philosophy and which centrally involves spontaneity engendered by embracing yóu遊, or “wandering”, rather than novelty or originality (even if processes or products that issue from such spontaneity very often are, or strike us as, novel or original). I then illustrate how this conception of creativity can be used to enrich contemporary thought regarding the nature and role of creativity both in general and as it pertains to scientific inquiry in particular, as well as how to engender creativity, by arguing that it might allow us to: i) more easily remove what is frequently an obstacle to creativity (viz., that of striving for novelty or originality, or even creativity itself, whatever it is taken to involve), and; ii) better understand creative agents as being more intimately connected with, and as processes within and products of, their environments (and thus better promote both extraordinary and ordinary creativity). Finally, I conclude by briefly remarking on how exploring various cultural perspectives on creativity promises to help us to better comprehend and promote creativity, by encouraging us to become more creative about creativity itself.
Article
Cyber-Physical Systems (CPS) refer to systems where some intelligence is embedded into devices that interact with their environment. Using wireless technology in such systems is desirable for better flexibility, improved maintainability, and cost reduction, among others. Moreover, CPS applications often specify deadlines; that is, maximal tolerable delays between the execution of distributed tasks. Systems that guarantee to meet such deadlines are called real-time systems. In the past few years, a technique known as synchronous transmissions (ST) has been shown to enable reliable and energy efficient communication, which is promising for the design of real-time wireless CPS. We identify at least three issues that limit the adoption of ST in this domain: (i) ST is difficult to use due to stringent time synchronization requirements (in the order of μs). There is a lack of tools to facilitate the implementation of ST by CPS engineers, who are often not wireless communication experts. (ii) There are only a few examples showcasing the use of ST for CPS applications, and academic works based on ST tend to focus on communication rather than applications. Convincing proof-of-concept CPS applications are missing. (iii) The inherent variability of the wireless environment makes performance evaluation challenging. The lack of an agreed-upon methodology hinders experiment reproducibility and limits the confidence in the performance claims. This paper synthesizes recent advances that address these three problems, thereby enabling significant progress for future applications of low-power wireless technology in real-time CPS.
Article
Full-text available
New tools enable new ways of working, and materials science is no exception. In materials discovery, traditional manual, serial, and human-intensive work is being augmented by automated, parallel, and iterative processes driven by Artificial Intelligence (AI), simulation and experimental automation. In this perspective, we describe how these new capabilities enable the acceleration and enrichment of each stage of the discovery cycle. We show, using the example of the development of a novel chemically amplified photoresist, how these technologies’ impacts are amplified when they are used in concert with each other as powerful, heterogeneous workflows.
Article
Biofilms are widely recognised as a contributing factor in significant problems currently facing human health and industry. The following paper summarises a round table forum held at the 2021 International Biodeterioration and Biodegradation Symposium which discussed the potential role of standards in biofilm research and industry innovation. Standards and other forms of best-practice guidance are reviewed in an academic research context as well as in relation to industry impacts and product development. The understanding of fundamental aspects of biofilms is rapidly evolving, driven in part by new analytical methods. However, the complex and multidisciplinary nature of biofilm-associated problems and typically limited training available for industry personnel tackling the associated issues often reduces the ability to provide best-practice solutions. As such it is argued that more effort needs to be made by both academia and industry experts to provide consensus and associated documentation on standard test methods or guidance documents related to studying and combating biofilms.
Article
The reliability of some published research from well-funded disciplines of medicine and psychology has been brought into question. This is because some researchers failed to achieve consistent results after replicating published studies using the same methodology. Researchers have referred to this as the ‘replicability in science crisis’ and have identified several practices contributing to unreliable science. Protected area and other conservation researchers are unlikely to be immune from these poor practices given they use the same scientific approaches as other disciplines. Fortunately, there are solutions to the poor practices contributing to unreliable science. In this paper I identify those poor practices and describe solutions as identified by researchers from a range of disciplines. These solutions are transferable to protected area science and related conservation disciplines. Most solutions are not costly or demanding to implement. Adopting these solutions can improve the reliability of both published and unpublished research.
Article
Full-text available
With the growing concerns about research reproducibility and replicability, the assessment of the fragility (or robustness) of scientific results has been of increasing interest. The fragility index was proposed to quantify the robustness of the statistical significance of clinical studies with binary outcomes. It is defined as the minimal number of event-status modifications that can alter statistical significance, and it helps clinicians evaluate the reliability of a study's conclusions. Many factors may affect the fragility index, including the treatment groups in which event status is modified, the statistical methods used for testing the association between treatments and outcomes, and the pre-specified significance level. In addition to assessing the fragility of individual studies, the fragility index was recently extended to both conventional pairwise meta-analyses and network meta-analyses of multiple treatment comparisons. It is not straightforward for clinicians to calculate these measures and visualize the results. We have developed an R package called “fragility” to offer user-friendly functions for such purposes. This article provides an overview of methods for assessing and visualizing the fragility of individual studies as well as pairwise and network meta-analyses, introduces the usage of the “fragility” package, and illustrates the implementations with several worked examples.
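To make the definition above concrete, the following is a minimal sketch of the fragility-index idea, assuming a two-arm study with binary outcomes, Fisher's exact test, a 0.05 significance level, and event-status modifications restricted to the treatment arm; the function and example counts are illustrative and do not reproduce the algorithms of the “fragility” R package.

```python
# Minimal fragility-index sketch (illustrative assumptions, see lead-in above).
from scipy.stats import fisher_exact


def fragility_index(events_trt, n_trt, events_ctl, n_ctl, alpha=0.05):
    """Smallest number of event-status flips in the treatment arm that changes
    whether the two-sided Fisher exact p-value falls below `alpha`;
    returns None if no modification changes the significance status."""
    _, p0 = fisher_exact([[events_trt, n_trt - events_trt],
                          [events_ctl, n_ctl - events_ctl]])
    initially_significant = p0 < alpha
    for k in range(1, n_trt + 1):
        # Try flipping k outcomes in either direction (non-event -> event or event -> non-event).
        for e in (events_trt + k, events_trt - k):
            if 0 <= e <= n_trt:
                _, p = fisher_exact([[e, n_trt - e],
                                     [events_ctl, n_ctl - events_ctl]])
                if (p < alpha) != initially_significant:
                    return k
    return None


# Hypothetical 2x2 result: 2/100 events under treatment vs. 12/100 under control.
print(fragility_index(events_trt=2, n_trt=100, events_ctl=12, n_ctl=100))
```

A small returned value indicates a fragile result: only a handful of changed outcomes would overturn its statistical significance.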
Article
Gas-separation membranes are a critical industrial component for a low-carbon and energy-efficient future. As a result, many researchers have been testing membrane materials over the past several decades. Unfortunately, almost all membrane-based testing systems are home-built, and there are no widely accepted material standards or testing protocols in the literature, making it challenging to accurately compare experimental results. In this multi-lab study, ten independent laboratories collected high-pressure pure-gas permeation data for H2, O2, CH4, and N2 in commercial polysulfone (PSf) films. Equipment information, testing procedures, and permeation data from all labs were collected to provide (1) accepted H2, O2, CH4, and N2 permeability values at 35 °C in PSf as a reference standard, (2) statistical analysis of lab-to-lab uncertainties in evaluating permeability, and (3) a list of best practices for sample preparation, equipment set-up, and permeation testing using constant-volume variable-pressure apparatuses. Results summarized in this work provide a reference standard and recommended testing protocols for pure-gas testing of membrane materials.
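The constant-volume variable-pressure calculation behind such permeability values can be sketched as follows, assuming ideal-gas behaviour, a linear steady-state downstream pressure rise, and a downstream pressure that stays negligible relative to the upstream pressure; the variable names and example numbers are illustrative placeholders, not values or procedures from the multi-lab study.

```python
# Sketch of pure-gas permeability from a constant-volume variable-pressure run.
R = 8.314          # gas constant, J / (mol K)
BARRER = 3.35e-16  # approx. 1 Barrer expressed in mol m / (m2 s Pa)


def permeability_barrer(dp_dt, v_downstream, area, thickness, p_upstream, temp_k):
    """Permeability in Barrer from the steady-state downstream pressure rise
    dp_dt (Pa/s), downstream volume (m3), film area (m2), film thickness (m),
    upstream pressure (Pa), and temperature (K)."""
    molar_rate = dp_dt * v_downstream / (R * temp_k)   # mol/s of gas crossing the film
    flux = molar_rate / area                           # mol / (m2 s)
    permeability = flux * thickness / p_upstream       # mol m / (m2 s Pa)
    return permeability / BARRER


# Hypothetical example: 50-micron film, 10 cm2 area, 30 cm3 downstream volume,
# 10 bar upstream, 35 C (the test temperature used in the study).
print(permeability_barrer(dp_dt=0.8, v_downstream=30e-6, area=10e-4,
                          thickness=50e-6, p_upstream=10e5, temp_k=308.15))
```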
Article
The prevailing view in the current replication crisis literature is that (a) the non-replicability of published empirical studies confirms their untrustworthiness, and (b) the primary source of that non-replicability is the abuse of frequentist testing in general and the p-value in particular. The main objective of the paper is to challenge both of these claims and make the case that (a) non-replicability does not necessarily imply untrustworthiness and (b) the abuses of frequentist testing are only symptomatic of a much broader problem relating to the uninformed and recipe-like implementation of statistical modeling and inference that contributes significantly to untrustworthy evidence. It is argued that the crucial contributors to this untrustworthiness relate (directly or indirectly) to an inadequate understanding and implementation of the stipulations required for model-based statistical induction to give rise to trustworthy evidence. These preconditions relate to securing reliable ‘learning from data’ about phenomena of interest and pertain to the nature, origin, and justification of genuine empirical knowledge, as opposed to beliefs, conjectures, and opinions.
Book
Full-text available
This book was written about open science practices, which have existed for roughly the last decade, have evolved through continued reflection, and, with the support of developing technology, now offer many possibilities that did not exist before. Open science and its practices are unknown to many researchers and can be an unfamiliar concept even to experienced researchers. There are also researchers who know the principles of open science in theory but are hesitant about how to put them into practice, which is hardly surprising given that the concept has a very short history. Our aim in writing this book is to introduce open science principles and practices both theoretically and practically, and thereby to enable researchers to make use of these practices in their own scientific research. This book is a resource from which all researchers with graduate-level training or above can benefit.
Article
A key issue in science is assessing robustness to data analysis choices, while avoiding selective reporting and providing valid inference. Specification Curve Analysis is a tool intended to prevent selective reporting. Alas, when used for inference it can create severe biases and false positives, due to wrongly adjusting for covariates, and mask important treatment effect heterogeneity. As our motivating application, it led an influential study to conclude there is no relevant association between technology use and teenager mental well‐being. We discuss these issues and propose a strategy for valid inference. Bayesian Specification Curve Analysis (BSCA) uses Bayesian Model Averaging to incorporate covariates and heterogeneous effects across treatments, outcomes and subpopulations. BSCA gives significantly different insights into teenager well‐being, revealing that the association with technology differs by device, gender and who assesses well‐being (teenagers or their parents).
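As a point of reference for the discussion above, a bare-bones frequentist specification curve can be sketched as follows: fit the same treatment-outcome regression under every combination of optional covariates and collect the resulting estimates. The data frame, column names, and the use of OLS are hypothetical assumptions; this is the descriptive curve the paper critiques, not the proposed BSCA method.

```python
# Minimal descriptive specification-curve sketch (illustrative, not BSCA).
from itertools import combinations

import pandas as pd
import statsmodels.api as sm


def specification_curve(df, outcome, treatment, optional_covariates):
    """Return one row per specification: the covariates included,
    the treatment coefficient, and its p-value."""
    rows = []
    for k in range(len(optional_covariates) + 1):
        for subset in combinations(optional_covariates, k):
            X = sm.add_constant(df[[treatment, *subset]])
            fit = sm.OLS(df[outcome], X).fit()
            rows.append({"covariates": subset,
                         "estimate": fit.params[treatment],
                         "p_value": fit.pvalues[treatment]})
    return pd.DataFrame(rows).sort_values("estimate")


# Hypothetical usage with a data frame of survey responses:
# curve = specification_curve(df, "well_being", "screen_time",
#                             ["age", "gender", "income", "sleep"])
```

Plotting the sorted estimates gives the familiar "curve"; the paper's point is that summarizing or testing over such a curve without modeling covariate choice and effect heterogeneity can mislead.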
Preprint
Full-text available
Phylogenetic models have become increasingly complex, and phylogenetic data sets larger and richer. Yet inference tools lack a model specification language that succinctly describes a full phylogenetic analysis independently of implementation details. We present a new lightweight and concise model specification language, called ‘LPhy’, which is both human- and machine-readable. ‘LPhy’ is accompanied by a graphical user interface for building models and simulating data using this new language, as well as for creating natural-language narratives describing such models. These narratives can form the basis of manuscript method sections. We also introduce a command-line interface for converting LPhy-specified models into analysis specification files (in XML format) to be used alongside the BEAST2 software platform. Together, these tools will clarify the description and reporting of probabilistic models in phylogenetic studies and improve result reproducibility.
Author summary: We describe a succinct domain-specific language to accurately specify the details of a phylogenetic model for the purposes of reproducibility or reuse. In addition, we have developed a graphical software package that can be used to construct and simulate data from models described in this new language, as well as to create natural-language narratives that can form the basis of a description of the model for the method section of a manuscript. Finally, we report on a command-line program that can be used to generate input files for the BEAST2 software package based on a model specified in this new language. Together, these tools should aid the goal of reproducibility and reuse of probabilistic phylogenetic models.
Article
Full-text available
Reproducibility, the extent to which consistent results are obtained when an experiment or study is repeated, sits at the foundation of science. The aim of this process is to produce robust findings and knowledge, with reproducibility being the screening tool to benchmark how well we are implementing the scientific method. However, the re-examination of results from many disciplines has caused significant concern as to the reproducibility of published findings. This concern is well-founded: our ability to independently reproduce results builds trust within the scientific community, between scientists and policy makers, and with the general public. Within geoscience, discussions and practical frameworks for reproducibility are in their infancy, particularly in subsurface geoscience, an area where there are commonly significant uncertainties related to data (e.g., geographical coverage). Given the vital role of subsurface geoscience in sustainable development pathways and in achieving Net Zero, such as for carbon capture and storage, mining, and natural hazard assessment, there is likely to be increased scrutiny of the reproducibility of geoscience results. We surveyed 346 Earth scientists from a broad section of academia, government, and industry to understand their experience and knowledge of reproducibility in the subsurface. More than 85% of respondents recognised that there is a reproducibility problem in subsurface geoscience, and more than 90% viewed conceptual biases as having a major impact on the robustness of their findings and the overall quality of their work. Access to data, undocumented methodologies, and confidentiality issues (e.g., the use of proprietary data and methods) were identified as major barriers to reproducing published results. Overall, the survey results suggest a need for funding bodies, data providers, research groups, and publishers to build a framework and a set of minimum standards for increasing the reproducibility of, and political and public trust in, the results of subsurface studies.
Article
The majority of attempts to reconstruct the evolutionary history of the hominin clade proceed as if the hominin fossil record is a precise, accurate, and comprehensive record of human evolutionary history. In this contribution we review the various ways in which the apparent scarcity of early hominins on the landscape means that the existing hominin fossil record almost certainly falls short of this assumption, especially with respect to taxic diversity, as well as the spatial and temporal distribution of known taxa. We also suggest that interpretations of the hominin fossil record are particularly affected by practices that likely violate the principles of reproducibility, as well as by confirmation bias. The hominin fossil record should be seen for what it is; an incomplete record of human evolutionary history that limits what should be said about it. Generated narratives should be treated as heuristic devices, not as accurate and comprehensive descriptions of past events.
Conference Paper
Full text: https://tgroechel.github.io/publications/ROMAN_Meta.pdf
Study reproducibility and the generalizability of results to broadly inclusive populations are crucial in any research. Previous meta-analyses in HRI have focused on the consistency of reported information from papers in various categories. However, members of the HRI community have noted that much of the information needed for reproducible and generalizable studies is not found in published papers. We address this issue by surveying the reported study metadata over the main proceedings of the 2021 IEEE International Conference on Robot & Human Interactive Communication (RO-MAN) and the past three years (2019 through 2021) of the main proceedings of the International Conference on Human Robot Interaction (HRI) and alt.HRI. Based on the analysis results, we propose a set of recommendations for the HRI community that follow the longer-standing reporting guidelines from human-computer interaction (HCI), psychology, and other fields most related to HRI. Finally, we examine two key areas for user study reproducibility: recruitment details and participant compensation. We find a lack of reporting in both of these study metadata categories: of the 416 studies across both conferences and all years, 258 failed to report the recruitment method and 255 failed to report compensation. This work provides guidance about the specific types of reporting improvements needed in the field of HRI.
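For scale, the two reporting gaps quoted above correspond to roughly 62% and 61% of the 416 surveyed studies; the trivial check below uses only the counts stated in the abstract.

```python
# Back-of-the-envelope reporting-gap rates from the counts in the abstract.
total = 416
missing_recruitment = 258
missing_compensation = 255
print(f"recruitment method unreported: {missing_recruitment / total:.0%}")   # ~62%
print(f"compensation unreported:       {missing_compensation / total:.0%}")  # ~61%
```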
Article
Decentralized science (DeSci) is a hot topic emerging with the development of Web3 (or Web 3.0) and decentralized autonomous organizations (DAOs) and operations. DeSci fundamentally differs from centralized science (CeSci) and from the Open Science (OS) movement, both of which are built in a centralized way on centralized protocols. It changes the basic structure and legacy norms of current scientific systems by reshaping the cooperation mode, value system, and incentive mechanism. As such, it can provide a viable path for solving bottleneck problems in the development of science, such as oligarchy and silos, and make science more fair, free, responsible, and sensitive. However, DeSci itself still faces many challenges, including scaling, maintaining the quality of participants, suboptimal system loops, and the lack of accountability mechanisms. Taking these into consideration, this article presents a systematic introduction to DeSci, proposes a novel reference model with a six-layer architecture, addresses potential applications, and outlines key research directions in this emerging field. The article aims to provide helpful guidance and reference for future research efforts on DeSci.
Article
Full-text available
Much attention has been paid to reproducibility and replicability (R&R) in studies of GIS and spatial analysis as a means of ensuring the advancement of research transparency. Reproducibility refers to the ability of a researcher to obtain the same results as a prior study using the same data and methods as the original researcher. Replicability refers to the ability of a researcher to obtain similar or the same results as a prior study using the same methods with newly collected data. In this study, we defined five steps for R&R and assessed previous studies published in the Journal of the Korean Geographical Society. We also investigated the submission guidelines of Korean journals indexed in SCIE, SSCI, A&HCI, and SCOPUS with regard to their data- and code-sharing policies. This study highlights the importance of establishing guidelines for R&R and of educational efforts toward the advancement of research transparency.
Article
Survey measurement scales are expected to be stable – to generate the same values across two timepoints and under unchanged conditions. In scale development, stability is assessed by calculating a scale's test-retest reliability – a prerequisite to validity. Yet, a systematic review shows that test-retest reliability values are reported for only 23% of newly developed scales and typically assessed only at aggregate level – based on scale-level or subscale level scores. This study (1) demonstrates how (sub)scale-level test-retest reliability indicators can conceal a lack of response stability at item level and (2) proposes a complementary protocol for assessing item-level response stability. Assessing stability at both item- and scale-level ensures that only stable items are included in a scale, which, in turn, increases the reliability and validity of the scale and contributes to the replicability of findings in the social sciences.
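A minimal sketch of the contrast the study draws between scale-level and item-level stability is given below, assuming two respondent-by-item response matrices for the same scale collected at two timepoints; the 0.70 flagging threshold and the use of Pearson correlations are illustrative choices, not the study's protocol.

```python
# Scale-level vs. item-level test-retest stability (illustrative sketch).
import numpy as np


def test_retest_report(t1, t2, item_names, threshold=0.70):
    """Print the scale-level retest correlation and flag items whose
    item-level retest correlation falls below `threshold`."""
    scale_r = np.corrcoef(t1.sum(axis=1), t2.sum(axis=1))[0, 1]
    print(f"scale-level test-retest r = {scale_r:.2f}")
    for j, name in enumerate(item_names):
        item_r = np.corrcoef(t1[:, j], t2[:, j])[0, 1]
        flag = "" if item_r >= threshold else "  <- unstable at item level"
        print(f"{name}: r = {item_r:.2f}{flag}")


# Hypothetical usage with two waves of responses (numpy arrays, rows = respondents):
# test_retest_report(responses_wave1, responses_wave2,
#                    item_names=["item_1", "item_2", "item_3"])
```

A high scale-level correlation can coexist with unstable individual items, which is exactly the concealment the proposed item-level protocol is meant to expose.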
Article
In the last few decades, architecture has experienced paradigm shifts prompted by new computational tools. Algorithmic design (AD), a design approach based on algorithms, is one such example. However, architectural design practice is strongly based on visual and spatial reasoning, which is not easy to translate into algorithmic descriptions. Consequently, even using tailored AD tools, AD programs are generally hard to understand and develop, independently of one’s programming abilities. To address this problem, we propose a methodology and a design environment to support AD in a way that is more akin to the workflow typically employed by architects, who represent their ideas mostly through sketches and diagrams. The design environment is implemented as a computational notebook, with the ability to intertwine code, textual descriptions, and visual documentation in an integrated storytelling experience that helps architects read and write AD programs.
Article
Full-text available
Background: Patient-derived induced pluripotent stem cells (iPSCs) are an innovative source as an in vitro model for neurological diseases. Recent studies have demonstrated the differentiation of brain microvascular endothelial cells (BMECs) from various stem cell sources, including iPSC lines. However, the impact of the culturing conditions used to maintain stem cell pluripotency on the cells' ability to differentiate into BMECs remains undocumented. In this study, we investigated the effect of different sources of Matrigel and stem cell maintenance medium on BMEC differentiation efficiency.
Methods: The IMR90-c4 iPSC line was maintained on mTeSR1 or in essential-8 (E-8) medium on growth factor-reduced (GFR) Matrigel from three different manufacturers. Cells were differentiated into BMECs following published protocols. The phenotype of BMEC monolayers was assessed by immunocytochemistry. Barrier function was assessed by transendothelial electrical resistance (TEER) and permeability to sodium fluorescein, whereas the presence of drug efflux pumps was assessed by uptake assays using fluorescent substrates.
Results: Stem cell maintenance medium had little effect on the yield and barrier phenotype of IMR90-derived BMECs. The source of GFR Matrigel used for the differentiation process significantly impacted the ability of IMR90-derived BMECs to form tight monolayers, as measured by TEER and fluorescein permeability. However, the Matrigel source had minimal effect on BMEC phenotype and drug efflux pump activity.
Conclusion: This study supports the ability to differentiate BMECs from iPSCs grown in mTeSR1 or E-8 medium and also suggests that the origin of GFR Matrigel has a marked impact on BMEC barrier properties.
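The sodium fluorescein permeability mentioned above is commonly summarised as an apparent permeability coefficient (Papp); the sketch below shows the generic calculation, with a hypothetical Transwell geometry and tracer concentration rather than the values or protocol used in the study.

```python
# Generic apparent-permeability (Papp) calculation for a tracer flux assay.
def apparent_permeability(dq_dt, area_cm2, c0):
    """Papp (cm/s) from the steady-state tracer accumulation rate in the
    receiver compartment dq_dt (mol/s), monolayer area (cm2), and the
    initial donor concentration c0 (mol/cm3)."""
    return dq_dt / (area_cm2 * c0)


# Hypothetical example: 1.12 cm2 insert, 10 uM fluorescein in the donor well,
# 1e-13 mol/s appearing in the receiver compartment.
print(apparent_permeability(dq_dt=1e-13, area_cm2=1.12, c0=10e-6 / 1000))
```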
Article
Full-text available
C. Glenn Begley and Lee M. Ellis propose how methods, publications and incentives must change if patients are to benefit.
Article
In vitro and in vivo activities against Trypanosoma cruzi were evaluated for two sesquiterpene lactones: psilostachyin A and cynaropicrin. Cynaropicrin had previously been shown to potently inhibit African trypanosomes in vivo, and psilostachyin A had been reported to show in vivo effects against T. cruzi, albeit in another test design. In vitro data showed that cynaropicrin was more effective than psilostachyin A. Ultrastructural alterations induced by cynaropicrin included shedding events, detachment of large portions of the plasma membrane, and vesicular bodies and large vacuoles containing membranous structures, suggestive of parasite autophagy. Acute toxicity studies showed that one of two mice died at a cynaropicrin dose of 400 mg/kg of body weight given intraperitoneally (i.p.). Although no major plasma biochemical alterations could be detected, histopathology demonstrated that the liver was the most affected organ in cynaropicrin-treated animals. Although cynaropicrin was as effective as benznidazole against trypomastigotes in vitro, the treatment (once or twice a day) of T. cruzi-infected mice (up to 50 mg/kg/day cynaropicrin) did not suppress parasitemia or protect against mortality induced by the Y and Colombiana strains. Psilostachyin A (0.5 to 50 mg/kg/day given once a day) was not effective in the acute model of T. cruzi infection (Y strain), reaching 100% animal mortality. Our data demonstrate that although it is very promising against African trypanosomes, cynaropicrin does not show efficacy compared to benznidazole in acute mouse models of T. cruzi infection.