Databases - Science topic

A database is an organized collection of data. The data is typically organized to model relevant aspects of reality in a way that supports processes requiring this information.
Questions related to Databases
  • asked a question related to Databases
Question
6 answers
Suggestions of online databases/tools I can use to verify candidate genes
Relevant answer
Answer
I want to verify a list of genes and find out whether they are related to a disease I am researching. Blaise Manga Enuh
  • asked a question related to Databases
Question
9 answers
What are the best databases of blood cell images for research?
Relevant answer
Answer
Have a look at our database, www.raabindata.com, which has more than 40,000 free white blood cell images. You can also access more than 10,000 free leukemia images.
  • asked a question related to Databases
Question
13 answers
We are conducting a Systematic Literature Review and we would like to know how to merge the different search results into a single database so that duplicates are easy to recognise. Merging Excel files does not seem to be a straightforward procedure.
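One possible way to approach the merge described above, sketched in Python under stated assumptions: each search export has already been loaded as a list of rows with `title`, `year` and `doi` columns (these column names, the source labels and the sample rows are all hypothetical). Records are matched on DOI when one is present, otherwise on a normalised title plus year. This is an illustration of the idea, not a substitute for a reference manager's duplicate detection.

```python
import re
import unicodedata


def normalise_title(title):
    """Lowercase, strip accents and punctuation so near-identical titles compare equal."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def record_key(record):
    """Use the DOI as identity when present; otherwise normalised title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + normalise_title(record.get("title", "")) + "|" + str(record.get("year", ""))


def merge_exports(*exports):
    """Merge (source_name, records) pairs; return (unique_records, duplicate_records)."""
    seen = {}
    duplicates = []
    for source_name, records in exports:
        for record in records:
            tagged = dict(record, source=source_name)  # remember where each row came from
            key = record_key(tagged)
            if key in seen:
                duplicates.append(tagged)
            else:
                seen[key] = tagged
    return list(seen.values()), duplicates


# Hypothetical sample rows standing in for two database exports.
export_a = [
    {"title": "Widget Therapy: A Trial", "year": 2018, "doi": "10.1000/example.1"},
    {"title": "Gadgets as Instruments", "year": 2006, "doi": ""},
]
export_b = [
    {"title": "Widget therapy: a trial", "year": 2018, "doi": "10.1000/EXAMPLE.1"},
    {"title": "Gadgets as instruments.", "year": 2006, "doi": None},
]

unique, duplicates = merge_exports(("export_a", export_a), ("export_b", export_b))
```

In practice each export could be read with `csv.DictReader` after saving the Excel sheets as CSV. One known limitation of this sketch: a record that carries a DOI in one export and none in the other will not be matched, so a manual check of the remaining titles is still advisable.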
Relevant answer
Answer
  • asked a question related to Databases
Question
4 answers
I know of several for the gas phase (e.g. HITRAN, GEISA, PNNL) but not the condensed phase.  Most seem to be proprietary databases for matching spectra, but don't allow determining absorption as a function of density or path length.
Relevant answer
Answer
Hi,
I once came across this database. It is quite a good one, though I don't know if it contains the material you are looking for.
David
  • asked a question related to Databases
Question
15 answers
Hi guys,
I am looking for suggestions/recommendations from the research community regarding public databases that are most commonly used by researchers in their analysis.
For example GEO, GTEx, TCGA, gnomAD, TOPMed, etc., including databases from countries other than the US.
#genomics #publicdata #genomicdatabases #databases #datamining #TCGA #HCA #GTEX #GEO #ARRAYEXPRESS
Relevant answer
Answer
Congratulations on your selection of a very important ResearchGate discussion thread question, which in recent years has been generating a great deal of controversy, along with lateral and longitudinal expansion from the public into the private domain.
The following article appeared in 2018 and it gives a good overview of some of the relevant issues involved in big data sharing of genomic data:
"OPINION article
Front. Public Health, 28 November 2018 | https://doi.org/10.3389/fpubh.2018.00334
Big Data Sharing: A Crucial Democratic Issue for Genomic Medicine
Benjamin Derbez*
  • Université de Bretagne Occidentale, Brest, France
Introduction
Big data are often viewed as responsible for major upheavals in many aspects of contemporary life (1) and in the health sector in particular (2). For instance, in medicine, big data are perceived as one of the major drivers of genomic medicine (3). Indeed, rapid genomic data collection on a large scale, made possible by the use of high-throughput sequencing technologies, has made the production of new medical knowledge possible. This knowledge has helped to improve disease prevention, risk prediction, individualized care, and patient involvement (4, 5). One of the conditions of such progress, however, is the need to create databases large enough to enable successful comparative analyses (6). While some initiatives seeking to share different national databases have been launched at the international level (7), the sharing of data between public institutions and private organizations remains a critical question.
Drawing on the example of databases of variants in breast and ovarian cancer predisposition BRCA 1–2 genes, we will show that genomic data is a techno-scientific democracy issue worth discussing. In this case, the recent evolution of patenting legislation has led to a shift from gene sequencing to the clinical interpretation of its results as the key activity of oncogenetics (8). Database access, which is necessary to estimate the risks associated with sequenced genetic variants, has become a critical issue, especially for private firms wishing to break into the market. In this context, the partial privatization of public databases, such as that of the French consortium that will be discussed later, is proof that there is a growing movement of public-private hybridization of these infrastructures. This shift, accentuated by the developments of high-throughput sequencing and genomic medicine, needs to be accompanied by reflection about the public health system user information contributing to the constitution of these databases.
Patenting genes
The controversy that shook the world of genetic cancer for years is well known. Indeed, the American company Myriad Genetics filed a patent application claiming BRCA1, BRCA2, and genetic methods of diagnosing a predisposition for breast and ovarian cancer (9, 10). Thanks to the legal ownership of these genes which had been designed as biotechnologies, the start-up from Salt Lake City sought to have a global monopoly on the hereditary breast cancer market, which was expected to experience robust growth. In the face of this offensive, institutional resistance (bringing together hospitals, ministries, associations, etc.) arose in the early 2000s in Europe and then in the United States (11). This resistance has often been interpreted as paradigmatic of the opposition between an “open science,” regulated by peers respecting the law of priority, and a “proprietary science,” regulated by the market, and respecting intellectual property (12). There was thus concern that the production of public knowledge would decline because of the legal appropriation of genes by private organizations (13).
An analysis of the British case, however, helps to get a more balanced view of this dichotomy. Indeed, Parthasarathy (14–16) has shown that patents are perceived as legal weapons by private organizations as well as by public scientific, medical, and social institutions. Moreover, actors from private and public groups cannot be radically distinguished insofar as each defines the other in a complex network of negotiated interrelationships. In line with the studies undertaken on the role of patents in management science between academic circles and the business world (17, 18), Parthasarathy calls attention to how the NHS and Myriad reached an agreement in the early 2000s, making it possible to connect the “moral order” of the former, based on the principle of equal access to healthcare for all citizens, to the freedom of consumers valued by the latter. Among the negotiated items, it appears clearly that the issue of the transfer of data from Myriad to the NHS was essential and intended to add onto the public BRCA mutation databases. Beyond the issue of monopoly over the gene sequence through the patenting of genes or methods, this example clearly shows that the ownership of data is of crucial importance to both groups. With high-throughput sequencing technology, it has become a major issue.
Next generation sequencing
Two major developments placed the issue of the sharing of BRCA databases at the center of the debate from the 2010s. The first, naturally, was the full or partial decline in the patents claimed by Myriad Genetics around the world (19, 20). Indeed, this decline opened up the sequencing market to new private actors (GeneDx, Invitae, Pathway Genomics, Counsyl, etc.) and allowed public laboratories to carry out their activities. The second development was the progressive introduction of high-throughput DNA sequencing technology which began in the mid-2000s. The use of these “next generation” devices reinforced laboratories' analytical capacities. It is now possible to analyse within a few hours, and at the same time, several genes (panels) of several individuals, or even the complete genome of an individual at a much lower cost: 100 dollars is regularly mentioned, compared to the 3 billion dollars spent in the framework of the Human Genome Project 20 years ago (21). All these developments have led stakeholders to focus on the issue of the classification of the genetic variants in BRCA genes.
A genetic variant from a sequenced individual can only acquire the status of “mutation,” i.e., the status of “pathogenic” variant, if it is clearly linked to a history of illness, either directly (in the individual or in their family) or indirectly (in a family affected by cancer and found to have the same variant). According to the current classification in genetics, the clinical significance of these variants may vary: they can be pathogenic, probably pathogenic, of unknown significance, benign, or probably benign. As Timmermans et al. (22) have pointed out, distinguishing between these categories is a major “interpretive dilemma” for geneticists. The classification of a variant in a given category depends on available data concerning the frequency of the link associating it with a specific disease. In the absence of data, the clinical significance of the variant is deemed unknown (a Variant of Unknown Significance, VUS) until it is identified in other individuals with similar phenotypic characteristics. The importance of new DNA sequencing technologies thus lies in their ability to increase genetic databases more quickly in order to reduce the at times dramatic clinical uncertainty associated with diagnosed genetic anomalies (23). The sharing of information among geneticists, thanks to databases fed on an international scale, is a central issue (see footnote 1). This sharing of information, however, is now problematic.
Genomic databases
For several years now, science and technology studies have been stressing that physical infrastructure plays a central role in the production of knowledge (24–27). In this area, the study of genetic databases serves as a model (28–31). Indeed, the first molecular biology databases were launched by different public institutions around the world in the early 1980s [(32): 75]. With the spread of the Internet and the Human Genome Project in the 1990s, they quickly developed as a form of support for new open “communication regimes” between scientists, likely to encourage the emergence of new knowledge (33). However, an analysis of the construction of this information infrastructure shows that the modes of data publishing remain a major source of tension between different actors.
This tension has been highlighted by Bruno Strasser, for instance, in his study on the development of the comprehensive GenBank sequence database (32). This historian of life sciences argues that tensions linked to the different conceptions of data ownership arose from the outset of the project. Participants engaged in a “moral economy of natural history,” i.e., in a “system of values that places emphasis on the exchange of scientific knowledge” inherited from the naturalists of the eighteenth century, considered that the sequences published in scientific journals should be freely accessible data. Other participants, advocates of a “moral economy of experimentation” which has garnered momentum among molecular biologists, view sequences as the products of scientific activity and as the property of their authors. According to Strasser, GenBank embodies a form of hybridization of these two value systems. It appears that those who conceived it succeeded in taking advantage of the “ambiguity” of the very notion of “data,” owing to the fact that what seems “literally given” is at the same time “the result of an organized action” [(34): 248]. In the context of the Human Genome Project, this ambiguity has manifested itself in the emergence of information control modes which involve a complex interplay of revelation and concealment (35). Nowadays, as seen previously, in addition to the tensions inherent in the moral economies of science, other tensions associated with the political economy of knowledge resulting from the growing role played by private firms in the production of knowledge emerged from the early 1990s (36, 37). Beyond the question of the patentability of living organisms, it is now the question of sharing that is at the forefront of the debate, as the case of the BRCA1 and BRCA2 genes clearly shows.
Data sharing
In the present case, i.e., the focus on BRCA1 and BRCA2 genes, there is no unique and comprehensive database of BRCA variants accessible to all professionals around the world. On the contrary, different databases developed by consortia of multinational public institutions or private organizations exist, but their access is generally limited. This is the case of the database developed by Myriad Genetics throughout the period the patents were under discussion. Although this is the largest database in the world, Myriad Genetics has exclusive access to it. This has given the company a major competitive asset in the BRCA testing market insofar as the database offers a solid basis on which to interpret results. According to genetics professionals, the main issue is not the sequencing itself. Rather, what matters most is the interpretation of the results intended to give clinical significance. This has turned out to be the most costly activity, both in terms of the recruitment of highly qualified personnel and for the development, maintenance, and access to huge databases that list the known variants of specific genes. Certain professionals estimate that there is a 1 to 10 ratio with regards to the cost of complete genome sequencing and its interpretation. In this context, ownership and the opening up of genetic variants databases emerges as a crucial issue.
In this context, the example of the future of the UMD BRCA base (Universal Mutation Database-BRCA) speaks volumes. Developed in the 1990s by a public consortium of French geneticists, it was considered to be one of the most important global databases until 2015. Driven by two major players in genetic testing in the United States [Quest Diagnostics and Laboratory Corporation of America (LabCorp)], the database was partially privatized in 2015. These two companies purchased the right to obtain access to data in exchange for funding the database. While the French sought to finance over the short- and medium-term an activity that had become too costly for public finances to sustain, the Americans' objective was to quickly be able to compete with Myriad Genetics by improving the quality of their analyses. The question that arises, then, is: How will this be handled over the long term? Will the French geneticists at the origin of the database still be able to access it? Will French patients still benefit from the knowledge generated thanks to the data they provided? What justifies this privatization if we consider the donations made by patients who agreed to have their data kept in this database? Similar questions had already been raised by the NHS during its negotiations with Myriad in the early 2000s, when the issue of the privatization of access to BRCA testing for British citizens arose (16). Questions revolving around access (currently and in the future) to genetic databases thus remain relevant.
Conclusion
At a time when the opening up of public data has become common practice in the field of administration (38), the example of the genetics of breast cancer shows that data sharing is still a major issue in research (39). The question here is the extreme overlapping of public issues and private interests. In this case, there is a need to go beyond a simple comparison between the open regimes of data publication associated with academic institutions, and the closed regimes of the privatization of knowledge developed by business communities. Hybrid forms of database ownership such as those mentioned earlier, highlight the need to pay attention to the significance given to data sharing during the initial negotiations underpinning their establishment. Once these databases are filled by voluntary citizens who provide their DNA data, data sharing becomes a crucial issue in terms of technical democracy (40). Once again, however, citizens seem to be largely absent from the debate about the ownership and use of the genomic data stored in these databases. With increased power given to major programmes seeking to collect big data in genomics, it may be time to reflect on how citizens can be informed and involved in the decisions that will be made in this area.
At the very least, it seems necessary to provide people with information about the future of their genomic data: in which databases will the data be stored? For how long? Who will be able to use them? Can they be exploited for commercial purposes by private firms? As in the field of the Internet, database contributors should be able to oppose the reuse of their “data” for the benefit of private interests. The information challenge involves the very value of consent (41).
Author Contributions
The author confirms being the sole contributor of this work and has approved it for publication.
Funding
This opinion paper is based on research funded by the Fonds Avenir/Masfip pour la Recherche, 2016.
Conflict of Interest Statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The author would like to thank Emmanuel Rial-Sebbag, S. de Montgolfier, Pr Dominique Stoppa-Lyonnet, Pr Eric Vilain, and Dr. Zaki El Haffaf for their help and collaboration. Thank you to Catherine Davies from UBO BTU for the translation.
Footnotes
1. For example: Human Gene Mutation Database (HGMD) or Online Mendelian Inheritance in Man (OMIM).
References
1. Mayer-Schönberger V, Cukier K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston, NY: Eamon Dolan /Mariner /Houghton Mifflin Harcourt (2013).
2. Groves P, Kayyali B, Knott D, Van Kuiken S. The Big Data Revolution in Health Care. Seattle, DC: McKinsey Quarterly (2013).
3. Guttmacher AE, Collins FS. Genomic medicine – a primer. N Engl J Med. (2002) 347:1512–20. doi: 10.1056/NEJMra012240
4. Alyass A, Turcotte M, Meyre M. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 8:33. doi: 10.1186/s12920-015-0108-y
5. Flores M, Glusman G, Brogaard K, Price ND, Hood L. P4 medicine: how systems medicine will transform the healthcare sector and society. Personal Med. (2013) 10:565–76. doi: 10.2217/PME.13.57
6. He KY, Ge D, He MM. Big data analytics for genomic medicine. Int J Mol Sci. 18:412. doi: 10.3390/ijms18020412
7. Scollen S, Page A, Wilson J. From the data on many, precision medicine for “one”: the case for widespread genomic data sharing. Biomed Hub (2017) 2:15. doi: 10.1159/000481682
8. Stoppa-Lyonnet D. Tests génétiques: le défi n'est plus le séquençage mais l'interprétation. Pour Sci. (2014) 439:12–3.
9. Parthasarathy S. Building Genetic Medicine. Breast Cancer, Technology, and the Comparative Politics of Health Care. Cambridge, MA: MIT Press (2017).
10. Sherkow JS, Greely HT. The history of patenting genetic material. Annu Rev Genet. (2015) 49:161–82. doi: 10.1146/annurev-genet-112414-054731
11. Cassier M, Stoppa-Lyonnet D. L'opposition contre les brevets de myriad genetics et leur révocation totale ou partielle en Europe: Premiers enseignements. Med Sci. (2005) 21:648–62. doi: 10.1051/medsci/2005216-7658
12. Dasgupta P, David PA. Towards a new economics of science. Res Policy (1994) 23:487–521. doi: 10.1016/0048-7333(94)01002-1
13. Orsi F, Coriat B. Are Strong Patents beneficial to Innovative Activities - Lessons from genetic testing for breast cancer controversies. Indus Corp Change (2005) 14:1205–21. doi: 10.1093/icc/dth086
14. Parthasarathy S. The patent is political: the consequences of patenting the BRCA genes in Britain. Community Genet. (2005) 8:235–42. doi: 10.1159/000087961
15. Parthasarathy S. Architectures of genetic medicine: comparing genetic testing for breast cancer in the USA and UK. Soc Stud Sci. (2005) 35:5–40. doi: 10.1177/0306312705047172
16. Parthasarathy S. Reconceptualizing technology transfer: the challenge of building an international system of genetic testing for breast cancer. In: Guston DH, Sarewitz D, editors. Shaping Science and Technology Policy: The Next Generation of Research. Madison, WI: University of Wisconsin Press (2006).
17. Huang KG, Murray FE. Does patent strategy shape the long-run supply of public knowledge: evidence from human genetics. Acad Manag J. (2009) 52:1193–221. doi: 10.5465/amj.2009.47084665
18. Murray F. The oncomouse that roared: hybrid exchange strategies as a source of distinction at the boundary of overlapping institutions. Am J Sociol. (2010) 116:341–88. doi: 10.1086/653599
19. Cassier M, Stoppa-Lyonnet D. Un juge fédéral et le gouvernement des États-Unis interviennent contre la brevetabilité des gènes. Med Sci. (2011) 27:662–7. doi: 10.1051/medsci/201228s204
20. Pollack A. Myriad Genetics Ending Patent Dispute on Breast Cancer Risk Testing. New York Times (2015). Available online at: https://www.nytimes.com/2015/01/28/business/myriad-genetics-ending-patent-dispute-on-breast-cancer-risk-testing.html (Accessed 18/09/18)
21. Reuters. New Illumina Tech Could Usher in $100 Gene-Sequencing Era. (2017). Available online at: https://www.reuters.com/article/us-illumina-stocks/new-illumina-tech-could-usher-in-100-gene-sequencing-era-idUSKBN14U1PO (Accessed 18/09/18)
22. Timmermans S, Tietbohl C, Skaperdas E. Narrating uncertainty: variants of uncertain significance (VUS) in exome sequencing. BioSocieties (2016) 12:439–58. doi: 10.1057/s41292-016-0020-5
23. Stivers T, Timmermans S. Negotiating the diagnostic uncertainty of genomic test results. Soc Psychol Quart. (2017) 79:199–221. doi: 10.1177/0190272516658770
24. Star SL, Ruhleder K. Steps towards an ecology of infrastructure: design and access for large information spaces. Inform Syst Res. (1996) 7:111–34. doi: 10.1287/isre.7.1.111
25. Star SL. The Ethnography of Infrastructure. Am Behav Sci. (1999) 43:377–91. doi: 10.1177/00027649921955326
26. Bowker G, Baker K, Millerand F, Ribes D. Towards information infrastructure studies: ways of knowing in a networked environment. In: Hunsinger J, Klastrup L, Allen M, editors. International Handbook of Internet Research Dordrecht. Dordrecht: Springer (2010). p. 97–117.
27. Bowker GC, Star SL. Sorting Things Out: Classification and Its Consequences. Cambridge, MA: MIT Press (1999).
28. Brown C. The changing face of scientific discourse: analysis of genomic and proteomic database usage and acceptance. J Am Soc Inform Sci Technol. (2003) 54:926–38. doi: 10.1002/asi.10289
29. Bowker G. Memory Practices in the Sciences. Cambridge, MA: MIT Press (2005).
30. Hine C. Databases as scientific instruments and their role in the ordering of scientific work. Soc Stud Sci. (2006) 36:269–98. doi: 10.1177/0306312706054047
31. Dagiral E, Peerbaye A. Making knowledge in boundary infrastructures: inside and beyond a database for rare diseases. Sci Technol Stud. (2016) 29:44–61.
32. Strasser B. The Experimenter's Museum: GenBank, natural history, and the moral economies of biomedicine. Isis (2011) 102:60–96. doi: 10.1086/658657
33. Hilgartner S. Biomolecular databases: new communication regimes for biology? Sci Commun. (1995) 17:240–63. doi: 10.1177/1075547095017002009
34. Desrosières A. The Politics of Large Numbers. A History of Statistical Reasoning. Boston, MA: Harvard University Press (2002).
35. Hilgartner S. Reordering Life: Knowledge and Control in the Genomics Revolution. Cambridge, MA: MIT Press (2017).
36. Cassier M, Gaudillière JP. Recherche, médecine et marché: la génétique du cancer du sein. Sci Soc Santé (2000) 18:29–49. doi: 10.3406/sosan.2000.1504
37. Huang KG, Murray FE. Entrepreneurial experiments in science policy: analyzing the human genome project. Res Policy (2010) 39:567–82. doi: 10.1016/j.respol.2010.02.004
38. Goëta S, Davies T. The daily shaping of state transparency: standards, machine-readability and the configuration of open government data policies. Sci Technol Stud. (2016) 29:10–30.
39. Nature Editorial. The ups and downs of data sharing in science. pooling clinical details helps doctors to diagnose rare diseases — but more sharing is needed. Nature (2016) 534:435–6. doi: 10.1038/534435b
40. Callon M, Lascoumes P, Barthe Y. Acting in an Uncertain World. An Essay on Technical Democracy. Boston, MA: MIT Press (2009).
41. Ducournau P. The viewpoint of DNA donors on the consent procedure. New Genet Soc. (2007) 26:105–15. doi: 10.1080/14636770701218191
Keywords: big data, genomics, BRCA, oncogenetics, database
Citation: Derbez B (2018) Big Data Sharing: A Crucial Democratic Issue for Genomic Medicine. Front. Public Health 6:334. doi: 10.3389/fpubh.2018.00334
Received: 03 July 2018; Accepted: 31 October 2018; Published: 28 November 2018.
Edited by: Thomas Lefèvre, Université Paris 13, France
Reviewed by: Nicole C. Nelson, University of Wisconsin-Madison, United States
Copyright © 2018 Derbez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Benjamin Derbez, benjamin.derbez@univ-brest.fr
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher."
This published article is also available on ResearchGate:
  • asked a question related to Databases
Question
42 answers
How can the information currently needed for specific scientific research, and for carrying out economic, business and other analyses, be obtained from Big Data database systems?
Of course, the right data is important for scientific research. However, in the present era of digitalization, in which many categories of information are stored in libraries, databases, data warehouses and constantly expanding Big Data database systems, it is important to develop techniques and tools for filtering these large data sets, so that out of terabytes of data only the information currently needed is extracted: for scientific research in a given field of knowledge, for answering a given research question, and for business needs, e.g. after connecting these databases to Business Intelligence analytical platforms. I have described these issues in my scientific publications presented below.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
How can the information currently needed for specific scientific research, and for carrying out economic, business and other analyses, be obtained from Big Data database systems?
Please reply
I invite you to the discussion
Thank you very much
Dear Colleagues and Friends from RG
The use of information contained in Big Data database systems for conducting Business Intelligence analyses is described in the publications:
I invite you to discussion and cooperation.
Best wishes
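As a small illustration of the filtering step raised in the question, here is a minimal Python sketch, assuming the data can be read as delimited rows. It streams rows one at a time, keeps only those that satisfy a research-specific condition, and projects them onto the needed columns. The column names, the sample rows and the selection rule are all hypothetical; a real Big Data system would push the same filter into its own query engine rather than into a script.

```python
import csv
import io


def stream_rows(handle):
    """Yield rows one at a time so the full data set never has to fit in memory."""
    yield from csv.DictReader(handle)


def filter_rows(rows, predicate, columns):
    """Keep rows that satisfy the predicate, reduced to the requested columns."""
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


# Hypothetical sample standing in for a very large table.
SAMPLE = """country,year,sector,revenue
PL,2018,banking,120
PL,2019,banking,135
DE,2019,retail,300
PL,2019,retail,90
"""

wanted = filter_rows(
    stream_rows(io.StringIO(SAMPLE)),
    predicate=lambda r: r["country"] == "PL" and int(r["year"]) >= 2019,
    columns=["year", "sector", "revenue"],
)
result = list(wanted)
```

Because both functions are generators, the same code works unchanged on a file handle opened over a multi-gigabyte export; only the final `list(...)` call would be replaced by writing each row out as it arrives.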
Relevant answer
Answer
Respected Doctor
Big data has three characteristics as follows:
1-Volume
It is the volume of data extracted from a source, which determines the value of the data and whether it can be classified as big data. By the year 2020, cyberspace was forecast to contain approximately 40,000 exabytes (40 zettabytes) of data ready for analysis and information extraction.
2-Variety
It means the diversity of the extracted data, which helps users, whether they are researchers or analysts, to choose the appropriate data for their field of research. It includes structured data in databases and unstructured data (such as images, clips, audio recordings, videos, SMS, call logs, and GPS map data), which require time and effort to prepare in a form suitable for processing and analysis.
3-Velocity
It means the speed of producing and extracting data and sending it to cover the demand for it. Speed is a crucial element in making a decision based on this data, and it is the time we take from the moment this data arrives to the moment the decision is made based on it.
There are many tools and techniques that are used to analyze big data, such as Hadoop, MapReduce and HPCC, of which Hadoop is one of the best known. Hadoop splits big data across several machines and distributes the processing to those machines to speed it up; the partial results are then combined and returned as a single result. Tools that deal with big data consist of three main parts:
1- Data mining tools
2- Data Analysis Tools
3- Tools for displaying results (Dashboard).
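The split, process and combine pattern described above for Hadoop can be sketched in a few lines of plain Python. This is only a single-machine stand-in, not Hadoop itself: a thread pool plays the role of the separate machines, and the function names and the word-count task are assumptions chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Deal records round-robin into n_chunks lists (stand-in for data spread over machines)."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map step: each worker counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(records, n_workers=3):
    """Distribute the map step over workers, then combine the partial counts."""
    chunks = split_into_chunks(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


lines = ["big data volume", "big data variety", "data velocity"]
totals = word_count(lines)
```

The point of the design is that `map_chunk` never needs to see data outside its own chunk, which is what allows the work to be spread over many machines in a real cluster; only the much smaller partial results travel to the reduce step.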
Its use also varies statistically according to the research objectives (improving education, effectiveness of decision-making, military benefit, economic development, health management ... etc.).
greetings
Senior lecturer
Nuha hamid taher
  • asked a question related to Databases
Question
14 answers
Below are some issues related to Big Data database technologies that can be developed scientifically:
- Application of data processing technology in Big Data database systems for modern education 4.0,
- Improvement of forecasting of natural, climatic, economic, financial, social, etc. phenomena based on the analysis of large data sets,
- Analysis of sentiment, opinions of citizens, Internet users regarding brand recognition of companies, customer reviews of specific services and products, views on various topics, citizens' worldview based on the analysis of large collections of information downloaded from various websites, from comments downloaded from social media portals,
- Analysis of information and marketing services of commercially operating companies that carry out specific analyses of sentiment, citizens' opinions, and Internet users' views regarding brand recognition, customer reviews of specific services and products, etc. on behalf of other companies that purchase specific analytical reports,
- Analysis of the possibilities of cooperation, synergy, correlation, conducting interdisciplinary research, connecting Big Data database systems with other information technologies typical for the development of the current fourth technological revolution called Industry 4.0, which include technologies such as: cloud computing, machine learning, Internet of Things, Artificial Intelligence, etc.
In what other areas are the technologies of processing and analysis of information in Big Data database systems used?
Please answer
Best wishes
Dear Colleagues and Friends from RG
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyses are described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Srdjan Atanasijevic,
Yes. You pointed to the important conditions for the development of big data analysis technology with the use of Big Data Analytics.
Thank you, Regards,
Dariusz Prokopowicz
  • asked a question related to Databases
Question
3 answers
Hi
I'm working on locating aphasia lesions in stroke patients and I need a database of MRI images of aphasia patients.
Relevant answer
Answer
Hi Driss,
You could check out the free datasets listed on www.imagingQA.com at the following link :
If you don't find what you're looking for, it's free to join, so please open a new discussion topic (there are lots of data scientists and imaging specialists in the community) :
  • asked a question related to Databases
Question
21 answers
Hi there,
Can anyone recommend a Delphi method online tool that I can use? 
Mesydel.com has been recommended but you are not able to download it. 
I have been recommended http://armstrong.wharton.upenn.edu/delphi2/ as well although the website is less user friendly than I would like.
Any other suggestions?
Thanks.
L
Relevant answer
Answer
We have recently launched a course on the Delphi method; check it out at https://researchhub.org/course/delphi-method/
  • asked a question related to Databases
Question
31 answers
What kind of scientific research dominates in the field of Big Data database systems?
Please provide your suggestions for a question, problem or research thesis on the topic of Big Data database systems.
Please reply. I invite you to the discussion
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Informative question
  • asked a question related to Databases
Question
36 answers
What kind of scientific research dominates in the field of popularization of science on the Internet?
Please provide your suggestions for a question, problem or research thesis on the topic of popularization of science on the Internet.
Please reply.
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
Currently any scientific topic relating closely or remotely to Covid19 is "in the buzz", extremely popular.
Vaccine and their properties and effects, propagation of the contamination, comorbidity statistics, this is on the radar of everyone in the broadest audience possible for science.
The popularity of the topics and the anxiety around related issues additionally create a window for anti-science: fact deniers, people opposing vaccines even though vaccines have demonstrated huge benefits not obtainable otherwise (near-eradication of poliomyelitis, tuberculosis, etc.), and peddlers of witchcraft recipes (like a former US president who recommended injecting toilet disinfectant or something not far from it; horrible effects are to be expected).
It is likely that anything tagged "Covid19" will become highly unpopular once the pandemic is over (thus belonging to "bad memories" not to be recalled with any pleasure).
In the longer term popular interest for Space science keeps growing, with impressive achievements from many missions, Europe, Russia, China, USA, Japan, India, etc.
This is a gem, where everyone sees common interest.
Climate science, with its probabilistic scenario calculations that are not understood by all, is growing its audience, especially as young people become more aware of the huge risks for their future in the absence of immediate radical changes in the way we live on this planet.
  • asked a question related to Databases
Question
37 answers
In some countries, work in the scientific field is poorly paid. Young people have little interest in the complex problems of science. The popularization of science is necessary. The pages of RG publish questions and answers on these scientific topics.
Can the publication of answers to questions in the RG be considered a popularization of science?
Relevant answer
I am on the same line as Xavier Rouard.
  • asked a question related to Databases
Question
3 answers
Do you know any databases that not only specify the plant origins of a specific phytochemical, but also show how much of that substance may be extracted from specified parts of the plant?
I have also found this awesome website, but it doesn't work at all! Besides answering my question, could you please let me know if you can get any results by searching for a term in it?
Relevant answer
Answer
You can try Dr. Duke's Phytochemical and some other ethnobotanical Databases.
  • asked a question related to Databases
Question
7 answers
Hello,
I am teaching a database systems class and I wish to guide students on how distributed databases work.
We are using the PostgreSQL DBMS for illustration.
What other applications do I need to set up a DDBMS on a single Windows machine?
Best
Derdus
Relevant answer
I would recommend distributed Postgres using the TimescaleDB extension. No DaaS provides this because the licensing prohibits it; however, technology enablers such as Full Stack Engine, my firm, provide this as an implicit operator for Kubernetes. You wind up launching stacks like this:
You can do so in our cloud, which is free for life with few special conditions:
* Your research contributes to our open collective non-profit if plausible/possible
* You operate only stacks we approve of for the sake of security & performance on shared systems
Pretty lax in terms of who uses it, I'm not even tapped into the full power and it's got plenty of tenants. That could change, but for now it remains free. Just ping me and I can set you up for free with a Github Username.
Ping @large.systems on discord: https://lmg.systems/discord
Relevant:
  • asked a question related to Databases
Question
3 answers
Hello everyone,
I have been having problems with gathering all the information that I need for my study and also, some problems with fixed effects.
1) First of all, I am interested in comparing Panama's exports to certain partner countries (in Central America and some others). I am using panel data in Stata (1994-2017) and the variables of interest are:
1) Exportations (y)
2) Distance of the capitals
3) GDP (c_origin)
4) GDP (destination_c)
5) Population (c_origin)
6) Population (destination_c)
-->+ some dummy variables
7) common language (0,1)
8) borders between countries (0,1)
9) whether the destination country belongs to Central America (main variable of interest; 0, 1)
Is there any page where I could get most of the data? I used CEPII and could find data until 2015 for distance, language, population and GDP (but only in current dollars). ***I would need data for 2016 and 2017, as well as the GDP deflator and Panama's exports to those specific countries for the period 1994-2017. How do I convert GDP measured in current dollars using the GDP deflator, and are there any pages where those values are already available?*** I first used GDP measured in current dollars and, with the data I had, the outcome did not turn out well.
2) Another problem I encountered when running a hierarchical regression with my first "faulty" database was that, when I included country fixed effects, all of the variance was already explained by the countries, so there was no variance left for my variables of interest to explain. On the other hand, if I do not include them, the model will be biased. What is more, without any fixed effects, in some of the regression blocks I get counterintuitive correlations/b coefficient values. Therefore, my questions are (apart from the sentences marked with *** above):
a) How to solve this problem so my model is not biased and so my variables still explain significant variance of the y (exportations) variable?
b) Which fixed effects should I add to the model? Country, distance, population... which ones?
This is the first time I am using econometrics and fixed effects so your help would mean a lot!
Thank you in advance!
Relevant answer
Answer
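On the narrow point of converting current-dollar GDP with a deflator, the usual arithmetic is real GDP = nominal GDP / deflator × 100, where the deflator equals 100 in the base year. A minimal sketch, assuming the deflator is expressed as an index with base 100; the function name `real_gdp` and the numbers below are made up for illustration and are not real statistics for any country.

```python
def real_gdp(nominal_gdp, deflator, base=100.0):
    """Convert nominal (current-dollar) GDP to real (constant-dollar) GDP.

    `deflator` is a price index that equals `base` in the base year.
    """
    if deflator <= 0:
        raise ValueError("deflator must be positive")
    return nominal_gdp / deflator * base


# Illustrative values only: year -> (nominal GDP, GDP deflator)
series = {1994: (50.0, 80.0), 2010: (100.0, 100.0), 2017: (150.0, 120.0)}

# Real GDP for every year, expressed in base-year (here 2010) dollars
real_series = {
    year: real_gdp(nominal, deflator)
    for year, (nominal, deflator) in series.items()
}
```

In the base year nominal and real GDP coincide; in years with a deflator above 100 the real value is lower than the nominal one, and the reverse holds for years with a deflator below 100.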
  • asked a question related to Databases
Question
15 answers
Libraries left right and centre are cancelling print versions of academic journals and discarding their old journal stocks. When challenged, they say don't worry, this information will all be freely available on the Internet. This is not even completely true nowadays, but what is the long-term future? We are entering a Digital Dark Age, not helped by the fact we have our eyes closed as well. Dangers and factors militating against indefinite free storage include -- energy supply security; energy costs (some data centres use as much electricity as a small town); obsolete storage devices and digital formats; missing software; ephemeral recording media; planned obsolescence; unreliable or complacent custodians; malicious hackers, criminals or terrorists; politically or religiously motivated activists. Some of these points are discussed by Roger Highfield in Daily Telegraph Jan 7 2014 p25.
Relevant answer
Answer
"the reverence for the historical record of text has been carried by librarians and archivists within private and public libraries to this very day. "
My house is overfull of important scientific books and journals that I have had to rescue after being thrown out of libraries.
  • asked a question related to Databases
Question
12 answers
hi all,
How do I tell R that the row names are in a certain column when importing a file into R using the read.csv function?
Relevant answer
Answer
Use MS Excel to arrange the data in the CSV file.
It is very easy
  • asked a question related to Databases
Question
7 answers
The Microsoft Jet database engine could not find the object
or
Column 'DATE' does not belong to table tbITs
The first error occurs when using a .dbf and the second error occurs when using a text file.
Any help is greatly appreciated. 
Relevant answer
Answer
I had the same problem while modeling the dam and the source in ArcSWAT.
Finally, ArcSWAT installed on ArcMap 10.2 solved the problem.
  • asked a question related to Databases
Question
4 answers
I need to do an analysis with STRUCTURE using dominant data. If you have an example of this type of database, please contact me.
Relevant answer
Answer
Please, could you provide us with a sample file that could be used with the STRUCTURE program when dominant marker (1/0) data is analysed?
Thank you so much
  • asked a question related to Databases
Question
4 answers
Most of the publicly available databases give only the basic information like age, gender, mode of infection, etc. regarding the infected patients suffering from CoVID 19. So, can anyone recommend or suggest more specific databases related to image, speech or clinical data of the patients that are meant for open research?
  • asked a question related to Databases
Question
5 answers
I have been trying to find an online source for the FG-NET aging database, the MORPH database and the YGA database. I can't seem to find any of them available to download. Does anyone know of any other aging database that I can use? Thank you. I need an aging database for my school thesis, as I am building a face recognition system and classifying the faces by age. It would be very helpful.
  • asked a question related to Databases
Question
15 answers
The medical conditions include their heart and respiratory rates, systolic and diastolic pressure, etc. It would be more helpful if the dataset also included information about the patients' use of medications such as NSAIDs and DMARDs prior to COVID-19 infection.
Relevant answer
Answer
The Johns Hopkins COVID-19 database might have the resources you are looking for.
  • asked a question related to Databases
Question
2 answers
I need a database of tomato leaves for testing my algorithms.
Relevant answer
  • asked a question related to Databases
Question
18 answers
I need a dermoscopic image database to test an algorithm for automatic diagnosis. In particular I am interested in images with blue-black colors within the lesion.
Can anyone tell me where to find it?
  • asked a question related to Databases
Question
6 answers
Biomedical
Medical hyperspectral imaging
Relevant answer
Answer
  • asked a question related to Databases
Question
6 answers
I need a research paper that comes with an available dataset so that I can start my research.
Relevant answer
Answer
I've recently published a public dataset. It contains 2690 images of lemons with annotations. For more information please visit: https://github.com/softwaremill/lemon-dataset
  • asked a question related to Databases
Question
13 answers
How, in your opinion, will the applications of the technology of analyzing big information collections in Big Data database systems be developed in the future?
In which areas of industry, science, research, information services, etc., in your opinion, will the applications of technology for the analysis of large collections of information in Big Data database systems be developed in the future?
Please reply
I invite you to the discussion
I described these issues in my publications below:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Shafagat Mahmudova, Len Leonid Mizrah, Reema Ahmad, Shah Md. Safiul Hoque, Natesan Andiyappillai, Omar El Beggar, Tiroyamodimo Mmapadi Mogotlhwane, Thank you very much for participating in this discussion and providing inspiring and informative answers to the above question: What will Big Data be like in the future? Thank you very much for the interesting information and inspiration to continue deliberations on the above-mentioned issues. This discussion confirms the importance of the above-mentioned issues and the legitimacy of developing research on this subject. I also believe that the Big Data Analytics analytical and database technology is one of the most developing technologies included in Industry 4.0. What do you think about it?
Thank you very much and best regards,
Dariusz Prokopowicz
  • asked a question related to Databases
Question
5 answers
Currently, I am going to implement the surveying method in one of my research related to business units. Orbis database (of Company information across the globe | BvD) or similar would be useful for me to make a sample according to certain criteria and obtain contacts. My organization does not provide access to the Orbis Database. Maybe someone has access to this database and could provide me with data from it or recommend free alternatives?
Thank you in advance.
Relevant answer
Answer
The alternatives are:
However, those databases require you to have an account. Are you requiring data for private companies?
Some countries allow you to buy the audited reports of companies listed on the stock market.
  • asked a question related to Databases
Question
6 answers
I want to perform database operations in a distributed database environment. If somebody has any ideas relating to this, please share.
Thanks in advance.
Relevant answer
Answer
You can use Postgres-XL for a relational or Dgraph for a graph distributed environment. Both of these tools can be used in virtual or physical mode. You may use row (PostgreSQL) and column (MonetDB) stores for data partitioning labs.
  • asked a question related to Databases
Question
15 answers
To give you some context, our work consists of building a machine learning model that takes a vector of a farm's properties, possibly including the weather, and then, from a database of crops, recommends the crop most suitable for the soil. Identifying the elements that inform this decision is therefore an important step before we start collecting the data needed for the model.
Relevant answer
Answer
I would like to recommend you to go through two of my articles:
1. B.K.Tripathy and Sooraj, T. R.: An Interval Valued Fuzzy Soft set Based Optimization Algorithm for High Yielding Seed Selection, International Journal of Fuzzy Sets and Applications, IGI publications, vol.7, issue 2, (2018), pp. 44 - 61.
2. B.K.Tripathy, Sooraj, T. R.: Optimization of seed selection for higher product using Interval valued Hesitant Fuzzy Soft Sets, Songklanakarin Journal of Science and Technology (SJST), 40 (5), Sep. -Oct. 2018, (2018), pp.1125-1135 .
  • asked a question related to Databases
Question
5 answers
Hello,
I wonder what would be the best database that one can use to store and manage a large amount of data (maybe a few hundred gigabytes), in the main basin level, that includes:
1) GIS data:
  • raster
  • vector
2) Hydrological data:
  • time-series of different variables (e.g., rainfall, temperature, humidity, etc.) for different stations.
From the internet, I found that the below databases could be used:
  • PostGIS
  • MongoDB
Thank you very much
Relevant answer
Answer
Moreover, consider that PostgreSQL with the spatial PostGIS extension is entirely open-source and is developed by a strong and active community. I therefore definitely recommend that you go with PostGIS and not use proprietary software such as the ESRI products.
In addition, PostgreSQL, despite being a relational DBMS, offers some flexible data types such as hstore or jsonb, which allow you to extend your data model, if required, to unstructured data.
  • asked a question related to Databases
Question
10 answers
Dears,
I am looking for free online sources of gridded weather/climate data with high spatial (1 x 1 km) and temporal (hourly or three-hourly) resolution to be used in my research. The spatial domain is Europe (or even the world if possible). Could you please give me some suggestions on the best available data sources?
Thank you for the support.
Best,
Giorgio
  • asked a question related to Databases
Question
11 answers
Dear researchers, is it possible to merge two or more RIS files from the Scopus database into one RIS file?
Relevant answer
Answer
If you're using Windows, the "copy" command should help (files should be in the same folder), using a Command prompt:
copy *.ris db.ris
If you're using Ubuntu Linux and can move all the ris files inside a folder, just open a terminal in the folder and use "cat":
cat *.ris > db.ris
There are more complex commands, but you can move files in a folder easily if you search for them by extension and cut/paste.
The files are basic text files, so it works for other text files of interest (i.e. CSV).
I used this to import some stuff to my JabRef database... seems like the new version of JabRef doesn't like Drag&Drop and can't select multiple files for import.
p.s. there's a lovely extension for browsers called: BibItNow
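For anyone who prefers a script that behaves the same on Windows and Linux, the concatenation described in the answer above can be sketched in Python. This is a minimal sketch under assumptions: the helper names `merge_ris` and `count_records` are made up, the files are assumed to be UTF-8 text (optionally with a byte-order mark), and records are assumed to end with the standard `ER  -` tag. It only concatenates; it does not remove duplicate records.

```python
from pathlib import Path


def merge_ris(folder, output_name="db.ris"):
    """Concatenate every .ris file in `folder` into one file; return its path."""
    folder = Path(folder)
    output = folder / output_name
    # Skip the output file itself so that re-running does not duplicate records.
    sources = sorted(p for p in folder.glob("*.ris") if p.name != output_name)
    parts = []
    for path in sources:
        # "utf-8-sig" drops a leading byte-order mark if the export has one.
        text = path.read_text(encoding="utf-8-sig").strip()
        if text:
            parts.append(text)
    output.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return output


def count_records(path):
    """Count RIS records by their 'ER  -' end-of-record tags."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return sum(1 for line in lines if line.startswith("ER  -"))
```

Running `merge_ris("exports")` on a folder of Scopus exports and then `count_records("exports/db.ris")` gives a quick check that the merged file holds as many records as the individual files combined.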
  • asked a question related to Databases
Question
7 answers
As part of various financial research initiatives, we need different types of datasets. This question aims to identify online sources of financial data, both free and paid. This information can help every researcher locate an online directory or data bank for financial and economic analysis.
Relevant answer
Answer
OECD, UNCTAD, Groningen, Penn World Table, and Federal Reserve Economic Data (FRED) from the St. Louis Fed.
  • asked a question related to Databases
Question
11 answers
Please suggest a standard image database for age estimation and prediction.
Relevant answer
Answer
IMDB-WIKI: (523,051)
FG-NET: (1,002)
MORPH: (55,134)
CACD: (163,446)
LAP: (4,691)
  • asked a question related to Databases
Question
6 answers
I have gone through many questions but there is no single Discussion thread giving the link, API, JSON for the dataset.
I am making a start so it will be easily searchable by everyone
<COUNTRY NAME> :: <TYPE - JSON/CSV/... etc...> :: <KIND OF DATA - COUNT/NETWORK INFO>
<URL/s>
Relevant answer
Answer
Italy, web, data on symptom onset, Ministry of Health: http://www.salute.gov.it/portale/nuovocoronavirus/homeNuovoCoronavirus.jsp?lingua=english
  • asked a question related to Databases
Question
5 answers
The " COVID-19 " pandemic has been an unprecedented situation with rapid research and development taking place. There has been a lot of data that is being generated like the total number of patients infected, Active case, Recovered, Deceased.
Data can be obtained from different sources, Example in India " https://www.mohfw.gov.in/ " is the official website of Govt. of India, " https://www.covid19india.org/ " is a website by a group of dedicated volunteers, then we have " https://www.worldometers.info/coronavirus/ " which is worldometer maintained by Dadax
Is there any data validation model or method to verify the data that has been put out?
There is a possibility of overstating or understating the numbers. There can be a discrepancy between sources.
Has there been a solution or discussion about this in the research community? If so, what is it?
Relevant answer
Answer
Chandrasekar S.N. you have here an example of ground truth appearing, which is not the official data of a government, but the analysis from very professional independent journalists and the official institute of statistics for the UK, ONS:
-the government trailed at less than 30 000 deaths
-the ONS published data, which means 40 000 deaths
-The Guardian (and Reuters) picked it up
-the government stuck to their figure.
Is it typical? The public gets more reliable information from sources other than government?
Another case was PPE, equipment for health staff. The government said "all good" , the official voice of the NHS (health care) said "missing masks and equipment almost everywhere"
Yet another controversy: tests
The government set a target of 100 000/day. They reported earlier confusing things to avoid saying they had failed to meet this.
Puzzling is it not?
For sure we now know who should not be trusted, but it should not be that way, not in a democracy.
Or, put it otherwise, is it not transparency and accuracy of information which characterises a democracy?
  • asked a question related to Databases
Question
4 answers
The CTU-UHB Intrapartum CTG database consists of CTG records and clinical information. The data in the database has been extracted from the OB TraceVue system to an open format using software. I need the raw data of this database. The database contains .dat and .hea files, but I can't find a way to extract the raw data from them. Can anyone help me?
Relevant answer
Answer
Where can I find the data dictionary for the clinical parameters of this data? I could not find it on the PhysioNet site. Thanks for any help.
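On the original question of reading the .dat and .hea files: these are WFDB-style records, and the PhysioNet `wfdb` Python package (its `rdrecord` function) is the usual way to load them. For illustration only, the sketch below reads such a record with the standard library alone. It rests on assumptions that should be checked against the actual files: every signal is stored in WFDB format 16 (16-bit little-endian integers, samples interleaved across signals), all signals live in the same .dat file, and each signal line in the header carries the full set of fields. The function names are made up.

```python
import re
import struct
from pathlib import Path

# ADC gain field, e.g. "100(0)/bpm", "100/nd" or "100"
GAIN_RE = re.compile(r"^([0-9.eE+-]+)(?:\((-?\d+)\))?(?:/(\S+))?$")


def read_header(hea_path):
    """Parse the record line and signal lines of a WFDB .hea file."""
    lines = [ln.strip() for ln in Path(hea_path).read_text().splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    name, n_sig, fs = lines[0].split()[:3]
    signals = []
    for i, line in enumerate(lines[1:1 + int(n_sig)]):
        f = line.split(maxsplit=8)
        if f[1] != "16":
            raise ValueError("this sketch only handles WFDB format 16")
        gain, baseline, units = GAIN_RE.match(f[2]).groups()
        signals.append({
            "file": f[0],
            # A gain of 0 means "uncalibrated"; WFDB then assumes 200.
            "gain": float(gain) or 200.0,
            # Without an explicit baseline, the ADC zero field is used.
            "baseline": int(baseline) if baseline is not None else int(f[4]),
            "units": units or "",
            "name": f[8] if len(f) > 8 else "signal_%d" % i,
        })
    return {"name": name, "fs": float(fs.split("/")[0]), "signals": signals}


def read_signals(hea_path):
    """Return (header, {signal name: list of physical values})."""
    header = read_header(hea_path)
    sigs = header["signals"]
    raw = Path(hea_path).with_name(sigs[0]["file"]).read_bytes()
    count = len(raw) // 2
    values = struct.unpack("<%dh" % count, raw[:count * 2])
    data = {}
    for i, sig in enumerate(sigs):
        digital = values[i::len(sigs)]  # de-interleave this signal
        data[sig["name"]] = [(d - sig["baseline"]) / sig["gain"] for d in digital]
    return header, data
```

The returned lists can then be written out as CSV with Python's `csv` module, with one row per sample and the sampling frequency from the header used to compute timestamps.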
  • asked a question related to Databases
Question
4 answers
I want to do a sample project with SVM, but I cannot retrieve data from the ZINC database.
I need sample data from this database for classification. My classification method is SVM.
Thanks
Relevant answer
Answer
Hi, I have downloaded from the ZINC library on both Windows and Linux systems.
On Windows, I selected the desired LogP and molecular weight, and the download gave me a text file with links. I then copied the links into Internet Download Manager and downloaded approximately a million compounds this way. It gave me rar files, which I just decompressed.
On Linux, as I remember, I used the wget program with the same text file that I got from ZINC.
best regards.
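The "text file of links, then a download tool" workflow from the answer above can also be scripted. The following is a minimal sketch using only the Python standard library; it assumes the downloaded text file contains one URL per line, and the helper names `read_url_list` and `download_all` are made up. It is not an official ZINC client.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def read_url_list(list_path):
    """Return the http(s) URLs found in a text file, one URL per line."""
    urls = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if line.startswith(("http://", "https://")):
            urls.append(line)
    return urls


def download_all(urls, dest_dir):
    """Download each URL into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        name = Path(urlparse(url).path).name or "download"
        target = dest / name
        if not target.exists():  # lets an interrupted run be resumed
            urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

A typical call would be `download_all(read_url_list("links.txt"), "zinc_files")`, where `links.txt` stands for whatever name the downloaded list has; any compressed files still need to be unpacked afterwards.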
  • asked a question related to Databases
Question
9 answers
I want to study the relationship between different epigenetic factors and different types of cancer using existing records in epigenetic and/or oncological databases. However, as a bioinformatician, I have never worked with epigenetic data, so I do not know what format they are available in, what type of preprocessing they require, or what tools I can use to analyze them.
I would really appreciate if someone gave me some basic indications of how I should start, or if someone recommended me a paper or tutorial about how to work with epigenetic data in cancer bioinformatics.
Relevant answer
Answer
If you are interested in a particular gene, the HAIB Methyl RRBS track in the UCSC Genome Browser (ENCODE at UCSC) is a good place, too.
  • asked a question related to Databases
Question
4 answers
Hello,
I am looking for databases of quadruplex structures (both C-based i-Motifs and G-based G-quadruplexes (G4s)).
I have only found the databases:
- G4Hunter supplementary material database (mostly all DNA G4s)
Does anybody know whether any other sources of quadruplex structures exist?
Thanks in advance.
Relevant answer
Answer
As I didn’t find any database of tetraplex structures, I decided to create my own. I then used it in the package G4-iM Grinder.
In the latest update (V1.5.9) G4-iM Grinder’s internal database (V.2.5.1) has been expanded to a total sum of 2851 already known-to-form or known-not-to-form quadruplex nucleotidic sequences.
- 2141 of these sequences form tetraplex and 710 don’t.
- 283 are i-Motifs and 2568 are G4s.
- 1858 are DNA and 993 are RNA.
This database is used by the algorithm to locate any known tetraplex within its results.
However, the database is available for anyone to use.
[GiG.DB within G4iMGrinder package]
The database also includes references for each entry and some biophysical results (which will be soon expanded).
Hope this helps.
  • asked a question related to Databases
Question
4 answers
Hello everyone
I am currently a master's student at Nevşehir Hacı Bektaş Veli University, Department of Geography, studying Physical Geography. In addition to my main areas of expertise (geographical analysis, plant geography, data mining, MapReduce and Hadoop systems, land planning, and plant taxonomy), I have been working on social media analytics, social media applications and analysis, and social media and geography education in scientific and technical terms. But my main focus is on "Data Mining and Plant Geography modeling". I have a technical research article on this topic available as full text in my ResearchGate account. The study is titled "Creation of Plant Geography Databases With The Map Reduce Modeling in The Clustering of the Large Geographical Data Sets". It draws on MapReduce and Hadoop systems and algorithms, in addition to data mining, GIS, plant geography research methods and techniques, various international plant databases, and biological databases from Turkey, as well as vegetation and plant geography studies carried out around the world. With this latest work I tried to develop a new database model that will contribute to these fields. In conclusion, I would especially like to hear and benefit from the ideas and opinions of my colleagues and teachers working on data mining and geography, plant geography or vegetation, especially among geographers. Thanks in advance to everyone who contributes. Sincerely.
  • asked a question related to Databases
Question
3 answers
In my investigation of the triangular relationship between international sales, international ownership and the riskiness of a stock I am looking for a database that could provide the % of foreign ownership of shares from DAX and MDAX companies between 2003-2019.
Relevant answer
Answer
It's around 42 per cent on average currently, and higher for DAX than for MDAX companies. Quote me if you need a reference.
  • asked a question related to Databases
Question
4 answers
I have a list of reference SNPs IDs (rsid) and I need to retrieve the associated diseases ... what are the suitable bioinformatics tools or databases?
Relevant answer
Answer
This one is quite good
  • asked a question related to Databases
Question
3 answers
I'd like to publish some ideas about taxonomical database. This is possible to do in Biodiversity Data Journal or PeerJ, but I need to pay for an article. Does anybody know similar journals without article processing charge?
Relevant answer
Answer
  • asked a question related to Databases
Question
3 answers
For my research, I am trying to find ecological, geographic, hydrologic, social, economic and political spatial/GIS data that is preferably free and easily available. I am especially interested in layers associated with protected areas, distribution of Adivasi populations, Adivasi owned / managed lands, watersheds, dams/roads/mines/embankments, land ownership, or any other data within these fields. I would greatly appreciate some inputs/recommendations/tips as the government data.gov.in website has been difficult to navigate and almost impossible to find data on and the bhuvan website also doesn't allow data downloads.
Relevant answer
Answer
Akanksha Sharma Link Districts/ Blocks with census of India data.
  • asked a question related to Databases
Question
9 answers
Hello, can anyone help me?
When I open a CIF file with File > Open, the software (CrystalExplorer 17.5) shows an error while processing the CIF:
Error in TEXTFILE:open_new_file_for_write ... error opening new file.
I checked the CIF file with CCDC checkCIF and it is reported as correct. I also opened my original CIF file in Mercury and generated a new CIF file from it, but got the same error. I tried CIF files from the CCDC database as well, without success. Could anyone give me a hand, please?
Relevant answer
Answer
The original CIF file contains a lot of data, which makes it too heavy to be loaded in CrystalExplorer. To remove those data, it is better to open your file with Mercury and save it in .cif format.
  • asked a question related to Databases
Question
3 answers
I am looking for the minimal inhibitory concentrations of different antimicrobial agents (such as chloramphenicol, ketoconazole, nitrofurantoin, etc.). Is there perhaps a database where it is possible to find MICs against different bacteria and fungi (S. aureus, E. coli, P. aeruginosa, K. pneumoniae, C. albicans, etc.)?
Relevant answer
Answer
Many books and periodicals have methods to measure the minimum inhibitor concentration of antimicrobial agents.
  • asked a question related to Databases
Question
4 answers
I am looking for the volume of public funding allocated to research topics over time in each country. Is there a reliable database that indexes public funding allocation by research theme or topic (particularly for the US)?
For example, the volume of public funding for "electric battery related research" over the past 30 years in the US.
Relevant answer
Answer
In the European Union it's the CORDIS database: https://cordis.europa.eu/ It "provides information on all EU-supported R&D activities, including programs (H2020, FP7 and older), projects, results, publications."
  • asked a question related to Databases
Question
6 answers
It could also be a series of images, but not a single image per emotion.
Relevant answer
Answer
  • asked a question related to Databases
Question
3 answers
I am currently trying to search for data on RLFS (R-loop forming sequences) for the human genome on UCSC Table Browser but I cannot find anything. Does anyone know if these data exist?
Currently I am trying to generate it myself using QmRLFS-finder, but it would be great if I could find other sources.
Thank you in advance,
Ana
Relevant answer
Answer
You can search the Genome Browser mailing list archives and/or email them directly here:
  • asked a question related to Databases
Question
8 answers
I've got a large amount of environmental data as independent variables, so I used a PCA to work with them more easily. But I have problems extracting/converting the data from the principal components so that they work as variables in different GLMs. I'm working with R. Can anyone help me?
Thank you,
Ferran
Relevant answer
Answer
factoextra is an R package for extracting PCA data.
You can use PCA scores directly in glm model:
library(tidyverse)
set.seed(1234) #repro
df <- as_tibble(replicate(expr=rnorm(100),n=5)) #V1,V2,V3,V4,V5
pca <- FactoMineR::PCA(df[-1], graph = FALSE) #use V2:V5 in PCA
score <- as_tibble(factoextra::get_pca_ind(pca)$coord) #extract individual scores
mod <- cbind(df[1], score[1:2]) #use V1 as DV, use Dim.1:Dim.2 as IV
glm(V1~Dim.1+Dim.2, data = mod) #simple linear model
  • asked a question related to Databases
Question
10 answers
We are interested in developing a method for predicting siRNA, so we need a large set of siRNAs for developing models. I would highly appreciate it if you could suggest the best database or databases on siRNA. This will help us create a large dataset that covers all experimentally characterized siRNAs. Please also suggest the best (latest) prediction method for siRNA. Do you think there is a possibility of developing a better prediction method, or is this field already saturated?
Relevant answer
Answer
Hi, I wonder whether you have found an updated siRNA database?
  • asked a question related to Databases
Question
8 answers
Can you help me find databases for neuroscience? For example, an fMRI database, a database that shows the underlying mechanisms of the brain or the connection between brain and behavior, a psychiatry database, or anything else related to the brain. In genetics we have, for example, Reactome, KEGG, STRING and other databases that show many pathways and cell connections. I wonder if we have something like that in neuroscience: a big database that helps us better understand the brain.
Relevant answer
Answer
Actually you can use those databases that you mentioned to conduct research and understand mechanisms of brain activity in health and disease in computational neuroscience!
We recently used KEGG and STRING to study gene networks in different psychiatric diseases.
However, if you are mainly interested in using fMRI and imaging databases in your research, please check the following as well:
1. The Blue Brain Project by EPFL has a great database at the circuit level that can be used:
2. Human Connectome Project (HCP) is a great database comprised of scans and analyses of more than 1100 subjects:
3. Allen Brain Atlas
and many other databases that you may find and use depending on your main focus in neuroscience.
  • asked a question related to Databases
Question
5 answers
Hi everybody!
About 6 months ago I started a data collection through online questionnaires sent by email. I got almost 2000 people at baseline, 60% of whom agreed to be contacted again for the follow-up (online questionnaire, about 10 minutes) in order to investigate associations over time.
In your view, considering the size of the sample and the online recruitment, what response rate should I realistically aim for in order to have a "strong" dataset for my analysis?
Below which response rate should I give up on using the longitudinal information?
Is there any?
I would love to hear your idea and your experience with this matter.
Thanks in advance for sharing!
Relevant answer
Answer
Congratulations on your high response rate. I guess the follow-up will be via email as well. Online follow-ups and online recruitment are not something I am familiar with, but I guess the main thing is to send out the follow-up request early enough for the participants to get back to you. Also, we get so many emails from different companies that unless it is something we are expecting or something we're familiar with, we may just skip past the email without realising. So you could send out an initial email reminding the participants of the upcoming questionnaire, and then send out another one with the link attached. In both communications, make it clear in the email subject/header that the email is about the study/project they agreed to participate in, to jog their memory and to make it stand out from all their other emails. Hope this helps.
  • asked a question related to Databases
Question
5 answers
We are looking to implement a web-based lab notebook as well as a tracking system to upload various assay results for several analogs of a parent compound. We will need to keep very close track of lot numbers, dates received, the chemists who synthesized them, etc. Does anyone use a service which would be helpful?
Relevant answer
Answer
Hey
SCINOTE
  • Intuitive and easy to use
  • Inventory management and MS Office integration
  • Automatically generates reports & manuscript drafts
  • Exports all data in a readable format and API
  • Free account option
  • asked a question related to Databases
Question
11 answers
Hello, I'm trying to find the expression of specific genes in various types of cancers and cancer cell lines. Is there any database for this?
Relevant answer
Answer
Hi
I recommend the Human Protein Atlas (HPA) database, which presents its data visually on its website.
If you search for a specific gene in HPA, the website shows the RNA expression level of the gene in various cancer cell lines and different types of cancer. Moreover, the website (the Pathology category) also shows the protein expression level (IHC) in different types of cancer (human cancer tissues). In addition, the website shows the cellular location of the gene you enter.
With regards
  • asked a question related to Databases
Question
6 answers
Hello all,
I'm trying to test out a predictive model but I'm having a very hard time trying to find hourly precipitation data. I've looked at NOAA and on the new data repository (NCDC) but I can't find hourly or 15 minute interval data past 2014. Am I missing something here? Is there an alternative source I don't know about? If it helps typically this is the weather station I have pulled from in the past: USW00014819.
Any help is appreciated. Thanks.
Relevant answer
Answer
Success! I called the customer service line and the nice woman knew exactly what I was talking about. For whatever reason the UI they are using does not have past 2014. But on the back end where you request the data it has all the most recent data. See it here: https://www.ncdc.noaa.gov/cdo-web/datatools/lcd
  • asked a question related to Databases
Question
5 answers
Can anyone recommend a database that contains raw multispectral images with the different bands and, in the same database, the NDVI and NDWI indices, so that I can compare the results obtained? Also, if I use my own multispectral images, how can I compare the vegetation and water of real plants with the NDVI and NDWI indices?
Relevant answer
Answer
These indices have a range of values between -1 and +1, so if your values do not fall within this range, then there's a problem. The value also depends on the vegetation situation of your land-cover types. Healthy vegetation, for instance, has very high positive NDVI values, and vice versa. A good understanding of the vegetation dynamics in your study area is also necessary to judge the correctness of your NDVI or NDMI values.
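To make the range check above concrete, here is a minimal Python sketch. It assumes the commonly used definitions NDVI = (NIR - Red) / (NIR + Red) and NDWI = (Green - NIR) / (Green + NIR), which are not spelled out in this thread. The reflectance values, labels and function names are made up for illustration only.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def ndvi(nir, red):
    # Assumed definition: (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Assumed definition: (Green - NIR) / (Green + NIR)
    return normalized_difference(green, nir)


def in_valid_range(value):
    """True when the index could be computed and lies in [-1, 1]."""
    return value is not None and -1.0 <= value <= 1.0


# Hypothetical per-pixel reflectances (illustrative numbers, not real data)
pixels = [
    {"label": "dense vegetation", "green": 0.08, "red": 0.05, "nir": 0.50},
    {"label": "open water", "green": 0.10, "red": 0.06, "nir": 0.02},
]

results = []
for p in pixels:
    row = (p["label"], ndvi(p["nir"], p["red"]), ndwi(p["green"], p["nir"]))
    results.append(row)
    print(row[0], "NDVI:", round(row[1], 3), "NDWI:", round(row[2], 3))
    if not (in_valid_range(row[1]) and in_valid_range(row[2])):
        print("  value outside [-1, 1]: check band order, scaling or nodata pixels")
```

With non-negative reflectances both indices stay within [-1, 1] by construction, so a value outside that range usually points to a processing problem such as swapped bands, negative nodata values, or integer overflow rather than to anything about the vegetation itself.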
  • asked a question related to Databases
Question
3 answers
I am looking for databases that contain microRNA-drug interactions. Any suggestions or recommendations?
Relevant answer
Answer
Dear Ali Akbar Jamali ,
Have a look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Databases
Question
5 answers
I've spent a lot of time but still could not find good-quality public/free data for causal inference (binary treatment; e.g., A=0 or 1, non-DCD vs DCD donations) with survival outcomes. I know it's quite a specific requirement, but I need one for my master's thesis. Of course, one might recommend the datasets from the 'survival' package in R.
But I really want to find good, real (if possible, not too old) data.
For example, the one from this journal looks perfect (but I cannot access it).
Can I get some advice or recommendations for such data?
Any comment is appreciated!
Relevant answer
Answer
Based on chapters 2 and 9 of the book "Applied Linear Statistical Models" (John Neter), regression analysis with observational data cannot give information about cause-and-effect relations. However, if a regression analysis is performed with experimental data (an experimental study), then the regression can give information about cause-and-effect relations. In such cases, the effect of latent variables on the response variable is minimized by the randomization in the experimental study, and the causal effect can be examined.
Survival analysis is a special kind of regression analysis, and if the study is experimental, causal inference is possible.
  • asked a question related to Databases
Question
4 answers
I am processing a 16S rRNA next-gen sequencing data set and trying to compare, between my samples, the effects on organisms involved in the nitrogen cycle. I am just wondering if there is some sort of database, or even a good paper, that goes over all of the known organisms in the nitrogen cycle. The more detail the better, but I would settle for just a simple list.
Relevant answer
Answer
Thanks Shan Thomas, I will check it out. It looks like a more convenient solution than trolling through papers.
  • asked a question related to Databases
Question
5 answers
I am carrying out my postgraduate thesis project on the extractive industry firms and their reporting practices.
Relevant answer
  • asked a question related to Databases
Question
20 answers
I'm interested in automated algae identification using neural networks. I need to compile a substantial photomicrograph dataset of algal genera (most importantly of Cyanobacteria, Chlorophyta and Bacillariophyta).
Thank you in advance!
Relevant answer
Answer
Automated algae identification using neural networks is a great idea. I played with it a few years ago as well :-).
Besides the sources mentioned in previous answers, https://atlasofcyanobacteria.com/index.php could be another one.
In general, it is necessary to have reliable and double-verified identifications of micrographs used for machine learning.
I would be a bit cautious with mixing natural material images and photos of cultures, which may look quite different.
Good luck!
  • asked a question related to Databases
Question
3 answers
I'm looking for a food picture database to use in designing a behavioral task. I would be interested in controlling the degree of knowledge and nutritional value of the food. Thank you in advance.
Relevant answer
Answer
Peace unto you. I am not sure if I understand your question. But, for nutrition data issues, FAO's nutritiondata.com is a good food-related statistical website to start from.
  • asked a question related to Databases
Question
3 answers
I have been working on a project to collate species occurrence data from unpublished student theses into an integrated database (currently published in GBIF), and I am still working on a systematic protocol for data validation. Expert review is quite subjective, and I found many studies reporting that "expert" estimates are not always more consistent than those of amateurs, students, or even public enthusiasts (feel free to message me for the papers I collected on this), so my team is still struggling to find a way forward. Our current method is simply to check the scientific names independently against taxonomic checklists and to validate the geographic distribution of each species against the available published literature. We occasionally ask experts, but as we are working on many understudied taxa and geographical areas, there are not many around.
Relevant answer
Answer
I suppose it all depends on your study species. For the most part, I think experts in most fields are able to identify the species they're most knowledgeable about with relatively high accuracy, given they have enough information in the photo and geographic location to do so.
Errors usually happen when someone gets a bit overzealous and identifies something to the species level from minimal information, just going off an educated guess about the species most likely to be in the area.
It also depends on what the question for your study is. If you're doing an SDM for a species, you could always thin the records to about 100 and then self-verify (if you're confident in your abilities to do so). You could see if the species occurrence data has any corresponding NCBI molecular data and use DNA to verify species.
If you're using a dataset of 1,000+ records (or some other number where it isn't feasible to self-verify each record) from iNaturalist, you could query the data for records with >3 verified ID agreements and no "maverick" or disputed IDs. The likelihood of false positives should decrease with user agreement on a species identification.
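The agreement filter described above could be sketched in Python roughly as follows. This is only an illustration: the field names `num_identification_agreements` and `num_identification_disagreements` are assumed to match the columns of an iNaturalist export and should be checked against your own file, the records are invented, and "no maverick or disputed IDs" is approximated here as zero disagreements.

```python
MIN_AGREEMENTS = 3  # threshold from the answer above: more than 3 agreeing IDs


def is_well_supported(record, min_agreements=MIN_AGREEMENTS):
    """Keep a record only if it has enough agreeing IDs and no disagreeing ones.

    Field names are assumptions about the export format (see lead-in).
    """
    agreements = int(record.get("num_identification_agreements", 0))
    disagreements = int(record.get("num_identification_disagreements", 0))
    return agreements > min_agreements and disagreements == 0


# Invented example records (hypothetical ids and counts)
records = [
    {"id": 1, "num_identification_agreements": 5, "num_identification_disagreements": 0},
    {"id": 2, "num_identification_agreements": 4, "num_identification_disagreements": 1},
    {"id": 3, "num_identification_agreements": 2, "num_identification_disagreements": 0},
    {"id": 4, "num_identification_agreements": 3, "num_identification_disagreements": 0},
]

kept = [r for r in records if is_well_supported(r)]
print("kept record ids:", [r["id"] for r in kept])
```

Record 4 is dropped because the condition is strictly "more than 3" agreements; loosen the threshold if that leaves too few records for understudied taxa.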
  • asked a question related to Databases
Question
3 answers
I am looking for company-level data on R&D expenses, because I would map it to M&A data in order to assess the impact of (cross-border) M&A on R&D intensity. Thank you.
Relevant answer
Answer
C K Gomathy & Stephanie Tonn Goulart Moura - thank you very much for coming back to me
  • asked a question related to Databases
Question
7 answers
I am looking for a free online database of brain hemorrhage CT images for my research work.
Relevant answer
Answer
Please find below a link to a dataset for intracranial hemorrhage segmentation.
I know my reply is late, but I am posting it in case other researchers are looking for similar datasets.
  • asked a question related to Databases
Question
6 answers
Do we have an open-source standardized database of TB microscopic sputum smear images?
Relevant answer
Answer
TB PCR (GeneXpert) is the latest WHO-recommended diagnostic. It is a test with 90% sensitivity for pulmonary tuberculosis.
  • asked a question related to Databases
Question
7 answers
I want to download the complete KEGG bacterial data. Please guide me on how I can do that.
Or if anyone has this data, kindly send it to me. It is not freely available, even for academic users.
Relevant answer
Answer
Sorry, I did not read the question completely before. I read it again, and it now seems more stupid than it appeared before.