Databases - Science topic

A database is an organized collection of data. The data is typically organized to model relevant aspects of reality in a way that supports processes requiring this information.
Questions related to Databases
  • asked a question related to Databases
Question
2 answers
How do AMRFinderPlus and CARD differ from each other for the prediction of AMR genes from bacterial genomic sequences?
How much overlap do the AMRFinderPlus and CARD databases have?
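One practical way to measure the overlap for a given study is to run both tools on the same genomes and compare the gene sets they report. The sketch below assumes that the hits from each tool have already been mapped onto one shared gene-naming scheme. AMRFinderPlus and CARD use different nomenclatures, so that mapping step is required in practice and is not shown here. The function name `overlap_report` and the gene lists are hypothetical and are not real tool output.

```python
"""Compare the AMR genes reported by two tools for the same genome.

Assumption: each tool's hits have already been reduced to a set of gene
names in one shared naming scheme. The gene lists used below are
hypothetical placeholders, not real AMRFinderPlus or CARD output.
"""


def overlap_report(tool_a: set[str], tool_b: set[str]) -> dict:
    """Return shared/unique genes plus Jaccard and overlap coefficients."""
    shared = tool_a & tool_b
    union = tool_a | tool_b
    smaller = min(len(tool_a), len(tool_b))
    return {
        "shared": sorted(shared),
        "only_a": sorted(tool_a - tool_b),
        "only_b": sorted(tool_b - tool_a),
        # Jaccard index: |A ∩ B| / |A ∪ B|
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Overlap (Szymkiewicz-Simpson) coefficient: |A ∩ B| / min(|A|, |B|)
        "overlap_coef": len(shared) / smaller if smaller else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical, already-harmonised gene names for one genome.
    amrfinder_hits = {"blaTEM-1", "tet(A)", "sul1", "aac(3)-IIa"}
    card_hits = {"blaTEM-1", "tet(A)", "sul1", "qnrS1", "mdtK"}
    report = overlap_report(amrfinder_hits, card_hits)
    print(report["shared"])             # ['blaTEM-1', 'sul1', 'tet(A)']
    print(round(report["jaccard"], 2))  # 3 shared / 6 in union = 0.5
```

The Jaccard index penalises a tool that simply reports more genes. The overlap coefficient asks how much of the smaller result set is confirmed by the other tool. Reporting both makes the comparison easier to interpret.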
Relevant answer
Answer
Anuradha Goswami, thank you for the information.
  • asked a question related to Databases
Question
8 answers
I am having difficulty finding enough DICOM files for SSM...
Relevant answer
Answer
Here are some websites where you can find free DICOM images:
  1. The Cancer Imaging Archive (TCIA)
     Website: https://www.cancerimagingarchive.net/
     Description: TCIA provides a large collection of medical images, including DICOM images, related to cancer research. It includes CT, MRI, and other imaging modalities.
  2. Radiological Society of North America (RSNA) Clinical Trials Data
     Website: https://www.rsna.org/en/research/clinical-trials-data
     Description: RSNA provides public access to anonymized imaging data from clinical trials, including DICOM images from CT, MRI, and other modalities.
  3. Open Access Biomedical Imaging Resource (OABR)
     Website: https://openabm.org/
     Description: OABR is an open-access platform that provides biomedical imaging data, including DICOM images from various imaging modalities.
  4. The National Cancer Institute (NCI) Imaging Data Commons
     Website: https://portal.gdc.cancer.gov/
     Description: NCI's Imaging Data Commons contains a collection of cancer imaging data, including DICOM images from various imaging studies.
  5. MosMedData: Moscow Chest CT Dataset
     Website: https://mosmed.ai/en/datasets/covid19
     Description: MosMedData provides a large-scale chest CT dataset with over 7,000 anonymized CT scans of patients with pulmonary diseases.
  6. COVID-19 Image Data Collection
     Website: https://github.com/ieee8023/covid-chestxray-dataset
     Description: This dataset contains chest X-ray and CT images of COVID-19 patients, as well as images from other respiratory conditions and healthy individuals.
  7. ImageCLEF - Multimedia Retrieval in CLEF
     Website: https://www.imageclef.org/
     Description: ImageCLEF offers medical imaging datasets for research, including CT and MRI data.
  • asked a question related to Databases
Question
4 answers
It is now advisable to deposit a dataset in a publicly available repository (e.g. Mendeley) before publishing a paper based on these data. Can I reuse such a repository dataset to publish my second paper? Can anybody else use my dataset and publish his/her paper based on it without my consent?
Relevant answer
Answer
Yes, Michal S. Karbownik, you can reuse your published dataset if you are able to generate new insights from it. Others can also use your dataset for their research. However, other researchers' usage rights depend on the Creative Commons (CC) license you chose during the upload. You can review all CC licenses here: About CC Licenses - Creative Commons.
Overall, publishing your data is good practice and a valuable scientific contribution, and everyone should follow it. FAIR Principles - GO FAIR (go-fair.org) is a wonderful movement in this direction.
  • asked a question related to Databases
Question
5 answers
I am looking for data on the volume of public funding allocated to research topics over time in each country. Is there a reliable database that indexes public funding allocations by research theme or topic (particularly in the US)?
For example, the volume of public funding for "electric battery related research" over the past 30 years in the US.
Relevant answer
Answer
Dear Dr. Hajikhani!
I would say the OECD is also a decent source of information:
a) OECD Main Science and Technology Indicators (2022). Available at:
b) OECD (2023), Gross domestic spending on R&D (indicator). doi: 10.1787/d8b068b4-en (Accessed on 14 January 2023) Available at:
Yours, Bulcsu Szekely
  • asked a question related to Databases
Question
4 answers
I'm looking for a food picture database to use in designing a behavioral task. I would be interested in controlling the degree of knowledge and nutritional value of the food. Thank you in advance.
Relevant answer
Answer
Hi,
I'm trying to build one for fruits.
I started with images of fruits and then added more information (such as content) for each fruit, vegetable, etc.
Here is the paper:
It is not complete yet; I'm still adding more information to it.
Best regards,
Mihai
  • asked a question related to Databases
Question
6 answers
What online databases or tools can I use to verify candidate genes?
Relevant answer
Answer
Blaise Manga Enuh, I want to verify a list of genes and find out whether they are related to the disease I am researching.
  • asked a question related to Databases
Question
9 answers
What are the best databases of blood cell images for research?
Relevant answer
Answer
Have a look at our database www.raabindata.com, which has more than 40,000 free white blood cell images. You can also access more than 10,000 free leukemia images.
  • asked a question related to Databases
Question
13 answers
We are conducting a systematic literature review, and we would like to know how to merge the results from the different databases into a single database so that duplicates can be easily recognised. Merging Excel files does not seem to be a straightforward procedure.
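Once every database export has been saved as CSV, deduplication can be scripted. The sketch below shows one possible approach and is not a standard tool. It treats two records as duplicates when their normalised DOIs match or, failing that, when their normalised titles match. The column names `title` and `doi`, the helper function names, and the sample exports are assumptions. Real exports from different databases use different column names, so rename those columns before merging.

```python
import csv
import io
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and drop a resolver prefix such as https://doi.org/."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi).strip()


def merge_exports(exports: dict[str, str]) -> list[dict]:
    """Merge CSV exports (source name -> CSV text), keeping the first copy of each record.

    Each kept record gets a 'found_in' list naming every export it appeared in.
    """
    by_doi: dict[str, dict] = {}
    by_title: dict[str, dict] = {}
    merged: list[dict] = []
    for source, csv_text in exports.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            doi = normalise_doi(row.get("doi") or "")
            title = normalise_title(row.get("title") or "")
            # Duplicate if the DOI was seen before; otherwise fall back to the title.
            match = by_doi.get(doi) if doi else None
            if match is None and title:
                match = by_title.get(title)
            if match is None:
                match = dict(row, found_in=[])
                merged.append(match)
            match["found_in"].append(source)
            # Register both identifiers so later copies can match on either one.
            if doi:
                by_doi.setdefault(doi, match)
            if title:
                by_title.setdefault(title, match)
    return merged


if __name__ == "__main__":
    # Two small hypothetical exports of the same search run in two databases.
    exports = {
        "db_one": "title,doi\nA Study of Things,10.0000/example.1\nÉtude de cas,\n",
        "db_two": ("title,doi\nA study of things.,https://doi.org/10.0000/EXAMPLE.1\n"
                   "Etude de Cas,\nNew Paper,10.0000/example.2\n"),
    }
    # Prints three records; the first two were found in both exports.
    for record in merge_exports(exports):
        print(record["title"], record["found_in"])
```

Exact matching on normalised titles will miss near-duplicates, such as titles with typos or truncated subtitles. It can also merge a preprint with its published version. The `found_in` column is therefore best used to flag records for manual review, not as a final verdict.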
Relevant answer
Answer
  • asked a question related to Databases
Question
4 answers
I know of several for the gas phase (e.g. HITRAN, GEISA, PNNL) but not for the condensed phase. Most seem to be proprietary databases for matching spectra, but they do not allow absorption to be determined as a function of density or path length.
Relevant answer
Answer
Hi,
I once came across this database; it is quite a good one, though I don't know whether it contains the material you are looking for.
David
  • asked a question related to Databases
Question
16 answers
Hi guys,
I am looking for suggestions/recommendations from the research community regarding public databases that are most commonly used by researchers in their analysis.
For example GEO, GTEx, TCGA, gnomAD, TOPMed, etc., including databases from countries other than the US.
#genomics #publicdata #genomicdatabases #databases #datamining #TCGA #HCA #GTEX #GEO #ARRAYEXPRESS
Relevant answer
Answer
Congratulations on your choice of a very important ResearchGate discussion question. In recent years this issue has generated a great deal of controversy, as genomic data have expanded from the public into the private domain.
The following article appeared in 2018 and it gives a good overview of some of the relevant issues involved in big data sharing of genomic data:
"OPINION article
Front. Public Health, 28 November 2018 | https://doi.org/10.3389/fpubh.2018.00334
Big Data Sharing: A Crucial Democratic Issue for Genomic Medicine
Benjamin Derbez*
  • Université de Bretagne Occidentale, Brest, France
Introduction
Big data are often viewed as responsible for major upheavals in many aspects of contemporary life (1) and in the health sector in particular (2). For instance, in medicine, big data are perceived as one of the major drivers of genomic medicine (3). Indeed, rapid genomic data collection on a large scale, made possible by the use of high-throughput sequencing technologies, has made the production of new medical knowledge possible. This knowledge has helped to improve disease prevention, risk prediction, individualized care, and patient involvement (4, 5). One of the conditions of such progress, however, is the need to create databases large enough to enable successful comparative analyses (6). While some initiatives seeking to share different national databases have been launched at the international level (7), the sharing of data between public institutions and private organizations remains a critical question.
Drawing on the example of databases of variants in breast and ovarian cancer predisposition BRCA 1–2 genes, we will show that genomic data is a techno-scientific democracy issue worth discussing. In this case, the recent evolution of patenting legislation has led to a shift from gene sequencing to the clinical interpretation of its results as the key activity of oncogenetics (8). Database access, which is necessary to estimate the risks associated with sequenced genetic variants, has become a critical issue, especially for private firms wishing to break into the market. In this context, the partial privatization of public databases, such as that of the French consortium that will be discussed later, is proof that there is a growing movement of public-private hybridization of these infrastructures. This shift, accentuated by the developments of high-throughput sequencing and genomic medicine, needs to be accompanied by reflection about the public health system user information contributing to the constitution of these databases.
Patenting genes
The controversy that shook the world of genetic cancer for years is well known. Indeed, the American company Myriad Genetics filed a patent application claiming BRCA1, BRCA2, and genetic methods of diagnosing a predisposition for breast and ovarian cancer (9, 10). Thanks to the legal ownership of these genes which had been designed as biotechnologies, the start-up from Salt Lake City sought to have a global monopoly on the hereditary breast cancer market, which was expected to experience robust growth. In the face of this offensive, institutional resistance (bringing together hospitals, ministries, associations, etc.) arose in the early 2000s in Europe and then in the United States (11). This resistance has often been interpreted as paradigmatic of the opposition between an “open science,” regulated by peers respecting the law of priority, and a “proprietary science,” regulated by the market, and respecting intellectual property (12). There was thus concern that the production of public knowledge would decline because of the legal appropriation of genes by private organizations (13).
An analysis of the British case, however, helps to get a more balanced view of this dichotomy. Indeed, Parthasarathy (14–16) has shown that patents are perceived as legal weapons by private organizations as well as by public scientific, medical, and social institutions. Moreover, actors from private and public groups cannot be radically distinguished insofar as each defines the other in a complex network of negotiated interrelationships. In line with the studies undertaken on the role of patents in management science between academic circles and the business world (17, 18), Parthasarathy calls attention to how the NHS and Myriad reached an agreement in the early 2000s, making it possible to connect the “moral order” of the former, based on the principle of equal access to healthcare for all citizens, to the freedom of consumers valued by the latter. Among the negotiated items, it appears clearly that the issue of the transfer of data from Myriad to the NHS was essential and intended to add onto the public BRCA mutation databases. Beyond the issue of monopoly over the gene sequence through the patenting of genes or methods, this example clearly shows that the ownership of data is of crucial importance to both groups. With high-throughput sequencing technology, it has become a major issue.
Next generation sequencing
Two major developments placed the issue of the sharing of BRCA databases at the center of the debate from the 2010s. The first, naturally, was the full or partial decline in the patents claimed by Myriad Genetics around the world (19, 20). Indeed, this decline opened up the sequencing market to new private actors (GeneDx, Invitae, Pathway Genomics, Counsyl, etc.) and allowed public laboratories to carry out their activities. The second development was the progressive introduction of high-throughput DNA sequencing technology which began in the mid-2000s. The use of these “next generation” devices reinforced laboratories' analytical capacities. It is now possible to analyse within a few hours, and at the same time, several genes (panels) of several individuals, or even the complete genome of an individual at a much lower cost: 100 dollars is regularly mentioned, compared to the 3 billion dollars spent in the framework of the Human Genome Project 20 years ago (21). All these developments have led stakeholders to focus on the issue of the classification of the genetic variants in BRCA genes.
A genetic variant from a sequenced individual can only acquire the status of “mutation,” i.e., the status of “pathogenic” variant, if it is clearly linked to a history of illness, either directly (in the individual or in their family) or indirectly (in a family affected by cancer and found to have the same variant). According to the current classification in genetics, the clinical significance of these variants may vary: they can be pathogenic, probably pathogenic, of unknown significance, benign, or probably benign. As (22) have pointed out, distinguishing between these categories is a major “interpretive dilemma” for geneticists. The classification of a variant in a given category depends on available data concerning the frequency of the link associating it with a specific disease. In the absence of data, the clinical significance of the variant is deemed unknown—a Variant of Unknown Significance (VUS)—until it is identified in other individuals with similar phenotypic characteristics. The importance of new DNA sequencing technologies thus lies in their ability to increase genetic databases more quickly in order to reduce the at times dramatic clinical uncertainty associated with diagnosed genetic anomalies (23). The sharing of information among geneticists, thanks to databases fed on an international scale, is a central issue¹. This sharing of information, however, is now problematic.
Genomic databases
For several years now, science and technology studies have been stressing that physical infrastructure plays a central role in the production of knowledge (24–27). In this area, the study of genetic databases serves as a model (28–31). Indeed, the first molecular biology databases were launched by different public institutions around the world in the early 1980s [(32): 75]. With the spread of the Internet and the Human Genome Project in the 1990s, they quickly developed as a form of support for new open “communication regimes” between scientists, likely to encourage the emergence of new knowledge (33). However, an analysis of the construction of this information infrastructure shows that the modes of data publishing remain a major source of tension between different actors.
This tension has been highlighted by Bruno Strasser, for instance, in his study on the development of the comprehensive GenBank sequence database (32). This historian of life sciences argues that tensions linked to the different conceptions of data ownership arose from the outset of the project. Participants engaged in a “moral economy of natural history,” i.e., in a “system of values that places emphasis on the exchange of scientific knowledge” inherited from the naturalists of the eighteenth century, considered that the sequences published in scientific journals should be freely accessible data. Other participants, advocates of a “moral economy of experimentation” which has garnered momentum among molecular biologists, view sequences as the products of scientific activity and as the property of their authors. According to Strasser, GenBank embodies a form of hybridization of these two value systems. It appears that those who conceived it succeeded in taking advantage of the “ambiguity” of the very notion of “data,” owing to the fact that what seems “literally given” is at the same time “the result of an organized action” [(34): 248]. In the context of the Human Genome Project, this ambiguity has manifested itself in the emergence of information control modes which involve a complex interplay of revelation and concealment (35). Nowadays, as seen previously, in addition to the tensions inherent in the moral economies of science, other tensions associated with the political economy of knowledge resulting from the growing role played by private firms in the production of knowledge emerged from the early 1990s (36, 37). Beyond the question of the patentability of living organisms, it is now the question of sharing that is at the forefront of the debate, as the case of the BRCA1 and BRCA2 genes clearly shows.
Data sharing
In the present case, i.e., the focus on BRCA1 and BRCA2 genes, there is no unique and comprehensive database of BRCA variants accessible to all professionals around the world. On the contrary, different databases developed by consortia of multinational public institutions or private organizations exist, but their access is generally limited. This is the case of the database developed by Myriad Genetics throughout the period the patents were under discussion. Although this is the largest database in the world, Myriad Genetics has exclusive access to it. This has given the company a major competitive asset in the BRCA testing market insofar as the database offers a solid basis on which to interpret results. According to genetics professionals, the main issue is not the sequencing itself. Rather, what matters most is the interpretation of the results intended to give clinical significance. This has turned out to be the most costly activity, both in terms of the recruitment of highly qualified personnel and for the development, maintenance, and access to huge databases that list the known variants of specific genes. Certain professionals estimate that there is a 1 to 10 ratio with regard to the cost of complete genome sequencing and its interpretation. In this context, ownership and the opening up of genetic variants databases emerges as a crucial issue.
From this context, the example of the future of the UMD BRCA base—Universal Mutation Database-BRCA—speaks volumes. Developed in the 1990s by a public consortium of French geneticists, it was considered to be one of the most important global databases until 2015. Driven by two major players in genetic testing in the United States [Quest Diagnostics and Laboratory Corporation of America (LabCorp)], the database was partially privatized in 2015. These two companies purchased the right to obtain access to data in exchange for funding the database. While the French sought to finance over the short- and medium-term an activity that had become too costly for public finances to sustain, the Americans' objective was to quickly be able to compete with Myriad Genetics by improving the quality of their analyses. The question that arises, then, is: How will this be handled over the long term? Will the French geneticists at the origin of the database still be able to access it? Will French patients still benefit from the knowledge generated thanks to the data they provided? What justifies this privatization if we consider the donations made by patients who agreed to have their data kept in this database? Similar questions had already been raised by the NHS during its negotiations with Myriad in the early 2000s, when the issue of the privatization of access to BRCA testing for British citizens arose (16). Questions revolving around access (currently and in the future) to genetic databases thus remain relevant.
Conclusion
At a time when the opening up of public data has become common practice in the field of administration (38), the example of the genetics of breast cancer shows that data sharing is still a major issue in research (39). The question here is the extreme overlapping of public issues and private interests. In this case, there is a need to go beyond a simple comparison between the open regimes of data publication associated with academic institutions, and the closed regimes of the privatization of knowledge developed by business communities. Hybrid forms of database ownership such as those mentioned earlier, highlight the need to pay attention to the significance given to data sharing during the initial negotiations underpinning their establishment. Once these databases are filled by voluntary citizens who provide their DNA data, data sharing becomes a crucial issue in terms of technical democracy (40). Once again, however, citizens seem to be largely absent from the debate about the ownership and use of the genomic data stored in these databases. With increased power given to major programmes seeking to collect big data in genomics, it may be time to reflect on how citizens can be informed and involved in the decisions that will be made in this area.
At the very least, it seems necessary to provide people with information about the future of their genomic data: in which databases will the data be stored? For how long? Who will be able to use them? Can they be exploited for commercial purposes by private firms? As in the field of the Internet, database contributors should be able to oppose the reuse of their “data” for the benefit of private interests. The information challenge involves the very value of consent (41).
Author Contributions
The author confirms being the sole contributor of this work and has approved it for publication.
Funding
This opinion paper is based on research funded by the Fonds Avenir/Masfip pour la Recherche, 2016.
Conflict of Interest Statement
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The author would like to thank Emmanuel Rial-Sebbag, S. de Montgolfier, Pr Dominique Stoppa-Lyonnet, Pr Eric Vilain, and Dr. Zaki El Haffaf for their help and collaboration. Thank you to Catherine Davies from UBO BTU for the translation.
Footnotes
1. ^For example: Human Gene Mutation Database (HGMD) or Online Mendelian Inheritance in Man (OMIM).
References
1. Mayer-Schönberger V, Cukier K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston, NY: Eamon Dolan /Mariner /Houghton Mifflin Harcourt (2013).
2. Groves P, Kayyali B, Knott D, Van Kuiken S. The Big Data Revolution in Health Care. Seattle, DC: McKinsey Quarterly (2013).
3. Guttmacher AE, Collins FS. Genomic medicine – a primer. N Engl J Med. (2002) 347:1512–20. doi: 10.1056/NEJMra012240
4. Alyass A, Turcotte M, Meyre M. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 8:33. doi: 10.1186/s12920-015-0108-y
5. Flores M, Glusman G, Brogaard K, Price ND, Hood L. P4 medicine: how systems medicine will transform the healthcare sector and society. Personal Med. (2013) 10:565–76. doi: 10.2217/PME.13.57
6. He KY, Ge D, He MM. Big data analytics for genomic medicine. Int J Mol Sci. 18:412. doi: 10.3390/ijms18020412
7. Scollen S, Page A, Wilson J. From the data on many, precision medicine for “one”: the case for widespread genomic data sharing. Biomed Hub (2017) 2:15. doi: 10.1159/000481682
8. Stoppa-Lyonnet D. Tests génétiques: le défi n'est plus le séquençage mais l'interprétation. Pour Sci. (2014) 439:12–3.
9. Parthasarathy S. Building Genetic Medicine. Breast Cancer, Technology, and the Comparative Politics of Health Care. Cambridge, MA: MIT Press (2017).
10. Sherkow JS, Greely HT. The history of patenting genetic material. Annu Rev Genet. (2015) 49:161–82. doi: 10.1146/annurev-genet-112414-054731
11. Cassier M, Stoppa-Lyonnet D. L'opposition contre les brevets de myriad genetics et leur révocation totale ou partielle en Europe: Premiers enseignements. Med Sci. (2005) 21:648–62. doi: 10.1051/medsci/2005216-7658
12. Dasgupta P, David PA. Towards a new economics of science. Res Policy (1994) 23:487–521. doi: 10.1016/0048-7333(94)01002-1
13. Orsi F, Coriat B. Are Strong Patents beneficial to Innovative Activities - Lessons from genetic testing for breast cancer controversies. Indus Corp Change (2005) 14:1205–21. doi: 10.1093/icc/dth086
14. Parthasarathy S. The patent is political: the consequences of patenting the BRCA genes in Britain. Community Genet. (2005) 8:235–42. doi: 10.1159/000087961
15. Parthasarathy S. Architectures of genetic medicine: comparing genetic testing for breast cancer in the USA and UK. Soc Stud Sci. (2005) 35:5–40. doi: 10.1177/0306312705047172
16. Parthasarathy S. Reconceptualizing technology transfer: the challenge of building an international system of genetic testing for breast cancer. In: Guston DH, Sarewitz D, editors. Shaping Science and Technology Policy: The Next Generation of Research. Madison, WI: University of Wisconsin Press (2006).
17. Huang KG, Murray FE. Does patent strategy shape the long-run supply of public knowledge: evidence from human genetics. Acad Manag J. (2009) 52:1193–221. doi: 10.5465/amj.2009.47084665
18. Murray F. The oncomouse that roared: hybrid exchange strategies as a source of distinction at the boundary of overlapping institutions. Am J Sociol. (2010) 116:341–88. doi: 10.1086/653599
19. Cassier M, Stoppa-Lyonnet D. Un juge fédéral et le gouvernement des États-Unis interviennent contre la brevetabilité des gènes. Med Sci. (2011) 27:662–7. doi: 10.1051/medsci/201228s204
20. Pollack A. Myriad Genetics Ending Patent Dispute on Breast Cancer Risk Testing, New York Times. (2015). Available online at: https://www.nytimes.com/2015/01/28/business/myriad-genetics-ending-patent-dispute-on-breast-cancer-risk-testing.html (Accessed 18/09/18)
21. Reuters. New Illumina Tech Could Usher in $100 Gene-Sequencing Era. (2017). Available online at: https://www.reuters.com/article/us-illumina-stocks/new-illumina-tech-could-usher-in-100-gene-sequencing-era-idUSKBN14U1PO (Accessed 18/09/18)
22. Timmermans S, Tietbohl C, Skaperdas E. Narrating uncertainty: variants of uncertain significance (VUS) in exome sequencing. BioSocieties (2016) 12:439–58. doi: 10.1057/s41292-016-0020-5
23. Stivers T, Timmermans S. Negotiating the diagnostic uncertainty of genomic test results. Soc Psychol Quart. (2017) 79:199–221. doi: 10.1177/0190272516658770
24. Star SL, Ruhleder K. Steps towards an ecology of infrastructure: design and access for large information spaces. Inform Syst Res. (1996) 7:111–34. doi: 10.1287/isre.7.1.111
25. Star SL. The Ethnography of Infrastructure. Am Behav Sci. (1999) 43:377–91. doi: 10.1177/00027649921955326
26. Bowker G, Baker K, Millerand F, Ribes D. Towards information infrastructure studies: ways of knowing in a networked environment. In: Hunsinger J, Klastrup L, Allen M, editors. International Handbook of Internet Research. Dordrecht: Springer (2010). p. 97–117.
27. Bowker GC, Star SL. Sorting Things Out: Classification and Its Consequences. Cambridge, MA: MIT Press (1999).
28. Brown C. The changing face of scientific discourse: analysis of genomic and proteomic database usage and acceptance. J Am Soc Inform Sci Technol. (2003) 54:926–38. doi: 10.1002/asi.10289
29. Bowker G. Memory Practices in the Sciences. Cambridge, MA: MIT Press (2005).
30. Hine C. Databases as scientific instruments and their role in the ordering of scientific work. Soc Stud Sci. (2006) 36:269–98. doi: 10.1177/0306312706054047
31. Dagiral E, Peerbaye A. Making knowledge in boundary infrastructures: inside and beyond a database for rare diseases. Sci Technol Stud. (2016) 29:44–61.
32. Strasser B. The Experimenter's Museum: GenBank, natural history, and the moral economies of biomedicine. Isis (2011) 102:60–96. doi: 10.1086/658657
33. Hilgartner S. Biomolecular databases: new communication regimes for biology? Sci Commun. (1995) 17:240–63. doi: 10.1177/1075547095017002009
34. Desrosières A. The Politics of Large Numbers. A History of Statistical Reasoning. Cambridge, MA: Harvard University Press (2002).
35. Hilgartner S. Reordering Life: Knowledge and Control in the Genomics Revolution. Cambridge, MA: MIT Press (2017).
36. Cassier M, Gaudillière JP. Recherche, médecine et marché: la génétique du cancer du sein. Sci Soc Santé (2000) 18:29–49. doi: 10.3406/sosan.2000.1504
37. Huang KG, Murray FE. Entrepreneurial experiments in science policy: analyzing the human genome project. Res Policy (2010) 39:567–82. doi: 10.1016/j.respol.2010.02.004
38. Goëta S, Davies T. The daily shaping of state transparency: standards, machine-readability and the configuration of open government data policies. Sci Technol Stud. (2016) 29:10–30.
39. Nature Editorial. The ups and downs of data sharing in science. pooling clinical details helps doctors to diagnose rare diseases — but more sharing is needed. Nature (2016) 534:435–6. doi: 10.1038/534435b
40. Callon M, Lascoumes P, Barthe Y. Acting in an Uncertain World. An Essay on Technical Democracy. Cambridge, MA: MIT Press (2009).
41. Ducournau P. The viewpoint of DNA donors on the consent procedure. New Genet Soc. (2007) 26:105–15. doi: 10.1080/14636770701218191
Keywords: big data, genomics, BRCA, oncogenetics, database
Citation: Derbez B (2018) Big Data Sharing: A Crucial Democratic Issue for Genomic Medicine. Front. Public Health 6:334. doi: 10.3389/fpubh.2018.00334
Received: 03 July 2018; Accepted: 31 October 2018; Published: 28 November 2018.
Edited by: Thomas Lefèvre, Université Paris 13, France
Reviewed by: Nicole C. Nelson, University of Wisconsin-Madison, United States
Copyright © 2018 Derbez. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Benjamin Derbez, benjamin.derbez@univ-brest.fr
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher."
This published article is also available on ResearchGate:
  • asked a question related to Databases
Question
43 answers
How can one obtain the information currently needed from Big Data database systems for specific scientific research and for economic, business, and other analyses?
Of course, the right data is important for scientific research. However, in the present era of digitalization of various categories of information, with ever-expanding libraries, databases, and large data sets stored in database systems, data warehouses, and Big Data database systems, it is important to develop techniques and tools for filtering these large data sets. The aim is to extract from terabytes of data only the information currently needed for scientific research in a given field of knowledge, for answering a given research question, and for business needs, e.g. after connecting these databases to Business Intelligence analytical platforms. I described these issues in my scientific publications presented below.
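The sketch below gives a small illustration of the filtering idea described above. It streams a delimited export row by row and keeps only the rows and columns a given study needs, so memory use does not grow with file size. It assumes that the data can be exported as CSV. The column names, the filter criteria, and the function `filter_rows` are hypothetical. Inside a database or data warehouse, the same filter would normally be written as a query (e.g. a SQL WHERE clause), so that the filtering happens on the server.

```python
import csv
import io
from typing import Callable, Iterable, Iterator


def filter_rows(lines: Iterable[str], keep: Callable[[dict], bool],
                columns: list[str]) -> Iterator[dict]:
    """Yield the requested columns of each row for which keep(row) is True.

    Rows are parsed lazily, one at a time, so the whole file is never held in memory.
    """
    for row in csv.DictReader(lines):
        if keep(row):
            yield {col: row[col] for col in columns}


if __name__ == "__main__":
    # Hypothetical sample; in practice pass an open file handle instead,
    # e.g. open("big_export.csv", newline="").
    sample = io.StringIO(
        "country,year,sector,value\n"
        "PL,2019,energy,12.5\n"
        "PL,2021,energy,14.0\n"
        "DE,2021,energy,30.1\n"
        "PL,2021,retail,7.2\n"
    )
    wanted = filter_rows(
        sample,
        keep=lambda r: r["country"] == "PL" and int(r["year"]) >= 2020,
        columns=["year", "sector", "value"],
    )
    for row in wanted:
        print(row)
```

Because `filter_rows` is a generator, its results can be fed straight into further processing or written to a smaller file without ever materialising the full data set.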
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
How can the information currently needed for specific scientific research, and for economic, business and other analyses, be obtained from Big Data database systems?
Please reply
I invite you to the discussion
Thank you very much
Dear Colleagues and Friends from RG
The use of information contained in Big Data database systems for conducting Business Intelligence analyses is described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Respected Doctor
Big data has three characteristics as follows:
1-Volume
It refers to the volume of data extracted from a source, which determines the value of the data and whether it can be classified as big data. By the year 2020, cyberspace was projected to contain approximately 40 zettabytes (40,000 exabytes) of data ready for analysis and information extraction.
2-Variety
It refers to the diversity of the extracted data, which helps users, whether researchers or analysts, to choose the appropriate data for their field of research. It includes structured data in databases and unstructured data (such as images, clips, audio recordings, videos, SMS messages, call logs and GPS map data), which require time and effort to prepare in a form suitable for processing and analysis.
3-Velocity
It refers to the speed at which data is produced, extracted and delivered to meet demand. Speed is crucial for making decisions based on the data: it is the time that elapses from the moment the data arrives to the moment a decision based on it is made.
There are many tools and techniques used to analyze big data, such as Hadoop, MapReduce and HPCC, of which Hadoop is one of the best known. Hadoop stores big data across several machines and then distributes the processing to these machines to speed it up; the partial results are then combined and returned as a single result. Tools that deal with big data consist of three main parts:
1- Data mining tools
2- Data Analysis Tools
3- Tools for displaying results (Dashboard).
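The split, distribute and combine flow described above can be sketched in plain Python. This is a single-process illustration of the MapReduce programming model using the classic word-count example, not Hadoop's own API; in Hadoop, the map calls would run on the machines that hold the different blocks of data, and the framework would perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # In Hadoop each split would be mapped on the node that stores it; here we just loop.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle_phase(mapped))


print(word_count(["big data big", "data tools"]))  # {'big': 2, 'data': 2, 'tools': 1}
```

Because each map call only sees its own split and each reduce call only sees one key's values, both phases can be spread over many machines; only the shuffle needs to move data between them.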
Their use also varies according to the research objectives (improving education, effectiveness of decision-making, military benefit, economic development, health management, etc.).
Greetings
Senior lecturer
Nuha hamid taher
  • asked a question related to Databases
Question
14 answers
Below are some issues related to Big Data database technologies that can be developed scientifically:
- Application of data processing technology in Big Data database systems in modern Education 4.0,
- Improvement of forecasting of natural, climatic, economic, financial, social, etc. phenomena based on the analysis of large data sets,
- Analysis of the sentiment and opinions of citizens and Internet users regarding company brand recognition, customer reviews of specific services and products, views on various topics, and citizens' worldviews, based on large collections of information downloaded from various websites and comments from social media portals,
- Analysis of the information and marketing services of commercial companies that carry out specific analyses of sentiment, citizens' opinions, Internet users' views on brand recognition, customer reviews of specific services and products, etc. on behalf of other companies that purchase these analytical reports,
- Analysis of the possibilities for cooperation, synergy, correlation and interdisciplinary research connecting Big Data database systems with other information technologies typical of the current fourth technological revolution, Industry 4.0, such as cloud computing, machine learning, the Internet of Things and Artificial Intelligence.
In what other areas are the technologies of processing and analysis of information in Big Data database systems used?
Please answer
Best wishes
Dear Colleagues and Friends from RG
The use of information contained in Big Data database systems for conducting Business Intelligence analyses is described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Srdjan Atanasijevic,
Yes. You pointed to the important conditions for the development of big data analysis technology with the use of Big Data Analytics.
Thank you, Regards,
Dariusz Prokopowicz
  • asked a question related to Databases
Question
3 answers
Hi
I'm working on diagnosing the location of aphasia lesions in stroke patients, and I need a database of MRI images of patients with aphasia.
Relevant answer
Answer
Hi Driss,
You could check out the free datasets listed on www.imagingQA.com at the following link:
If you don't find what you're looking for, it's free to join, so please open a new discussion topic (there are lots of data scientists and imaging specialists in the community):
  • asked a question related to Databases
Question
22 answers
I use the X'Pert HighScore Plus (3.0.5) software to plot my samples' XRD patterns. When checking against the reference, the software cannot find an exact match, so I need to add reference databases to the software. How can I do it?
What is the proper method for this particular version?
Relevant answer
Answer
You can download the COS2021.hsrdb XRD database from the link below;
it is 6.5 GB.
  • asked a question related to Databases
Question
31 answers
What kind of scientific research dominates in the field of Big Data database systems?
Please provide your suggestions for a question, problem or research thesis on the issue of Big Data database systems.
Please reply. I invite you to the discussion
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Informative question
  • asked a question related to Databases
Question
37 answers
What kind of scientific research dominates in the field of popularization of science on the Internet?
Please provide your suggestions for a question, problem or research thesis on the issue of popularization of science on the Internet.
Please reply.
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
Currently, any scientific topic relating closely or remotely to Covid19 is "in the buzz" and extremely popular.
Vaccines and their properties and effects, the propagation of the infection, comorbidity statistics: these are on the radar of everyone, the broadest audience possible for science.
The popularity of these topics and the anxiety around related issues additionally create a window for anti-science: fact deniers; people opposing vaccines, although vaccines have demonstrated huge benefits not obtainable otherwise (near-eradication of poliomyelitis, etc.); and peddlers of witchcraft remedies (like a former US president who suggested injecting disinfectant, or something not far from it, with horrible effects to be expected).
It is likely that anything tagged "Covid19" will become highly unpopular once the pandemic is over (belonging to "bad memories" that nobody will enjoy recalling).
In the longer term, popular interest in space science keeps growing, with impressive achievements from missions by Europe, Russia, China, the USA, Japan, India, etc.
This is a gem, in which everyone sees a common interest.
Climate science, whose probabilistic scenario calculations are not understood by all, is growing its audience, especially among young people, who are becoming more aware of the huge risks to their future in the absence of immediate radical changes in the way we live on planet Earth.
  • asked a question related to Databases
Question
37 answers
In some countries, scientific work is poorly paid, and young people have little interest in the complex problems of science. The popularization of science is necessary. ResearchGate (RG) pages publish scientific questions and the answers to them.
Can the publication of answers to questions on RG be considered a popularization of science?
Relevant answer
I am on the same line as Xavier Rouard.
  • asked a question related to Databases
Question
3 answers
Do you know of any databases that not only specify the plant origins of a specific phytochemical, but also show how much of that substance can be extracted from specified parts of the plant?
I have also found this awesome website, but it doesn't work at all! Besides answering my question, could you please let me know whether you get any results by searching for a term in it?
Relevant answer
Answer
You can try Dr. Duke's Phytochemical and Ethnobotanical Databases.
  • asked a question related to Databases
Question
7 answers
Hello,
I am teaching a database systems class and I wish to show students how distributed databases work.
We are using the PostgreSQL DBMS for illustration.
What other applications do I need to set up a DDBMS on a single Windows machine?
Best
Derdus
Relevant answer
I would recommend distributed Postgres using the TimescaleDB extension. No DaaS provides this because the licensing prohibits it; however, technology enablers such as Full Stack Engine, my firm, provide it as an implicit operator for Kubernetes. You wind up launching stacks like this:
You can do so in our cloud, which is free for life with a few special conditions:
* Your research contributes to our open collective non-profit if plausible/possible
* You operate only stacks we approve of for the sake of security & performance on shared systems
It's pretty lax in terms of who uses it: I'm not even tapping its full power, and it has plenty of tenants. That could change, but for now it remains free. Just ping me and I can set you up for free with a GitHub username.
Ping @large.systems on discord: https://lmg.systems/discord
Relevant:
  • asked a question related to Databases
Question
3 answers
Hello everyone,
I have been having problems gathering all the information I need for my study, and also some problems with fixed effects.
1) First of all, I am interested in comparing Panama's exports to certain partner countries (in Central America and elsewhere). I am using panel data in Stata (1994-2017), and the variables of interest are:
1) Exports (y)
2) Distance between the capitals
3) GDP (c_origin)
4) GDP (destination_c)
5) Population (c_origin)
6) Population (destination_c)
+ some dummy variables:
7) Common language (0,1)
8) Shared border between the countries (0,1)
9) Whether the destination country belongs to Central America (main variable of interest; 0,1)
Is there any website where I could get most of the data? I used CEPII and could find data up to 2015 for distance, language, population and GDP (but only in current dollars). ***I would need data for 2016 and 2017, as well as the GDP deflator and Panama's exports to those specific countries over 1994-2017. How do I convert GDP measured in current dollars using the GDP deflator, and are there any websites that already provide those values?*** I first used GDP measured in current dollars, and with the data I had, the results did not turn out well.
2) Another problem I encountered when running a hierarchical regression with my first "faulty" database was that, when I included country fixed effects, all of the variance was already explained by the countries, so there was no variance left for my variables of interest to explain. On the other hand, if I do not include them, the model will be biased. What is more, without any fixed effects, some of the regression blocks give counter-intuitive correlations/b coefficient values. Therefore, my questions are (apart from the sentence marked with *** above):
a) How can I solve this problem so that my model is not biased and my variables still explain a significant share of the variance in y (exports)?
b) Which fixed effects should I add to the model? Country, distance, population... which ones?
This is the first time I am using econometrics and fixed effects, so your help would mean a lot!
Thank you in advance!
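For the conversion asked about in this question, the standard relation is real GDP = nominal GDP / GDP deflator x 100, where the deflator equals 100 in its base year. The sketch below applies it year by year; the figures are hypothetical, not Panama's. It also illustrates why country fixed effects absorb some regressors: the within transformation subtracts each country's mean over time, so any variable that never changes for a given partner (distance, common language, shared border, the Central America dummy) becomes zero and cannot be estimated alongside destination-country fixed effects. The function names are illustrative.

```python
def to_constant_prices(nominal, deflator, base=100.0):
    """Real value = nominal / deflator * base, for each year present in `nominal`.

    `deflator` must equal `base` (usually 100) in the chosen base year.
    """
    return {year: nominal[year] / deflator[year] * base for year in nominal}


def within_transform(panel):
    """Subtract each country's mean over time, which is what country fixed effects do.

    `panel` maps a country to the list of yearly values of one regressor.
    """
    demeaned = {}
    for country, values in panel.items():
        mean = sum(values) / len(values)
        demeaned[country] = [v - mean for v in values]
    return demeaned


# Hypothetical figures, not real data for Panama or its partners.
gdp_current = {2015: 100.0, 2016: 121.0}
gdp_deflator = {2015: 100.0, 2016: 110.0}  # base year 2015 = 100
print(to_constant_prices(gdp_current, gdp_deflator))

# A time-invariant regressor (e.g. distance) vanishes after demeaning;
# a time-varying one (e.g. GDP) keeps its variation.
print(within_transform({"partner A": [800.0, 800.0, 800.0], "partner B": [1.0, 2.0, 3.0]}))
```

If a downloaded deflator series uses a different base year, it can be rebased by dividing the whole series by its value in the chosen year and multiplying by 100 before applying the conversion.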
Relevant answer
Answer
  • asked a question related to Databases
Question
12 answers
Hi all,
How do I tell R that the row names are in a certain column when importing files into R with the read.csv function?
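In R itself, the usual answer is the row.names argument of read.csv, for example read.csv("data.csv", row.names = 1) to use the first column as row names (a column name can be given instead of a number). For readers doing the same step in Python, here is a minimal standard-library sketch of the same idea; the function name and the sample column names are hypothetical.

```python
import csv
import io


def read_with_row_names(handle, row_name_column):
    """Read CSV rows into {row_name: {column: value}}, keyed by one chosen column.

    Like R, duplicate row names are rejected. Values stay strings, as csv returns them.
    """
    table = {}
    for record in csv.DictReader(handle):
        name = record.pop(row_name_column)
        if name in table:
            raise ValueError(f"duplicate row name: {name!r}")
        table[name] = record
    return table


sample = "sample,height,weight\nS1,170,65\nS2,182,80\n"
print(read_with_row_names(io.StringIO(sample), "sample"))
```

For a file on disk, pass an open handle instead: with open("data.csv", newline="") as f, call read_with_row_names(f, "sample").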
Relevant answer
Answer
Use MS Excel to arrange the data in a CSV file.
It is very easy.
  • asked a question related to Databases
Question
4 answers
I need to do an analysis with STRUCTURE using dominant data. If you have an example of this type of database, please contact me.
Relevant answer
Answer
Please, could you provide us with a sample file that could be used with the STRUCTURE program when analysing dominant marker (1/0) data?
Thank you so much
  • asked a question related to Databases
Question
4 answers
Most publicly available databases give only basic information, such as age, gender and mode of infection, about patients infected with COVID-19. Can anyone recommend more specific databases of image, speech or clinical data from these patients that are meant for open research?
  • asked a question related to Databases
Question
5 answers
I have been trying to find the FG-NET ageing database, the MORPH database and the YGA database online, but I can't find any of them available for download. Does anyone know of any other ageing database that I can use? Thank you. I need an ageing database for my school thesis, as I am building a face recognition system and classifying faces by age. It would be very helpful.
  • asked a question related to Databases
Question
15 answers
The medical conditions include heart and respiratory rates, systolic and diastolic pressure, etc. It would be even more helpful if the dataset also included information about the patients' use of medications such as NSAIDs and DMARDs prior to COVID-19 infection.
Relevant answer
Answer
The Johns Hopkins COVID-19 database might have the resources you are looking for.
  • asked a question related to Databases
Question
18 answers
I need a dermoscopic image database to test an algorithm for automatic diagnosis. In particular, I am interested in images with blue-black colors within the lesion.
Can anyone tell me where to find it?
  • asked a question related to Databases
Question
6 answers
Biomedical
Medical hyperspectral imaging
Relevant answer
Answer
  • asked a question related to Databases
Question
7 answers
I need a research paper whose dataset is also available, so that I can start my research.
Relevant answer
Answer
I've recently published a public dataset. It contains 2690 images of lemons with annotations. For more information, please visit: https://github.com/softwaremill/lemon-dataset
  • asked a question related to Databases
Question
13 answers
In your opinion, how will applications of the technology for analyzing large collections of information in Big Data database systems develop in the future?
In which areas of industry, science, research, information services, etc., in your opinion, will the applications of technology for the analysis of large collections of information in Big Data database systems be developed in the future?
Please reply
I invite you to the discussion
I described these issues in my publications below:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Shafagat Mahmudova, Len Leonid Mizrah, Reema Ahmad, Shah Md. Safiul Hoque, Natesan Andiyappillai, Omar El Beggar, Tiroyamodimo Mmapadi Mogotlhwane, Thank you very much for participating in this discussion and providing inspiring and informative answers to the above question: What will Big Data be like in the future? Thank you very much for the interesting information and inspiration to continue deliberations on the above-mentioned issues. This discussion confirms the importance of the above-mentioned issues and the legitimacy of developing research on this subject. I also believe that the Big Data Analytics analytical and database technology is one of the most developing technologies included in Industry 4.0. What do you think about it?
Thank you very much and best regards,
Dariusz Prokopowicz
  • asked a question related to Databases
Question
5 answers
I am going to use a survey method in one of my research projects related to business units. The Orbis database (company information across the globe, by BvD) or something similar would be useful for drawing a sample according to certain criteria and obtaining contacts. My organization does not provide access to the Orbis database. Maybe someone has access to it and could provide me with data from it, or could recommend free alternatives?
Thank you in advance.
Relevant answer
Answer
The alternatives are:
However, those databases require you to have an account. Are you requiring data for private companies?
Some countries allow you to buy the audited reports of companies listed on the stock market.
  • asked a question related to Databases
Question
6 answers
I want to perform database operations in a distributed database environment. If anybody has ideas about this, please share.
Thanks in advance.
Relevant answer
Answer
You can use Postgres-XL for a relational or Dgraph for a graph distributed environment. Both tools can be used in virtual or physical mode. You may use row (PostgreSQL) and column (MonetDB) stores for data partitioning labs.
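For a data partitioning lab, a minimal sketch of hash partitioning may help show students what a distributed database does: each row is assigned to one node by hashing its partition key, and a point query is routed to the single node that owns that key. The node names are hypothetical (for example, several PostgreSQL instances listening on different ports of one Windows machine), and real systems such as Postgres-XL add replication, distributed transactions and query planning on top of this idea.

```python
import hashlib


def node_for_key(key, nodes):
    """Hash partitioning: map a partition-key value to one node, the same way on every run."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition_rows(rows, key_column, nodes):
    """Place every row on exactly one node according to its key."""
    placement = {node: [] for node in nodes}
    for row in rows:
        placement[node_for_key(row[key_column], nodes)].append(row)
    return placement


def lookup(placement, key_column, key, nodes):
    """A point query is routed to the single node that owns the key, not broadcast to all."""
    owner = node_for_key(key, nodes)
    return [row for row in placement[owner] if row[key_column] == key]


# Hypothetical nodes, e.g. three PostgreSQL instances on different ports of one machine.
nodes = ["localhost:5433", "localhost:5434", "localhost:5435"]
students = [{"id": i, "name": f"student{i}"} for i in range(12)]
placement = partition_rows(students, "id", nodes)
for node, rows in placement.items():
    print(node, [r["id"] for r in rows])
print(lookup(placement, "id", 7, nodes))
```

SHA-256 is used instead of Python's built-in hash() because string hashing in Python is randomized per process, which would change the placement between runs. With plain modulo placement, adding a node moves most rows; systems that resize often use consistent hashing or a fixed number of logical shards instead.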
  • asked a question related to Databases
Question
15 answers
To put you in context, our work consists of building a machine learning model that takes a vector with the properties of a farm (possibly including the weather) and then, using a database of crops, recommends the most suitable crop for the soil. Therefore, identifying the factors that inform this decision is an important step before we start collecting the data the model needs.
Relevant answer
Answer
I would like to recommend you to go through two of my articles:
1. B. K. Tripathy and T. R. Sooraj: An Interval Valued Fuzzy Soft set Based Optimization Algorithm for High Yielding Seed Selection, International Journal of Fuzzy Sets and Applications, IGI publications, vol. 7, issue 2 (2018), pp. 44-61.
2. B. K. Tripathy and T. R. Sooraj: Optimization of seed selection for higher product using Interval valued Hesitant Fuzzy Soft Sets, Songklanakarin Journal of Science and Technology (SJST), 40 (5), Sep.-Oct. 2018, pp. 1125-1135.
  • asked a question related to Databases
Question
5 answers
Hello,
I wonder what the best database would be for storing and managing a large amount of data (perhaps a few hundred gigabytes) at the main basin level, including:
1) GIS data:
  • raster
  • vector
2) Hydrological data:
  • time-series of different variables (e.g., rainfall, temperature, humidity, etc.) for different stations.
Searching online, I found that the following databases could be used:
  • PostGIS
  • MongoDB
Thank you very much
Relevant answer
Answer
Moreover, consider that PostgreSQL with the spatial PostGIS extension is entirely open source and is developed by a strong and active community. I therefore definitely recommend going with PostGIS rather than proprietary software such as ESRI products.
In addition, despite being a relational DBMS, PostgreSQL offers features beyond the strict relational model, such as the hstore and jsonb types, which allow you to extend your data model, if required, to unstructured data.
  • asked a question related to Databases
Question
10 answers
Dears,
I am looking for free online sources of gridded weather/climate data with high spatial (1 x 1 km) and temporal (hourly or three-hourly) resolution to be used in my research. The spatial domain is Europe (or even the world, if possible). Could you please give me some suggestions on the best available data sources?
Thank you for the support.
Best,
Giorgio
  • asked a question related to Databases
Question
13 answers
Dear researchers, is it possible to merge two or more RIS files from the Scopus database into one RIS file?
Relevant answer
Answer
If you're using Windows, the "copy" command should help (files should be in the same folder), using a Command prompt:
copy *.ris db.ris
If you're using Ubuntu Linux and can move all the ris files inside a folder, just open a terminal in the folder and use "cat":
cat *.ris > db.ris
There are more complex commands, but you can move files in a folder easily if you search for them by extension and cut/paste.
The files are basic text files, so this also works for other text files of interest (e.g. CSV).
I used this to import some references into my JabRef database; it seems the new version of JabRef doesn't support drag & drop and can't select multiple files for import.
P.S. There's a lovely browser extension called BibItNow.
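A cross-platform alternative to "copy" or "cat" is a short Python script. This minimal sketch (the function name and the default output name are assumptions) also guards against two small pitfalls of plain concatenation: a file whose last line has no trailing newline would otherwise run into the first line of the next file, and a rerun would otherwise pick up the previous db.ris. It assumes the exports are UTF-8 and reads them with "utf-8-sig", so that a byte-order mark at the start of a file, if present, is dropped rather than copied into the middle of the merged file.

```python
from pathlib import Path


def merge_ris(folder, output_name="db.ris"):
    """Concatenate all *.ris files in `folder` into one file and return how many were merged."""
    folder = Path(folder)
    # Exclude the output file so that running the script twice does not merge it into itself.
    sources = sorted(p for p in folder.glob("*.ris") if p.name != output_name)
    with (folder / output_name).open("w", encoding="utf-8") as out:
        for path in sources:
            # utf-8-sig drops a leading byte-order mark if an export has one (assumption: UTF-8 exports)
            text = path.read_text(encoding="utf-8-sig")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep the next file's first tag on its own line
    return len(sources)
```

Usage: merge_ris("path/to/folder") writes db.ris in that folder and returns the number of files merged.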
  • asked a question related to Databases
Question
24 answers
Hi there,
Can anyone recommend a Delphi method online tool that I can use? 
Mesydel.com has been recommended but you are not able to download it. 
I have been recommended http://armstrong.wharton.upenn.edu/delphi2/ as well although the website is less user friendly than I would like.
Any other suggestions?
Thanks.
L
Relevant answer
Answer
I wonder why you need to use Delphi software. If you simply want to conduct a small-scale Delphi study (fewer than 50 experts), there is no need for any specific Delphi program. Just follow a Delphi methodology, from identifying experts, through questionnaire design, data collection (over the chosen number of rounds) and data analysis, to data interpretation. It is quite sad that researchers nowadays cannot carry out research without a dedicated software program. In fact, we should focus more on research philosophy and methodology, together with reporting guidelines, than on IT or software.
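As an illustration of the point that the analysis step does not require dedicated Delphi software, one round's ratings can be summarized with a few lines of standard Python. A common, but not universal, convention for Likert items is to report the median and interquartile range (IQR) and to treat an IQR of at most 1 as consensus; the threshold, item labels and ratings below are assumptions for illustration only.

```python
import statistics


def summarize_round(ratings_by_item, max_iqr=1.0):
    """Return {item: (median, iqr, consensus)} for one Delphi round.

    ratings_by_item maps an item label to the panel's Likert ratings (at least two).
    Consensus is assumed to mean IQR <= max_iqr; use the rule your protocol specifies.
    """
    summary = {}
    for item, ratings in ratings_by_item.items():
        q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
        iqr = q3 - q1
        summary[item] = (statistics.median(ratings), iqr, iqr <= max_iqr)
    return summary


# Hypothetical 5-point ratings from five experts.
round_1 = {"item A": [4, 4, 4, 5, 5], "item B": [1, 2, 4, 5, 5]}
for item, (median, iqr, agreed) in summarize_round(round_1).items():
    print(f"{item}: median={median}, IQR={iqr}, consensus={agreed}")
```

Items that do not reach consensus would typically be returned to the panel in the next round together with the group's summary, which is the controlled feedback that distinguishes a Delphi study from a one-off survey.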
  • asked a question related to Databases
Question
7 answers
As part of various financial research initiatives, we need different types of datasets. This question aims to identify online sources of financial data, both free and paid. This information can help researchers locate online directories or data banks for financial and economic analysis.
Relevant answer
Answer
OECD, UNCTAD, Groningen, Penn World Table data, and Federal Reserve Economic Data (FRED) from the St. Louis Fed.
  • asked a question related to Databases
Question
11 answers
Please suggest a standard image database for age estimation and prediction.
Relevant answer
Answer
IMDB-WIKI: (523,051)
FG-NET: (1,002)
MORPH: (55,134)
CACD: (163,446)
LAP: (4,691)
  • asked a question related to Databases
Question
6 answers
I have gone through many questions, but there is no single discussion thread giving the links, APIs or JSON for the datasets.
I am making a start so that they will be easily searchable by everyone, using this format:
<COUNTRY NAME> :: <TYPE - JSON/CSV/... etc...> :: <KIND OF DATA - COUNT/NETWORK INFO>
<URL/s>
Relevant answer
Answer
Italy, web, data on symptom onset, Ministry of Health: http://www.salute.gov.it/portale/nuovocoronavirus/homeNuovoCoronavirus.jsp?lingua=english
  • asked a question related to Databases
Question
5 answers
The COVID-19 pandemic has been an unprecedented situation, with rapid research and development taking place. A lot of data is being generated, such as the total number of patients infected, active cases, recovered and deceased.
Data can be obtained from different sources. For example, in India https://www.mohfw.gov.in/ is the official website of the Government of India, https://www.covid19india.org/ is a website run by a group of dedicated volunteers, and https://www.worldometers.info/coronavirus/ is Worldometer, maintained by Dadax.
Is there any data validation model or method to verify the data that has been published?
There is a possibility of overstating or understating the numbers, and there can be discrepancies between sources.
Has there been a solution or discussion about this in the research community? If so, what is it?
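One simple, hedged starting point is to cross-check sources automatically: for each date, compare the figure each source reports and flag dates on which they disagree by more than a chosen relative tolerance, and check that cumulative totals never decrease. The source names, numbers and 5% tolerance below are hypothetical. Such checks only reveal inconsistencies; they cannot tell which source is right.

```python
def flag_discrepancies(reports, tolerance=0.05):
    """Flag dates on which sources disagree by more than `tolerance` (relative spread).

    reports maps a source name to {date: value}. Only dates reported by at least two
    sources are compared; spread = (max - min) / max.
    """
    dates = set().union(*(series.keys() for series in reports.values()))
    flagged = {}
    for date in sorted(dates):
        values = {src: series[date] for src, series in reports.items() if date in series}
        if len(values) < 2:
            continue
        high, low = max(values.values()), min(values.values())
        spread = (high - low) / high if high else 0.0
        if spread > tolerance:
            flagged[date] = (round(spread, 4), values)
    return flagged


def non_monotone_days(cumulative):
    """Return the dates on which a cumulative count (e.g. total cases) went down."""
    dates = sorted(cumulative)  # ISO dates (YYYY-MM-DD) sort chronologically as strings
    return [day for prev, day in zip(dates, dates[1:]) if cumulative[day] < cumulative[prev]]


# Hypothetical numbers for three hypothetical sources.
reports = {
    "source_a": {"2020-05-01": 1000, "2020-05-02": 1100},
    "source_b": {"2020-05-01": 1010, "2020-05-02": 1300},
    "source_c": {"2020-05-02": 1120},
}
print(flag_discrepancies(reports))
print(non_monotone_days({"2020-05-01": 100, "2020-05-02": 120, "2020-05-03": 115}))
```

A decrease in a cumulative series usually signals a retrospective correction or a change of definition, which is worth documenting before the data are used in a model.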
Relevant answer
Answer
Chandrasekar Subramani Narayana you have here an example of ground truth appearing, which is not the official data of a government, but the analysis from very professional independent journalists and the official institute of statistics for the UK, ONS:
- the government reported fewer than 30,000 deaths
- the ONS published data implying 40,000 deaths
- The Guardian (and Reuters) picked it up
- the government stuck to its figure.
Is it typical that the public gets more reliable information from sources other than the government?
Another case was PPE, the protective equipment for health staff. The government said "all good", while the official voice of the NHS (health care) said "missing masks and equipment almost everywhere".
Yet another controversy: tests.
The government set a target of 100,000 tests per day. Earlier, they reported confusing figures to avoid admitting they had failed to meet it.
Puzzling, is it not?
For sure we now know who should not be trusted, but it should not be that way, not in a democracy.
Or, to put it otherwise, is it not transparency and accuracy of information that characterise a democracy?
  • asked a question related to Databases
Question
4 answers
The CTU-UHB Intrapartum CTG database consists of CTG records and clinical information. The data were extracted from the OB TraceVue system into an open format using software. I need the raw data of this database. The database contains .dat and .hea files, but I can't find a way to extract the raw data from them. Can anyone help me?
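The .hea/.dat pair is PhysioNet's WFDB format. The usual route is a WFDB library (for example, the rdrecord function of the wfdb Python package, or the rdsamp tool of the WFDB software package), but the files are simple enough to read directly. The sketch below assumes the signals are stored in WFDB format 16 (signed 16-bit little-endian samples, interleaved by frame) in a single .dat file, which you should confirm from the second field of each signal line in the header. The header values in the test are hypothetical, not taken from the database. In many PhysioNet records, extra per-record information is kept in "#" comment lines of the header, so those are returned too; check whether that applies to CTU-UHB.

```python
import struct


def parse_header(text):
    """Parse a WFDB .hea header: a record line, one line per signal, and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    comments = [ln[1:].strip() for ln in lines if ln.startswith("#")]
    lines = [ln for ln in lines if not ln.startswith("#")]
    record = lines[0].split()  # name, number of signals, sampling frequency, samples
    n_signals = int(record[1])
    signals = []
    for line in lines[1:1 + n_signals]:
        fields = line.split()
        gain_spec, _, units = fields[2].partition("/")  # e.g. "100(0)/bpm"
        if "(" in gain_spec:
            gain_text, baseline_text = gain_spec.rstrip(")").split("(")
            baseline = int(baseline_text)
        else:
            gain_text = gain_spec
            baseline = int(fields[4]) if len(fields) > 4 else 0  # defaults to the ADC zero
        signals.append({
            "file": fields[0],
            "format": fields[1],
            "gain": float(gain_text) or 200.0,  # a gain of 0 means WFDB's default of 200
            "baseline": baseline,
            "units": units,
            "description": " ".join(fields[8:]),
        })
    return {"name": record[0], "fs": float(record[2].split("/")[0]),
            "signals": signals, "comments": comments}


def read_format16(data, n_signals):
    """Decode WFDB format 16: signed 16-bit little-endian samples, interleaved by frame."""
    count = len(data) // 2
    values = struct.unpack(f"<{count}h", data[:2 * count])
    return [list(values[i::n_signals]) for i in range(n_signals)]


def to_physical(digital, gain, baseline):
    """Convert ADC units to physical units: (digital - baseline) / gain."""
    return [(d - baseline) / gain for d in digital]
```

Usage, assuming the files are named after the record: read the header text from the .hea file and pass it to parse_header, read the .dat file in binary mode and pass its bytes to read_format16 with the number of signals from the header, then convert each signal with its own gain and baseline.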
Relevant answer
Answer
Where can I find the data dictionary for the clinical parameters of this data? I could not find it on the PhysioNet site. Thanks for any help.
  • asked a question related to Databases
Question
4 answers
I want to do a sample project with SVM, but I cannot retrieve data from the ZINC database.
I need sample data from this database for classification; my classification method is SVM.
Thanks
Relevant answer
Answer
Hi, I downloaded from the ZINC library on both Windows and Linux systems.
On Windows, I selected the desired LogP and molecular weight, and the download gave me a text file intended for Linux. I copied the links from it into Internet Download Manager and downloaded approximately a million compounds this way; it gave me RAR files, which I just decompressed.
On Linux, as I remember, I used the wget program with the same text file that I got from ZINC.
Best regards.
  • asked a question related to Databases
Question
9 answers
I want to study the relationship between different epigenetic factors and different types of cancer using existing records in epigenetic and/or oncological databases. However, although I am a bioinformatician, I have never worked with epigenetic data, so I do not know what format they are available in, what type of preprocessing they require, or what tools I can use to analyze them.
I would really appreciate if someone gave me some basic indications of how I should start, or if someone recommended me a paper or tutorial about how to work with epigenetic data in cancer bioinformatics.
Relevant answer
Answer
If you are interested in a particular gene, the UCSC Genome Browser HAIB Methyl RRBS track (ENCODE at UCSC) is a good place, too.
  • asked a question related to Databases
Question
4 answers
Hello,
I am looking for databases of quadruplex structures (both C-based i-Motifs and G-based G-quadruplexes (G4s)).
I have only found the databases:
- G4Hunter supplementary material database (mostly all DNA G4s)
Does anybody know whether any other sources of quadruplex structures exist?
Thanks in advance.
Relevant answer
Answer
As I didn’t find any database of tetraplex structures, I decided to create my own. I then used it in the package G4-iM Grinder.
In the latest update (V1.5.9), G4-iM Grinder’s internal database (V.2.5.1) has been expanded to a total of 2851 quadruplex nucleotide sequences that are already known to form or known not to form.
- 2141 of these sequences form tetraplex and 710 don’t.
- 283 are i-Motifs and 2568 are G4s.
- 1858 are DNA and 993 are RNA.
This database is used by the algorithm to locate any known tetraplex within its results.
However, the database is available for anyone to use.
[GiG.DB within G4iMGrinder package]
The database also includes references for each entry and some biophysical results (which will be soon expanded).
Hope this helps.
  • asked a question related to Databases
Question
4 answers
Hello everyone
I am currently a master's student at Nevşehir Hacı Bektaş Veli University, Department of Geography, studying physical geography. My main areas of expertise are geographical analysis, plant geography, data mining, MapReduce and Hadoop systems, land planning and plant taxonomy; in addition, I have been working on social media analytics, social media applications and analysis, and social media in geography education, in scientific and technical terms. But my main focus is on "Data Mining and Plant Geography Modeling". I have a technical research article on this topic available as full text on my ResearchGate account, titled "Creation of Plant Geography Databases With The Map Reduce Modeling in The Clustering of the Large Geographical Data Sets". In this study, I used MapReduce and Hadoop systems and algorithms, in addition to data mining, GIS, plant geography research methods and techniques, and various international plant databases, taking advantage of biological databases in Turkey and vegetation studies carried out around the world. With this latest work, I tried to develop a new database model that will contribute to fields such as plant geography. In conclusion, I would especially like to hear and benefit from the ideas and opinions of my colleagues and teachers working on data mining and geography, plant geography or vegetation, especially geographers. Thanks in advance to everyone who contributes. Sincerely.
  • asked a question related to Databases
Question
3 answers
In my investigation of the triangular relationship between international sales, international ownership and the riskiness of a stock, I am looking for a database that could provide the percentage of foreign ownership of shares of DAX and MDAX companies between 2003 and 2019.
Relevant answer
Answer
It's currently around 42 per cent on average, higher for the DAX than for the MDAX. Quote me if you need a reference.
  • asked a question related to Databases
Question
4 answers
I have a list of reference SNP IDs (rsIDs) and I need to retrieve the associated diseases. What are suitable bioinformatics tools or databases?
Relevant answer
Hi Ahmad,
You can find some resources in the answers to this similar question: https://www.researchgate.net/post/Is_there_a_list_of_human_SNPs_associated_with_a_disease
Some examples:
PheGenI (Phenotype-Genotype Integrator): users can search based on chromosomal location, gene, SNP or phenotype, and view and download results, including annotated tables of SNPs, genes and association results, a dynamic genomic sequence viewer, and gene expression data. https://www.ncbi.nlm.nih.gov/gap/phegeni
NHGRI-EBI Catalog of published genome-wide association studies https://www.ebi.ac.uk/gwas/home (for example: https://www.ebi.ac.uk/gwas/search?query=rs7329174)
PharmGKB, a pharmacogenomics knowledge resource that encompasses clinical information including clinical guidelines and drug labels, potentially clinically actionable gene-drug associations and genotype-phenotype relationships. https://www.pharmgkb.org/
I hope this helps
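For a long list of rsIDs, one hedged way to avoid pasting them one by one is to generate GWAS Catalog search links in bulk, following the search URL pattern in the example link above. This sketch only validates IDs of the form "rs" followed by digits and builds the links; it does not query any API, and the function name is hypothetical.

```python
import re
from urllib.parse import urlencode

RSID = re.compile(r"rs\d+")


def gwas_catalog_search_urls(rsids):
    """Return ({rsid: search_url}, rejected_inputs) for a list of rsIDs."""
    urls, rejected = {}, []
    for raw in rsids:
        rsid = raw.strip().lower()
        if RSID.fullmatch(rsid):
            urls[rsid] = "https://www.ebi.ac.uk/gwas/search?" + urlencode({"query": rsid})
        else:
            rejected.append(raw)
    return urls, rejected


urls, rejected = gwas_catalog_search_urls(["rs7329174", " RS123 ", "chr1:12345"])
for rsid, url in urls.items():
    print(rsid, url)
print("not rsIDs:", rejected)
```

Inputs that are not rsIDs, such as chromosome positions, are returned separately so they can be looked up another way.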
  • asked a question related to Databases
Question
3 answers
I'd like to publish some ideas about a taxonomic database. This would be possible in Biodiversity Data Journal or PeerJ, but I would need to pay an article processing charge. Does anybody know of similar journals without an article processing charge?
Relevant answer
Answer
  • asked a question related to Databases
Question
3 answers
For my research, I am trying to find ecological, geographic, hydrologic, social, economic and political spatial/GIS data that is preferably free and easily available. I am especially interested in layers associated with protected areas, distribution of Adivasi populations, Adivasi owned/managed lands, watersheds, dams/roads/mines/embankments, land ownership, or any other data within these fields. I would greatly appreciate any inputs, recommendations or tips, as the government website data.gov.in has been difficult to navigate, finding data there is almost impossible, and the Bhuvan website does not allow data downloads.
Relevant answer
Answer
Akanksha Sharma, link district/block boundaries with Census of India data.
  • asked a question related to Databases
Question
9 answers
Hello, can anyone help me?
When I open a CIF file via File > Open, the software (CrystalExplorer 17.5) reports an error processing the CIF:
Error in TEXTFILE:open_new_file_for_write ... error opening new file.
I checked the CIF file with checkCIF and it was reported as correct. I also opened it in Mercury, generated a new CIF file from my original one, and got the same error. I tried CIF files from the CCDC database as well, without success. Could anyone give me a hand, please?
Relevant answer
Answer
The original CIF file contains a lot of data, which makes it too heavy to be loaded in CrystalExplorer. To remove that data, it is better to open your file in Mercury and save it again in .cif format.
  • asked a question related to Databases
Question
3 answers
I am looking for the minimum inhibitory concentrations (MICs) of different antimicrobial agents (such as chloramphenicol, ketoconazole, nitrofurantoin, etc.). Is there perhaps a database where MICs against different bacteria and fungi (S. aureus, E. coli, P. aeruginosa, K. pneumoniae, C. albicans, etc.) can be found?
Relevant answer
Answer
Many books and journals describe methods for measuring the minimum inhibitory concentration of antimicrobial agents.
  • asked a question related to Databases
Question
6 answers
It could also be a series of images, but not a single image per emotion.
Relevant answer
Answer
  • asked a question related to Databases
Question
3 answers
I am currently trying to search for data on RLFS (R-loop forming sequences) for the human genome on UCSC Table Browser but I cannot find anything. Does anyone know if these data exist?
Currently I am trying to generate it myself using QmRLFS-finder, but it would be great if I could find other sources.
Thank you in advance,
Ana
Relevant answer
Answer
You can search the Genome Browser mailing list archives and/or email them directly here:
  • asked a question related to Databases
Question
8 answers
I have a large amount of environmental data as independent variables, so I used a PCA to work with them more easily. But I have problems extracting/converting the data from the principal components so that I can use them as variables in different GLMs. I'm working in R. Can anyone help me?
Thank you,
Ferran
Relevant answer
Answer
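factoextra is an R package for extracting PCA results.
You can use the PCA scores directly in a GLM:
library(tidyverse)  # tibble helpers; FactoMineR and factoextra must also be installed
set.seed(1234)  # reproducibility
# Simulated data: 100 rows of 5 normal variables, named V1..V5 explicitly
df <- as_tibble(matrix(rnorm(500), ncol = 5, dimnames = list(NULL, paste0("V", 1:5))))
pca <- FactoMineR::PCA(df[-1], graph = FALSE)  # PCA on V2:V5 only
score <- as_tibble(factoextra::get_pca_ind(pca)$coord)  # individual scores: Dim.1, Dim.2, ...
mod <- cbind(df[1], score[1:2])  # V1 as dependent variable, Dim.1 and Dim.2 as predictors
glm(V1 ~ Dim.1 + Dim.2, data = mod)  # default gaussian family, i.e. a simple linear model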
  • asked a question related to Databases
Question
10 answers
We are interested in developing a method for predicting siRNA, so we need a large set of siRNAs for building models. I would highly appreciate it if you could suggest the best database or databases on siRNA. This will help us create a large dataset that covers all experimentally characterized siRNAs. Please also suggest the best (latest) siRNA prediction method. Do you think there is room for developing a better prediction method, or is this field already saturated?
Relevant answer
Answer
Hi, I wonder whether you have found an updated siRNA database?
  • asked a question related to Databases
Question
8 answers
Can you help me with some databases for neuroscience, for example fMRI databases, databases that show the underlying mechanisms of the brain or the connection between brain and behavior, psychiatric databases, and other resources related to the brain? In genetics, for example, we have Reactome, KEGG, STRING and other databases that show many pathways and cell connections. I wonder whether we have something like that in neuroscience, a big database that helps us better understand the brain.
Relevant answer
Answer
Actually you can use those databases that you mentioned to conduct research and understand mechanisms of brain activity in health and disease in computational neuroscience!
We recently used KEGG and STRING to study gene networks in different psychiatric diseases.
However, if you are mainly interested in using fMRI and imaging databases in your research, please check the following as well:
1. The Blue Brain Project at EPFL has a great circuit-level database that can be used:
2. The Human Connectome Project (HCP) is a great database comprising scans and analyses of more than 1,100 subjects:
3. Allen Brain Atlas
and many other databases that you may find and use depending on your main focus in neuroscience.
  • asked a question related to Databases
Question
5 answers
Hi everybody!
About 6 months ago I started collecting data through online questionnaires sent by email. I got almost 2,000 people at baseline, 60% of whom agreed to be contacted again for the follow-up (an online questionnaire of about 10 minutes) so that I can investigate associations over time.
In your opinion, considering the sample size and the online recruitment, what response rate should I realistically aim for to have a "strong" dataset for my analysis?
Below what response rate should I give up on using the longitudinal information?
Is there such a threshold at all?
I would love to hear your idea and your experience with this matter.
Thanks in advance for sharing!
Relevant answer
Answer
Congratulations on your high response rate. I guess the follow-up will be via email as well. Online follow-ups and online recruitment are not something I am familiar with, but I think the main thing is to send out the follow-up request early enough for participants to get back to you. Also, we get so many emails from different companies that, unless an email is something we are expecting or something we are familiar with, we may just skip past it without realising. So you could send out an initial email reminding participants of the upcoming questionnaire, and then another one with the link attached. In both messages, make it clear in the subject line that the email concerns the study/project they agreed to participate in, to jog their memory and make it stand out from all their other emails. Hope this helps.
  • asked a question related to Databases
Question
25 answers
I have XRD patterns of biomass samples, and I need to find out which mineral phases may be present. How do I proceed?
Relevant answer
Answer
Tarik Kabak ... here's a link for free download
Good Luck
  • asked a question related to Databases
Question
5 answers
We are looking to implement a web-based lab notebook as well as a tracking system to upload various assay results for several analogs of a parent compound. We will need to keep very close track of lot numbers, dates received, chemists who synthesized them, etc. Does anyone use a service that would be helpful?
Relevant answer
Answer
Hey
SCINOTE
  • Intuitive and easy to use
  • Inventory management and MS Office integration
  • Automatically generates reports & manuscript drafts
  • Exports all data in a readable format and API
  • Free account option
  • asked a question related to Databases
Question
11 answers
Hello, I'm trying to find the expression of a specific gene in various types of cancer and cancer cell lines. Is there any database for this?
Relevant answer
Answer
Hi
I recommend the Human Protein Atlas (HPA) database, which presents its data visually on its website.
If you search for a specific gene in HPA, the website shows the RNA expression level of the gene in various cancer cell lines and different types of cancer. Moreover, its Pathology section shows the protein expression level (IHC) in different types of cancer (human cancer tissues). In addition, the website shows the subcellular location of the protein encoded by the gene you enter.
With regards
  • asked a question related to Databases
Question
6 answers
Hello all,
I'm trying to test out a predictive model but I'm having a very hard time trying to find hourly precipitation data. I've looked at NOAA and on the new data repository (NCDC) but I can't find hourly or 15 minute interval data past 2014. Am I missing something here? Is there an alternative source I don't know about? If it helps typically this is the weather station I have pulled from in the past: USW00014819.
Any help is appreciated. Thanks.
Relevant answer
Answer
Success! I called the customer service line and the nice woman knew exactly what I was talking about. For whatever reason the UI they are using does not have past 2014. But on the back end where you request the data it has all the most recent data. See it here: https://www.ncdc.noaa.gov/cdo-web/datatools/lcd
  • asked a question related to Databases
Question
5 answers
Can anyone recommend a database that contains raw multispectral images with the individual bands and, in the same database, the NDVI and NDWI indices, so that I can compare the results I obtain? Also, if I use my own multispectral images, how can I compare the vegetation and water of real plants with the NDVI and NDWI indices?
Relevant answer
Answer
These indices range from -1 to +1, so if your values do not fall within this range, there is a problem. The values also depend on the vegetation condition of your land-cover types: healthy vegetation, for instance, has very high positive NDVI values, and vice versa. A good understanding of the vegetation dynamics in your study area is also necessary to judge whether your NDVI or NDMI values are correct.
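As a rough illustration of why valid values stay between -1 and +1, here is a minimal Python sketch computing both indices per pixel. It assumes reflectance values on a 0 to 1 scale held in plain lists and uses McFeeters' green/NIR form of NDWI; the pixel values are hypothetical, not taken from a real image.

```python
# Sketch: per-pixel NDVI and NDWI from reflectance values (assumed 0..1 scale).
# NDWI follows McFeeters: (Green - NIR) / (Green + NIR). Gao's NDWI and the
# NDMI use a SWIR band instead of green.


def normalized_difference(a, b):
    """Return (a - b) / (a + b) for each pixel pair; None where a + b == 0."""
    out = []
    for x, y in zip(a, b):
        total = x + y
        out.append((x - y) / total if total != 0 else None)
    return out


def ndvi(nir, red):
    # Healthy vegetation reflects strongly in NIR and absorbs red light,
    # so it gives high positive values
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Open water reflects more green than NIR light, so it gives positive values
    return normalized_difference(green, nir)


def out_of_range(values):
    """Return values outside [-1, 1]; a non-empty result signals bad input data."""
    return [v for v in values if v is not None and not -1.0 <= v <= 1.0]


if __name__ == "__main__":
    # Hypothetical pixels: vegetation, water, bare soil
    nir = [0.50, 0.05, 0.30]
    red = [0.08, 0.07, 0.30]
    green = [0.10, 0.09, 0.20]
    print("NDVI:", ndvi(nir, red))
    print("NDWI:", ndwi(green, nir))
```

With non-negative reflectances the result cannot leave [-1, 1], so values outside that range point to negative inputs, for example negative reflectances left over from atmospheric correction or a wrongly applied offset.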
  • asked a question related to Databases
Question
3 answers
I am looking for databases that contain microRNA-drug interactions. Any suggestions or recommendations?
Relevant answer
Answer
Dear Ali Akbar Jamali ,
Have a look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Databases
Question
5 answers
I've spent a lot of time but still could not find a high-quality yet public/free dataset for causal inference with a binary treatment (e.g., A = 0 or 1, non-DCD vs DCD donations) and survival outcomes. I know it's quite a specific requirement, but I need one for my master's thesis. Of course, one might recommend the datasets from the 'survival' package in R.
But I really want to find good, real (if possible, not too old) data.
For example, the one from this journal looks perfect (but I cannot access it).
Can I get some advice or recommendation for such data?
Any comment is appreciated !
Relevant answer
Answer
According to chapters 2 and 9 of the book "Applied Linear Statistical Models" (John Neter et al.), regression analysis of observational data cannot establish cause-and-effect relationships. However, if the regression analysis is performed on experimental data (an experimental study), it can provide information about cause-and-effect relationships. In such cases, randomization minimizes the effect of latent variables on the response variable, so causal effects can be examined.
Survival analysis is a special kind of regression analysis, so if the study is experimental, causal inference is possible.
  • asked a question related to Databases
Question
4 answers
I am processing a 16S rRNA next-generation sequencing dataset and trying to compare, between my samples, the effects on organisms involved in the nitrogen cycle. I am wondering whether there is a database, or even a good paper, that covers all known organisms in the nitrogen cycle. The more detail the better, but I would settle for a simple list.
Relevant answer
Answer
Thanks Shan Thomas, I will check it out. It looks like a more convenient solution than trawling through papers.
  • asked a question related to Databases
Question
20 answers
I'm interested in automated algae identification using neural networks. I need to compile a substantial microphotograph dataset of algal genera (the most significant ones of Cyanobacteria, Chlorophyta and Bacillariophyta).
Thank you in advance!
Relevant answer
Answer
Automated algae identification using neural networks is a great idea. I played with it a few years ago as well :-).
Besides the sources mentioned in previous answers, https://atlasofcyanobacteria.com/index.php could be another one.
In general, it is necessary to have reliable and double-verified identifications of micrographs used for machine learning.
I would be a bit cautious with mixing natural material images and photos of cultures, which may look quite different.
Good luck!
  • asked a question related to Databases
Question
3 answers
I have been working on a project to collate species occurrence data from unpublished student theses into an integrated database (currently published in GBIF), and I am still working on a systematic protocol for data validation. Expert review is quite subjective, and I found many studies showing that "expert" identifications were not always more consistent than those of amateurs, students or even members of the public (feel free to message me for the papers I collected on this), so my team is still struggling to find a way. Our current method is to evaluate the scientific names independently against taxonomic checklists and to validate the geographic distribution against available published literature mentioning the distribution of each species. We occasionally ask experts, but as we are working on many understudied taxa and geographical areas, there are not many around.
Relevant answer
Answer
I suppose it all depends on your study species. For the most part, I think experts in most fields are able to identify the species they're most knowledgeable about with relatively high accuracy, given they have enough information in the photo and geographic location to do so.
Problems usually arise when someone gets a bit overzealous and identifies something to species level from minimal information, relying on an educated guess about which species is most likely to occur in the area.
It also depends on what the question for your study is. If you're doing an SDM for a species, you could always thin the records to about 100 and then self-verify (if you're confident in your abilities to do so). You could see if the species occurrence data has any corresponding NCBI molecular data and use DNA to verify species.
If you're using a dataset of 1,000+ records from iNaturalist (or some other number where it isn't feasible to verify each record yourself), you could query the data for records with >3 verified ID agreements and no "maverick" or disputed IDs. The likelihood of false positives should decrease as user agreement on a species identification increases.
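The agreement filter described above could look roughly like the following Python sketch. The column names are assumptions based on the iNaturalist CSV observation export (check them against the header of your own file), and ">3 verified ID agreements" is read as at least four agreements.

```python
import csv

# Assumed column names from an iNaturalist observation CSV export;
# verify them against the header row of your own file.
AGREE_COL = "num_identification_agreements"
DISAGREE_COL = "num_identification_disagreements"


def filter_by_agreement(rows, min_agree=4):
    """Keep records with at least min_agree agreeing IDs and no disputed IDs.

    ">3 verified ID agreements" is interpreted here as min_agree=4.
    """
    kept = []
    for row in rows:
        agree = int(row.get(AGREE_COL) or 0)       # empty cells count as 0
        disagree = int(row.get(DISAGREE_COL) or 0)
        if agree >= min_agree and disagree == 0:
            kept.append(row)
    return kept


def load_and_filter(path, min_agree=4):
    # Rows are read one at a time; only records passing the filter are kept
    with open(path, newline="", encoding="utf-8") as fh:
        return filter_by_agreement(csv.DictReader(fh), min_agree)
```

Raising `min_agree` trades sample size for confidence, so it may be worth checking how many records survive at a few different thresholds before settling on one.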
  • asked a question related to Databases
Question
3 answers
I am looking for company-level data on R&D expenses, because I would map it to M&A data in order to assess the impact of (cross-border) M&A on R&D intensity. Thank you.
Relevant answer
Answer
C K Gomathy & Stephanie Tonn Goulart Moura - thank you very much for coming back to me
  • asked a question related to Databases
Question
7 answers
I am looking for free online database of Brain Hemorrhage CT images for my research work. 
Relevant answer
Answer
Please find below a link to a dataset for intracranial hemorrhage segmentation.
I know my reply is late, but just in case, other researchers may be looking for similar datasets.
  • asked a question related to Databases
Question
6 answers
Is there an open-source, standardized database of TB microscopic sputum smear images?
Relevant answer
Answer
TB PCR, i.e. GeneXpert, is the latest WHO-recommended diagnostic. It is a test with 90% sensitivity for pulmonary tuberculosis.
  • asked a question related to Databases
Question
7 answers
I want to download the complete KEGG bacterial data. Please guide me on how I can do that.
Or, if anyone has this data, kindly send it to me. It is not freely available, even for academic users.
Relevant answer
Answer
Sorry, I did not read the question completely before. I read it again, and it now seems even more stupid than it appeared before.
  • asked a question related to Databases
Question
3 answers
Dear Community,
For my Master thesis I need the data on stock prices of all firms listed on the NYSE, NASDAQ and AMEX. The database I use is CRSP. For my analysis, I need the data in time-series format rather than panel-data format. In other words, I require the data to be such that every column corresponds with 1 firm rather than all firms in 1 column beneath each other. Is it possible to obtain the data in this format using CRSP? It would help me a lot! Of course, if there is another database which could help me, I am open for suggestions!
Yours truly,
Niek van der Schaaf
Relevant answer
Answer
If you can use Excel, the transformation you want is a very straightforward application of a Pivot Table. Select all your data (Ctrl+A on your spreadsheet), then choose Insert >> Pivot Table. You can then choose Date as rows, your company ID as columns and the variable of interest as values (see the attached illustration made from your own file).
I believe OpenOffice can do this too, to some extent.
This creates a heavy data file, so you can copy your Pivot Table and paste it as values to work with later (Paste Special >> Values).
Check any tutorial on Pivot Table, it's a basic and useful Excel function.
Then if you need more professional work, it is a better practice to code your analysis so that all steps of data manipulation are recorded: you can check online how to pivot data with R, Python, or other tools (I guess Stata and SPSS must do that quite directly too).
I hope this is helpful,
Good Luck with your work,
Mathieu
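For readers who prefer to script the reshaping, as suggested above, here is a minimal standard-library Python sketch of the same long-to-wide pivot. The column names PERMNO, date and PRC are assumed from a typical CRSP extract, and the sample rows are made up.

```python
import csv
import io


def long_to_wide(rows, date_col="date", id_col="PERMNO", value_col="PRC"):
    """Pivot long (panel) records into {date: {firm_id: value}} plus sorted firm ids."""
    table = {}
    firms = set()
    for row in rows:
        # A duplicate (date, firm) pair silently overwrites the earlier value
        table.setdefault(row[date_col], {})[row[id_col]] = row[value_col]
        firms.add(row[id_col])
    return table, sorted(firms)


def write_wide_csv(table, firms, out, date_col="date"):
    """Write one row per date and one column per firm; gaps are left blank."""
    writer = csv.writer(out)
    writer.writerow([date_col] + firms)
    # ISO (YYYY-MM-DD) or YYYYMMDD date strings sort chronologically as text
    for date in sorted(table):
        writer.writerow([date] + [table[date].get(firm, "") for firm in firms])


if __name__ == "__main__":
    sample = io.StringIO(
        "PERMNO,date,PRC\n"
        "10001,2019-01-02,10.5\n"
        "10002,2019-01-02,20.1\n"
        "10001,2019-01-03,10.7\n"
    )
    table, firms = long_to_wide(csv.DictReader(sample))
    out = io.StringIO()
    write_wide_csv(table, firms, out)
    print(out.getvalue())
```

If pandas is available, `df.pivot(index="date", columns="PERMNO", values="PRC")` does the same in one call, and it raises an error when a firm has two rows for the same date, which is a useful sanity check that the sketch above does not perform.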
  • asked a question related to Databases
Question
1 answer
I'm trying to find a dataset in Excel format, with as much data as possible, that includes individuals' SAT scores and whether they play a musical instrument. Where could I find one?
Or maybe something similar like IQ and musical instruments.
  • asked a question related to Databases
Question
8 answers
DeepEye, a novel automatic data visualization system, leverages ML techniques as black boxes together with expert-specified rules, with promising results.
What is your thought on automatic data visualization?
Relevant answer
Answer
Anything you can share with us, Dennis Mazur?
  • asked a question related to Databases
Question
2 answers
Dear community,
I am currently investigating the effect of cultural differences on short-term firm performance in cross-border M&As, measured by cumulative abnormal returns (CARs). I retrieved M&A data from the SDC Platinum database. To estimate the CARs, however, I need to estimate the normal returns over the estimation window (-200, -30). For this I need stock data for every individual acquiring (ACQ) and target (TAR) firm, approximately 2,500 firms in total.
Could databases like Datastream, WRDS, Compustat or maybe even SDC Platinum help me overcome this problem? In other words, do I really need to gather this data manually by entering 2,500 firms into Datastream and changing the dates by hand?
I hope someone could help me.
Kind regards,
Bas
Relevant answer
Answer
Yes, I think there is no shortcut.
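Once the returns are available for each firm, the estimation itself is easy to automate. Below is a minimal Python sketch of the standard market-model approach for one firm, then looped over all firms. It assumes the firm and market-index returns are already aligned by trading-day offset relative to the announcement (day 0), uses the (-200, -30) estimation window from the question, and picks a (-1, 1) event window purely for illustration.

```python
# Market-model event study sketch. Returns are assumed to be aligned by
# trading-day offset relative to the announcement day (day 0).


def ols_alpha_beta(firm, market):
    """Fit R_firm = alpha + beta * R_market by ordinary least squares."""
    n = len(firm)
    mean_f = sum(firm) / n
    mean_m = sum(market) / n
    cov = sum((m - mean_m) * (f - mean_f) for f, m in zip(firm, market))
    var = sum((m - mean_m) ** 2 for m in market)
    beta = cov / var
    return mean_f - beta * mean_m, beta


def cumulative_abnormal_return(returns, est_window=(-200, -30), event_window=(-1, 1)):
    """returns: dict mapping day offset -> (firm_return, market_return)."""
    est_days = [d for d in sorted(returns) if est_window[0] <= d <= est_window[1]]
    alpha, beta = ols_alpha_beta(
        [returns[d][0] for d in est_days], [returns[d][1] for d in est_days]
    )
    # Abnormal return = actual return minus the market-model prediction
    return sum(
        returns[d][0] - (alpha + beta * returns[d][1])
        for d in sorted(returns)
        if event_window[0] <= d <= event_window[1]
    )


def cars_for_all(firms, **windows):
    """firms: dict firm_id -> returns dict; gives dict firm_id -> CAR."""
    return {fid: cumulative_abnormal_return(r, **windows) for fid, r in firms.items()}
```

The event window can be changed per call, for example `cars_for_all(firms, event_window=(-5, 5))`, so robustness checks over several windows need no extra downloads.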
  • asked a question related to Databases
Question
9 answers
Which do you prefer for data visualization: Tableau, Qlik Sense or Power BI? And why?
Relevant answer
Answer
Thank you for sharing your insight on Power BI. It seems that adopting any of these visualization tools requires examining the features each one has before committing to it. Thanks again for your time!
  • asked a question related to Databases
Question
3 answers
I would very much appreciate any information on available alternatives to the KLD database for the purpose of robustness analysis. The sample consists of U.S. publicly traded firms, so any alternative that only covers companies operating in other contexts is excluded. I am mainly looking for an alternative to KLD that covers companies listed in the US.
Thanks
Relevant answer
I think the following papers combined would help you in this regard:
Analysis of emotionally salient aspects of fundamental frequency for emotion detection
C Busso, S Lee, S Narayanan - IEEE transactions on audio …, 2009 - ieeexplore.ieee.org
Does product market competition foster corporate social responsibility? Evidence from trade liberalization
C Flammer - Strategic Management Journal, 2015 - Wiley Online Library
Corporate social responsibility and financial performance: the “virtuous circle” revisited
E Nelling, E Webb - Review of Quantitative Finance and Accounting, 2009 - Springer
  • asked a question related to Databases
Question
16 answers
These categories are not included in the "special groups" of FAOSTAT.
Relevant answer
Answer
I need these data too
  • asked a question related to Databases
Question
3 answers
How to describe the process of analyzing statistical data carried out with the help of Business Intelligence in Big Data database systems?
How to describe the main models of statistical data analysis carried out with computerized Business Intelligence tools that are used to analyze large data sets processed in the cloud and subjected to multi-criteria analysis in Big Data database systems?
Please reply
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
The use of information contained in Big Data database systems for the purposes of Business Intelligence analyses is described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
  • asked a question related to Databases
Question
11 answers
Apparently, some countries have established, usually somewhere underground in specially built bunkers capable of surviving climatic and other disasters, banks holding large collections of information on the achievements of human civilization, stored on digital data carriers.
These are properly secured Big Data database systems, data warehouses and underground information banks, all recorded digitally.
The underground bunkers themselves can survive various climatic and other calamities for perhaps hundreds or thousands of years.
But how long will the large collections of information survive in these Big Data systems and data warehouses stored on digital media?
Perhaps a better solution would be to record this data in analog form on specially made discs?
Already in the 1970s, some information about human civilization was sent into space: the Pioneer 10 probe (launched in 1972) carried an engraved plaque, and the Voyager probes (launched in 1977) carried gold-plated records with sounds and images of Earth.
These probes are now heading out of the solar system and will travel for many thousands of years.
Is there a better form of data storage at the moment when this data should last thousands of years?
Please reply
Best wishes
Relevant answer
Theoretically, thousands of years, unless unexpected disasters occur...
  • asked a question related to Databases
Question
8 answers
Dear colleagues,
The Dolos list team is extracting the domain names of all predatory journals and publishers added to the list. The resulting database will be made available to institutional and private email providers so that email advertisements sent by parasitic publishers can be automatically classified as spam, which is not yet reliably the case. This service will be updated regularly and will be completely free for institutions and email providers.
Best regards,
Alexandre.
Relevant answer
Answer
Fantastic initiative, many thanks! For years, my morning routine has included, along with coffee, manually deleting predatory emails from my inbox. It will be wonderful to finally put an end to this. It will save each of us scientists countless hours cumulatively.
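The mechanism described by the Dolos team, matching sender domains against a published list, can be sketched in a few lines of Python. This only illustrates the idea; it is not the Dolos list's actual format or any mail provider's filter. The blocklist entries are hypothetical `.example` domains, and parent domains are checked too, so that subdomains of a listed publisher are caught.

```python
from email.utils import parseaddr


def sender_domain(from_header):
    """Return the lower-cased domain of the address in a From: header."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def is_listed(from_header, blocklist):
    """True if the sender domain, or any parent domain of it, is blocklisted.

    For "mail.press.example" this checks "mail.press.example" and
    "press.example", stopping before the bare top-level domain.
    """
    labels = sender_domain(from_header).split(".")
    candidates = {".".join(labels[i:]) for i in range(len(labels) - 1)}
    return not candidates.isdisjoint(blocklist)
```

A real filter would also have to cope with spoofed From: headers, which is why providers typically combine such lists with sender-authentication checks.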
  • asked a question related to Databases
Question
6 answers
The National Oceanic and Atmospheric Administration (NOAA) website, which provides massive amounts of data on CO2 parameters (pH, DIC, TA, fCO2, pCO2), has been shut down. I hope to receive suggestions from other renowned scholars about other platforms and databases.
Relevant answer
Answer
The NOAA website is up again; you can check there.
  • asked a question related to Databases
Question
4 answers
For instance: FDI in Agriculture in Zambia? Or across SSA as a whole?
UNCTAD datasets look at regions (as well as net cross-border M&A by sector) but do not provide the data at the individual country level (nor do they combine these data).
ITC and OECD databases really only cover developed countries. There also appear to be two datasets, compiled by Alfaro et al. (2003) and Aykut & Sayek (2007), but they seem quite out of date.
Any suggestions?
Relevant answer
Answer
Data on FDI by sector are usually compiled by the host countries. Each host country has an organization established by the government to control and foster FDI in its economy. For instance, in Ghana we have the Ghana Investment Promotion Centre, which reports the FDI allocated to each sector of the country every year. For regions of LDCs, I think you should get the data from the United Nations Conference on Trade and Development (UNCTAD).
  • asked a question related to Databases
Question
4 answers
In which areas of enterprise operation, in what types of economic activity and in what types of services involving advanced processing of large information collections are analysts employed in the position of Big Data Scientist?
In which branches of the economy is an analyst employed in the position of Big Data Scientist?
Apparently, this is one of the occupations of the future and will grow in demand, because a growing number of companies will be interested in building their own Big Data database systems or in using advanced processing of large data sets for economic analysis and Business Intelligence.
In what areas of business, economy, operation of companies and corporations will there be a growing need in the future to hire an analyst as Big Data Scientist?
Please reply
Best wishes
Relevant answer
Answer
Dear Dariusz Prokopowicz ,
I have some suggestions on the Big data economy and enterprises.
1. Market analysis: the market changes every day, and a company can do better if it knows the daily statistics (better postmortem analysis).
2. A better view of the economy: economic growth is interconnected. If manufacturing is booming, related sectors bloom as well, and the effect spreads further and further until it covers the entire economic cycle; then it returns to where it started and a new cycle begins. If something unexpected happens, this routine changes (macroeconomics).
3. Big Data is now a booming technology, and wherever it is applied, an economic boom follows (I mean the applications).
4. Business analysis is another relevant area: understanding the market and how it is doing.
5. Big Data statistics is another area.
  • asked a question related to Databases
Question
5 answers
Dear colleagues!
Could you be so kind to suggest any Neisseria gonorrhoeae comparative genomics database like human 1000 genomes (https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/ ) or variation view (https://www.ncbi.nlm.nih.gov/variation/view/)?
Relevant answer
Answer
Hi Boris,
I don't know exactly what you want to do, but you can compare on a gene-by-gene basis in the following database, which has many gonococcal genomes assembled.
Add yours and compare!
Best Wishes,
Robert.
  • asked a question related to Databases
Question
7 answers
Dear All
I am beginning a project to understand the impact of climate change on fish diversity and its implications for coral reefs in the Western Indian Ocean (WIO) region. I am targeting top predators of commercial importance (groupers), and I would like to work on grouper traits. Can anyone suggest a comprehensive database?
Best
Relevant answer
Answer
An alternative source of information is "http://www.fishtraits.info/".
  • asked a question related to Databases
Question
2 answers
I have tens of thousands of individual scans in proprietary file formats, and I want to make these public. I need a format that is free and open, or to make my own.
Our proprietary software offers a CSV option, but doesn't export all useful data to the file. In addition, the CSV file it creates is more like two spreadsheets, with the second half having the per-channel photon counts.
I've considered using XML because it is both machine and human readable. My only concern is that XML is bloated. XML has the added benefit of being readable over a web browser, and can be quickly converted to almost any language, including JSON.
Microsoft INI format is also machine and human readable, but INI is largely phased out. Software developers still have full access to INI functions, though, so I wonder whether it is still a viable format. INI also converts well to object notation.
Both INI and XML could better represent two spreadsheets worth of different-typed content in a single file than a CSV.
What are your thoughts?
Relevant answer
Answer
We currently use only ARTAX Spectra, which I believe is Bruker software. The output format is PDZ23, or a CSV file with very little data about the actual device listed in the output. The output CSV files are two-block files, with the top block being basic two-column header info. The lower block is per-channel total photon counts.
I'd like a better format, with all pertinent header data listed, and in a machine and human readable form. I really expected there to be a standard format, like we have with state and federal water well and climate data, but I don't see anything like that on the internet.
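One possible route to an open format is sketched below in Python: split the two-block CSV export into a header dictionary and a counts array, then write both into a single JSON file (JSON is mentioned in the question as a conversion target). The block layout assumed here, key/value header rows, a blank line, then one count per row, is a guess from the description above and should be checked against real ARTAX exports.

```python
import csv
import io
import json


def two_block_csv_to_record(text):
    """Split a two-block spectrum export into header metadata and a counts list.

    Hypothetical layout (check against your own files):
        key,value rows                  (header block)
        <blank line>
        one photon count per row, optionally preceded by the channel number
    """
    header, counts = {}, []
    in_counts = False
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            in_counts = True  # a blank line separates the two blocks
            continue
        if in_counts:
            counts.append(int(row[-1]))  # the last field holds the count
        else:
            header[row[0].strip()] = row[1].strip() if len(row) > 1 else ""
    return {"header": header, "counts": counts}


def to_json(record):
    # Keeps both blocks in one human-readable file that JSON libraries in
    # almost every language can parse
    return json.dumps(record, indent=2)
```

Compared with XML or INI, this keeps the per-channel counts as a single numeric array, which stays compact even for thousands of channels while the header remains easy to read.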
  • asked a question related to Databases
Question
7 answers
When we perform a keyword search in databases like PubMed, does the order of the keywords/MeSH terms affect the total number of hits?
Thanks in advance.
Relevant answer
Answer
Dear Sai Krishna Gudi, I think that in medicine-related fields the emphasis is placed on the most dangerous factor. That is, diabetes is more dangerous than being elderly and could lead to ageing at an alarming rate. That is what I thought, based on your answer.
However, as a rule of thumb, "diabetes" is still more general and comprehensive than "elderly" and "pharmacist". In this case, my hypothesis holds true.
  • asked a question related to Databases
Question
4 answers
I am working with FactSage databases and I need to include some data from TDB files. Does anyone know of a program to convert TDB files to FactSage format?
Relevant answer
Answer
Hello!
I am working at GTT and am happy to announce that a successor to CSFAP is available, ConvTDB. It enables FactSage to read TDB-files and save them as Fact-Databases. It will be available with FactSage 7.3. Please contact me for further questions.
  • asked a question related to Databases
Question
4 answers
With the government shutdown, the NIST database no longer seems to be accessible, and we have a deadline soon. Is there an alternative?
  • asked a question related to Databases
Question
16 answers
To analyze sentiment in data downloaded from social media portals (such as Facebook, Twitter and LinkedIn, but also YouTube, Instagram, etc.) and aggregated in Big Data database systems, specialized software is needed to extract and analyze these data.
The quality of the data transferred to, for example, Excel sheets depends on the quality of the extraction process carried out with that specialized software.
The result obtained, i.e. the answer to the question posed to the collected, initially unstructured data in the Big Data database system, then depends on the quality of the analysis software used on the Excel sheets or in computerized analytical platforms.
In the future, artificial intelligence may be used for this purpose, and the whole process of purposeful analysis of collected data may become much more effective and automated, less error-prone, cheaper, and much faster, even on far larger information collections than today's.
In view of the above, the current question is: In which directions will the analytical processes used for sentiment analysis of data collected in Big Data database systems develop in the future?
Please answer and comment. I invite you to the discussion.
  • asked a question related to Databases
Question
6 answers
Generally, stock price indices fall into two categories: global indices and national indices. National indices, which are more commonly quoted, represent the performance of the stock market of a given country, such as Brazil's BOVESPA, India's NSE or BSE, Hong Kong's Hang Seng, or the Shanghai SE Composite Index; these types of indices are provided by the local financial authorities. Global equity indices, on the other hand, are calculated and provided by global agencies such as Thomson Datastream, Standard & Poor's, and Morgan Stanley.
In addition, these agencies also offer stock price indices at the country level, and the methodology used to calculate a stock price index may differ from one agency to another, which may affect return and volatility. For instance, Datastream market indices cover 53 countries, and each index covers at least 75-80% of the market capitalization of the publicly listed companies in the country. Standard & Poor's has its own indices, known as the Broad Market Index (BMI), covering most developed and emerging countries. The method Standard & Poor's uses to calculate these indices is called the float-adjusted or free-float methodology, which according to Investopedia is the best measurement of stock price movements. Other agencies, such as Morgan Stanley (MSCI) and the Financial Times and Stock Exchange (FTSE), use the same methodology. Investors frequently use these indices as benchmarks for their equity portfolios.
As previously mentioned, these global agencies maintain stock price index records for many countries, sometimes for periods longer than those covered by the national indices.
So, are these global indices suitable for academic research and papers? Are they used in academic research, especially research on stock price volatility? Or can they only be used as benchmarks for investors' portfolios?
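Because the free-float methodology is central to comparing these indices, a small worked example may help. The sketch below shows only the general idea of a float-adjusted, market-capitalization-weighted index: each constituent's weight is price × shares outstanding × a float factor (the fraction of shares available to public investors), and the sum is divided by a divisor that fixes the base level. Real providers such as S&P, MSCI and FTSE add their own rules for float factors, divisor adjustments and corporate actions, which are not modeled here. All stocks and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Constituent:
    price: float         # current share price
    shares: float        # total shares outstanding
    float_factor: float  # fraction of shares available to public investors (0-1)


def float_adjusted_cap(stocks):
    """Sum of price x shares x float factor over all constituents."""
    return sum(s.price * s.shares * s.float_factor for s in stocks)


def index_level(stocks, divisor):
    """Index level = float-adjusted market capitalization / divisor."""
    return float_adjusted_cap(stocks) / divisor


# Hypothetical two-stock market; the divisor is chosen so the base level is 100.
base = [Constituent(10.0, 1_000, 0.5), Constituent(20.0, 500, 1.0)]
divisor = float_adjusted_cap(base) / 100.0

# The first stock rises 10%; with only half its shares in the float,
# its price move counts for less than it would in a full-cap index.
today = [Constituent(11.0, 1_000, 0.5), Constituent(20.0, 500, 1.0)]
print(round(index_level(today, divisor), 2))  # 103.33
```

The same price move would give a different index level under a full-capitalization weighting (float factor 1.0 for every stock), which is one way the agencies' methodologies can lead to different return and volatility series for the same country.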
Relevant answer
Answer
Dear Aboubakr,
The choice between national and global stock indices depends on the portfolio whose performance you are measuring. That is, if your portfolio consists of stocks from around the world, the better-fitting benchmark is a global stock index. If instead the investor's portfolio is limited to one country, the national stock index is the more appropriate comparison. Note that the return and risk of a global stock index reflect both investment risk and currency risk, especially relative to the home currency of the portfolio's owner.
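The currency point can be made concrete with the standard compounding relation between a local-currency return and the exchange-rate change. In home-currency terms, $(1 + r_{home}) = (1 + r_{local})(1 + r_{fx})$, where $r_{fx}$ is the change in the foreign currency's value against the investor's home currency. The figures below are hypothetical.

```python
def home_currency_return(local_return: float, fx_return: float) -> float:
    """Combine a local-currency asset return with the change in the foreign
    currency's value against the investor's home currency."""
    return (1.0 + local_return) * (1.0 + fx_return) - 1.0


# Hypothetical: the foreign index gains 8% in local currency, but the foreign
# currency loses 5% against the investor's home currency.
r = home_currency_return(0.08, -0.05)
print(f"{r:.2%}")  # 2.60%
```

So an index that looks strong in local terms can deliver a much smaller (or negative) return to a foreign investor. This is why a benchmark should be measured in the same currency as the portfolio it is compared with.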
  • asked a question related to Databases
Question
26 answers
We are looking for a large database (200+ images) of pictures of human faces with a neutral facial expression, in order to conduct an experiment on nonverbal learning mechanisms. We are having difficulty finding appropriate pictures because we need the people in them to be Caucasian, aged 20 to 40, with a neutral facial expression, and on a neutral background. It would also be very good if the database were free for research use. Can anyone suggest an existing database they know of?
Thank you in advance, 
Jovana
Relevant answer
Answer
Hi! For anyone else seeking such a database, I have combed through all of the resources listed here in addition to numerous other sources in order to construct the Face Image Meta-Database (fIMDb): https://cliffordworkman.com/resources/
The fIMDb includes information or estimates on: the number of photo sets per source (and the numbers of neutral and other sets, e.g., facial emotions), the number of subjects per source (with approximate sex distributions), the total number of images, the approximate number of viewpoints, whether the source includes photos of more than one ethnicity, whether it includes more than one age group, whether metadata are available, the photo category (e.g., posed, wild), and the reference(s) for the source (e.g., DOIs). I hope this will help others interested in conducting research on responses to faces.
  • asked a question related to Databases
Question
14 answers
Advanced technologies for digitalization and automated data processing are first applied in business and can then also be introduced in public institutions, including in the field of e-governance. This also applies to Big Data database technologies, which are applicable in various sectors of the economy. However, because of the high investment costs of implementing this technology in business processes, so far only large corporations and larger enterprises can afford it. In the future, the investment costs of implementing these technologies in business processes should decrease, and data processing and collection in Big Data database systems should become available to smaller companies as well, including businesses in the SME sector.
I invite you to the discussion.
Relevant answer
Answer
The Big Data database technology is finding more and more applications in business.
Multi-criteria processing of huge data sets collected in Big Data database systems allows preparing reports in a relatively short time according to given criteria.
The report development time depends mainly on the computing power of Big Data servers.
Complex economic and financial analyses, risk management, etc., carried out to determine the economic and financial situation of business entities, are increasingly performed on computerized Business Intelligence analytical platforms.
Perhaps in the future, artificial intelligence will also be involved in this field of analytics.
In some countries, IT companies have been operating for several years, developing the Big Data database technology for commercial and business purposes.
It is only a matter of time before these various analytical and database technologies are combined in cloud computing.
In view of the above, the current question is: In which sectors of the economy, and in which types of companies and corporations, will technologies for analyzing large collections of information in Big Data database systems develop most dynamically?
Please answer and comment. I invite you to the discussion.
  • asked a question related to Databases
Question
8 answers
Hi,
I am looking for free speech databases for speaker recognition (with more than 50 speakers). Do you have any suggestions?
Relevant answer
Answer
Most speaker verification databases, such as the NIST ones, are paid, but there are a couple of freely available databases, such as VoxCeleb (http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) and SITW (http://www.speech.sri.com/projects/sitw/).
Hope that helps!
  • asked a question related to Databases
Question
2 answers
There are methods for cleaning or preprocessing text in Python using a sample string. Is there a way to apply preprocessing (cleaning) to text stored in a database of tweets? Cleaning the text is necessary for sentiment analysis of the tweets stored in the database.
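One way to apply string-level cleaning to text that already lives in a database is to register the cleaning function with the database and run it in an `UPDATE`. Below is a minimal sketch using Python's built-in `sqlite3` with an in-memory database standing in for the real one. The table and column names (`tweets`, `text`, `clean_text`) are hypothetical. The cleaning steps (lowercase, strip URLs, @mentions and punctuation including the `#` of hashtags, collapse whitespace) are one common choice for sentiment analysis, not a fixed standard.

```python
import re
import sqlite3

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_RE = re.compile(r"[^\w\s]")  # anything that is not a word character or whitespace


def clean_tweet(text: str) -> str:
    """Lowercase, drop URLs/mentions/punctuation, and collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)  # also removes '#' but keeps the hashtag word
    return " ".join(text.split())


# In-memory database standing in for the real tweet store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, clean_text TEXT)")
conn.executemany(
    "INSERT INTO tweets (id, text) VALUES (?, ?)",
    [
        (1, "Loving the new phone!!! https://t.co/abc @shop #happy"),
        (2, "Worst. Service. Ever... @support"),
    ],
)

# Register the Python function so SQL can call it, then clean every row in place.
conn.create_function("clean_tweet", 1, clean_tweet)
conn.execute("UPDATE tweets SET clean_text = clean_tweet(text)")
conn.commit()

cleaned = dict(conn.execute("SELECT id, clean_text FROM tweets ORDER BY id").fetchall())
print(cleaned)
```

With a server database such as MySQL or PostgreSQL, the same `clean_tweet` function can be applied by fetching rows in batches, cleaning them in Python, and writing the results back with a parameterized `UPDATE`.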
Relevant answer
Answer
Hello
You can check the following link; it might help.
  • asked a question related to Databases
Question
3 answers
Seeking co-authors/collaborators to participate with me in database studies involving NSQIP, NIS, SEER, NAMC, etc. This is a good opportunity for postdocs and research fellows who can work remotely. You must have a basic understanding of biostatistics, surgical/medical outcomes, and the aforementioned databases. Send me a message and let's see if you are a good fit.
Relevant answer
Answer
Hi, I would like to. I am an orthopedic surgeon and would like to be part of surgical outcome studies. How can I take part?
  • asked a question related to Databases
Question
8 answers
For several years, commercial companies have been collecting data, for example from social media portals.
This data contains information collected from posts, entries, comments, recordings, etc. posted by millions of users of social media portals.
The data is collected and processed in Big Data database systems. Sentiment analysis carried out on these data makes it possible to generate reports that are used in business, for example in marketing.
From these reports, the clients of the above-mentioned technology companies learn, for example, how recognition of their brand changes over time and which opinions about the products and services offered dominate.
But if the Big Data database resources analyzed in this way consist mainly of information collected from social media portals, do the generated reports have the advantage of objectivity?
Considering the current state of the Internet, where are the majority of comments on products, services, companies, institutions, etc. being posted at the moment: on various websites, or on social media portals?
Relevant answer
Answer
In my view, public social media comments, posts, self-absorbed photos, etc. do point a researcher in a very general direction. The question is understanding the trend, the demographic, and the potential to take action. All reports demand rigorous context.
  • asked a question related to Databases
Question
1 answer
I found multiple publications referencing this dataset but have been unable to find a link to request access to the data.
Relevant answer
Answer
The study is registered on ClinicalTrials.gov.
You may try to get in touch with the contacts listed there:
Contact: James Beck, PhD 1-800-473-4636 jbeck@parkinson.org
Contact: Fernando Cubillos, MD 1-800-473-4636 fcubillos@parkinson.org