Databases - Science topic
A database is an organized collection of data. The data is typically organized to model relevant aspects of reality in a way that supports processes requiring this information.
Questions related to Databases
How do AMRFinderPlus and CARD differ in predicting AMR genes from bacterial genomic sequences?
How much do the AMRFinderPlus and CARD databases overlap?
I have some difficulties in finding a good number of DICOM files for SSM...
It is now advised to submit a dataset to a publicly available repository (e.g., Mendeley) before publishing a paper based on those data. Can I reuse such a repository dataset to publish my second paper? Can anybody else use my dataset and publish a paper based on it without my consent?
I am looking for the volume of public funding allocated to research topics over time in each country. Is there a reliable database that indexes public funding allocations by research theme or topic (particularly for the US)?
For example, the volume of public funding for "electric battery related research" over the past 30 years in the US.
I'm looking for a food picture database to use in designing a behavioral task. I would be interested in controlling the degree of knowledge and nutritional value of the food. Thank you in advance.
Suggestions of online databases/tools I can use to verify candidate genes
What are the best databases of blood cell images for research?
We are conducting a systematic literature review and would like to know how to merge the results from different searches into a single database so that duplicates can be easily recognised. Merging Excel files does not seem to be a straightforward procedure.
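One possible way to approach the merging and duplicate detection described above is sketched below in Python. It assumes each search export has already been loaded as a list of records with `title` and `doi` fields; these field names, and the rule "same DOI or same normalized title means duplicate", are assumptions for illustration and may need adjusting (for example, a preprint and its published version can share a title).

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase a title and strip accents and punctuation so that
    trivially different spellings of the same title compare equal."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def merge_records(*sources):
    """Merge several lists of record dicts, keeping the first copy of
    each duplicate. Two records count as duplicates if they share a
    DOI or a normalized title (an assumed rule, see the lead-in)."""
    seen = set()
    merged = []
    for records in sources:
        for record in records:
            keys = []
            doi = (record.get("doi") or "").strip().lower()
            if doi:
                keys.append(("doi", doi))
            title = normalize_title(record.get("title") or "")
            if title:
                keys.append(("title", title))
            if any(key in seen for key in keys):
                continue  # already have this paper from an earlier source
            seen.update(keys)
            merged.append(record)
    return merged


if __name__ == "__main__":
    # Hypothetical records standing in for two database exports.
    source_a = [
        {"title": "Deep Learning for X-ray Images.", "doi": "10.1000/ABC"},
        {"title": "A Survey of Graph Databases", "doi": ""},
    ]
    source_b = [
        {"title": "Deep learning for X-ray images", "doi": "10.1000/abc"},
        {"title": "A survey of graph databases", "doi": ""},
        {"title": "Unique Paper", "doi": "10.1000/xyz"},
    ]
    for record in merge_records(source_a, source_b):
        print(record["title"])
```

Keeping the first copy of each duplicate means the order in which the sources are passed decides which version of a record survives, so the most complete export should probably go first.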
I know of several for the gas phase (e.g. HITRAN, GEISA, PNNL) but not the condensed phase. Most seem to be proprietary databases for matching spectra, but don't allow determining absorption as a function of density or path length.
Hi guys,
I am looking for suggestions/recommendations from the research community regarding public databases that are most commonly used by researchers in their analysis.
Examples include GEO, GTEx, TCGA, gnomAD and TOPMed, as well as databases from countries other than the US.
#genomics #publicdata #genomicdatabases #databases #datamining #TCGA #HCA #GTEX #GEO #ARRAYEXPRESS
How can the information currently needed be obtained from Big Data database systems for specific scientific research and for economic, business and other analyses?
Of course, the right data is important for scientific research. However, in the present era of digitalization, with libraries, databases and constantly expanding data sets stored in database systems, data warehouses and Big Data database systems, it is important to develop techniques and tools for filtering these large data sets. The goal is to extract from terabytes of data only the information currently needed for scientific research in a given field, for answering a given research question, or for business needs, e.g. after connecting these databases to Business Intelligence analytical platforms. I described these issues in my scientific publications presented below.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
How can the information currently needed be obtained from Big Data database systems for specific scientific research and for economic, business and other analyses?
Please reply
I invite you to the discussion
Thank you very much
Dear Colleagues and Friends from RG
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:
I invite you to discussion and cooperation.
Best wishes

Below are some issues related to Big Data database technologies that can be developed scientifically:
- Application of data processing technology in Big Data database systems for modern education 4.0,
- Improvement of forecasting of natural, climatic, economic, financial, social etc. phenomena based on analyzing large data sets,
- Analysis of sentiment, opinions of citizens, Internet users regarding brand recognition of companies, customer reviews of specific services and products, views on various topics, citizens' worldview based on the analysis of large collections of information downloaded from various websites, from comments downloaded from social media portals,
- Analysis of information and marketing services of commercially operating companies that carry out specific analyzes of sentiment, citizens' opinions, Internet users regarding brand recognition, customer reviews of specific services and products etc. on behalf of other companies that purchase specific analytical reports,
- Analysis of the possibilities of cooperation, synergy, correlation, conducting interdisciplinary research, connecting Big Data database systems with other information technologies typical for the development of the current fourth technological revolution called Industry 4.0, which include technologies such as: cloud computing, machine learning, Internet of Things, Artificial Intelligence, etc.
In what other areas are the technologies of processing and analysis of information in Big Data database systems used?
Please answer
Best wishes

Hi
I'm working on localizing aphasia lesions in stroke patients, and I need a database of MRI images of patients with aphasia.
I use X'Pert HighScore Plus (3.0.5) to plot my samples' XRD patterns. When I search for a reference pattern, the software cannot find an exact match, so I need to add reference databases to the software. How can I do that?
What is the proper method for this particular version?
What kind of scientific research dominates the field of Big Data database systems?
Please provide your suggestions for a question, problem or research thesis on the topic of Big Data database systems.
Please reply. I invite you to the discussion
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Best wishes

What kind of scientific research dominates the field of popularization of science on the Internet?
Please provide your suggestions for a question, problem or research thesis on the topic of popularization of science on the Internet.
Please reply.
I invite you to the discussion
Thank you very much
Best wishes

In some countries, scientific work is poorly paid, and young people show little interest in the complex problems of science. The popularization of science is therefore necessary. RG publishes scientific questions and the answers to them.
Can the publication of answers to questions in the RG be considered a popularization of science?
Do you know of any database that not only specifies the plant origins of a specific phytochemical, but also shows how much of that substance can be extracted from specified parts of the plant?
I have also found this awesome website, but it doesn't work at all! Besides answering my question, could you please let me know whether you get any results when you search for a term in it?
Hello,
I am teaching a database systems class and I wish to show students how distributed databases work.
We are using the PostgreSQL DBMS for illustration.
What other applications do I need to set up a DDBMS on a single Windows machine?
Best
Derdus
Hello everyone,
I have been having problems with gathering all the information that I need for my study and also, some problems with fixed effects.
1) First of all, I am interested in comparing Panama's exports to certain partner countries (in Central America and elsewhere). I am using panel data in Stata (1994-2017), and the variables of interest are:
1) Exports (y)
2) Distance between the capitals
3) GDP (c_origin)
4) GDP (destination_c)
5) Population (c_origin)
6) Population (destination_c)
--> plus some dummy variables:
7) common language (0, 1)
8) shared border between countries (0, 1)
9) whether the destination country belongs to Central America (main variable of interest; 0, 1)
Is there any site where I could get most of the data? I used CEPII and could find data up to 2015 for distance, language, population and GDP (but only in current dollars). ***I would need data for 2016 and 2017, as well as the GDP deflator and Panama's exports to those specific countries for 1994-2017. How do I convert GDP measured in current dollars using the GDP deflator, and are there any sites where those values are already available?*** I first used GDP measured in current dollars and, with the data I had, the results did not turn out well.
2) Another problem I encountered when running a hierarchical regression on my first "faulty" database was that, once country fixed effects were included, all of the variance was already explained by the countries, so there was no variance left for my variables of interest to explain. On the other hand, if I do not include them, the model will be biased. What is more, without any fixed effects, some of the regression blocks give counterintuitive correlations and b coefficients. Therefore, my questions are (apart from the sentences marked with *** above):
a) How to solve this problem so my model is not biased and so my variables still explain significant variance of the y (exportations) variable?
b) Which fixed effects should I add to the model? Country, distance, population... which ones?
This is the first time I am using econometrics and fixed effects so your help would mean a lot!
Thank you in advance!
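On the deflator part of the question above: the usual conversion is real GDP = nominal GDP / deflator x 100, where the deflator is an index equal to 100 in the base year. A minimal Python sketch follows; the function names and the sample figures are made up for illustration, and it assumes the deflator series is expressed as an index (base year = 100) for the same country and years as the nominal series.

```python
def real_gdp(nominal_gdp, deflator, base=100.0):
    """Convert GDP in current prices to constant (base-year) prices.

    `deflator` is the GDP deflator index for the same year, with the
    base year equal to `base` (100 by convention).
    """
    if deflator <= 0:
        raise ValueError("deflator must be positive")
    return nominal_gdp * base / deflator


def rebase(deflators, new_base_year):
    """Re-express a {year: deflator} series so that new_base_year = 100."""
    base_value = deflators[new_base_year]
    return {year: 100.0 * value / base_value for year, value in deflators.items()}


if __name__ == "__main__":
    # Hypothetical figures, not real data for any country.
    nominal = {2015: 200.0, 2016: 220.0}
    deflator = {2015: 100.0, 2016: 110.0}
    for year in sorted(nominal):
        print(year, real_gdp(nominal[year], deflator[year]))
```

If the origin and destination countries' deflators use different base years, `rebase` can be used first so that all series share one base year before converting.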
hi all,
How can I tell R that the row names are in a certain column when importing a file into R with the read.csv function?
I need to do an analysis with STRUCTURE using dominant data. If you have an example of this type of database, please contact me.
Most of the publicly available databases give only the basic information like age, gender, mode of infection, etc. regarding the infected patients suffering from CoVID 19. So, can anyone recommend or suggest more specific databases related to image, speech or clinical data of the patients that are meant for open research?
I have been trying to find an online source for the FG-NET ageing database, the MORPH database and the YGA database, but none of them seems to be available for download. Does anyone know of any other ageing database that I can use? I need one for my thesis, as I am building a face recognition system that classifies faces by age. It would be very helpful. Thank you.
The medical conditions include their heart and respiratory rates, systolic and diastolic pressure, etc. It would be more helpful if the dataset also includes information about usage of medications like NSAIDS and DMARDS, by the patients prior to CoVID 19 infection.
I need a dermoscopic image database to test an algorithm for automatic diagnosis. In particular, I am interested in images with blue-black colors within the lesion.
Can anyone tell me where to find it?
Biomedical
Medical hyperspectral imaging
I need a research paper that comes with a publicly available dataset so that I can start my research.
In your opinion, how will the applications of technology for analyzing large collections of information in Big Data database systems be developed in the future?
In which areas of industry, science, research, information services, etc., in your opinion, will the applications of technology for the analysis of large collections of information in Big Data database systems be developed in the future?
Please reply
I invite you to the discussion
I described these issues in my publications below:
I invite you to discussion and cooperation.
Best wishes

Currently, I am going to implement the surveying method in one of my research related to business units. Orbis database (of Company information across the globe | BvD) or similar would be useful for me to make a sample according to certain criteria and obtain contacts. My organization does not provide access to the Orbis Database. Maybe someone has access to this database and could provide me with data from it or recommend free alternatives?
Thank you in advance.
I want to perform database operations in a distributed database environment. If somebody has ideas about this, please share.
Thanks in advance.
To give some context: our work consists of building a machine learning model that takes a vector of a farm's properties, possibly including the weather, and then recommends the most suitable crop for the soil from a database of crops. Identifying the factors that inform this decision is therefore an important step before we start collecting the data needed for the model.
Hello,
I wonder what would be the best database to store and manage a large amount of data (maybe a few hundred gigabytes) at the main basin level, including:
1) GIS data:
- raster
- vector
2) Hydrological data:
- time-series of different variables (e.g., rainfall, temperature, humidity, etc.) for different stations.
From the internet, I found that the below databases could be used:
- PostGIS
- MongoDB
Thank you very much
Dears,
I am looking for online free sources of gridded high spatial (1 x 1 km) and temporal (hourly or tri-hourly) resolution weather/climate data to be used in my research. The spatial domain is Europe (or even the world if possible). Please, could you provide me some suggestion on the best available data sources?
Thank you for the support.
Best,
Giorgio
Dear researchers, is it possible to merge two or more RIS files from the Scopus database into one RIS file?
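Since an RIS file is plain text in which every record ends with an `ER  -` line, several files can be combined by concatenating their records. The Python sketch below does that and also drops records that are exact line-for-line copies; the function names are illustrative, and it assumes the exports follow the standard `XX  - value` tag layout.

```python
def split_ris_records(text):
    """Split RIS text into records, each a list of lines ending with 'ER  -'."""
    records, current = [], []
    for line in text.lstrip("\ufeff").splitlines():
        if not line.strip():
            continue  # blank lines between records carry no data
        current.append(line.rstrip())
        if line.startswith("ER  -"):
            records.append(current)
            current = []
    if current:
        # Trailing record with no end tag: keep it and close it explicitly.
        records.append(current + ["ER  -"])
    return records


def merge_ris(*texts):
    """Concatenate the records of several RIS texts, skipping exact copies."""
    seen = set()
    merged = []
    for text in texts:
        for record in split_ris_records(text):
            key = tuple(record)
            if key in seen:
                continue
            seen.add(key)
            merged.append("\n".join(record))
    return "\n\n".join(merged) + "\n"


if __name__ == "__main__":
    # Hypothetical miniature exports standing in for two Scopus downloads.
    first = "TY  - JOUR\nTI  - Paper A\nER  - \n"
    second = "TY  - JOUR\nTI  - Paper A\nER  - \n\nTY  - JOUR\nTI  - Paper B\nER  - \n"
    print(merge_ris(first, second))
```

Only byte-identical records are treated as duplicates here; the same paper exported with slightly different fields would be kept twice, which a reference manager's own duplicate finder may handle better.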
Hi there,
Can anyone recommend a Delphi method online tool that I can use?
Mesydel.com has been recommended but you are not able to download it.
I have been recommended http://armstrong.wharton.upenn.edu/delphi2/ as well although the website is less user friendly than I would like.
Any other suggestions?
Thanks.
L
As part of various financial research initiatives, we need different types of datasets. This question is intended to identify online sources of financial data, both free and paid. This information can help researchers locate online directories or data banks for financial and economic analysis.
Please suggest a standard image database for age estimation and prediction.
I have gone through many questions but there is no single Discussion thread giving the link, API, JSON for the dataset.
I am making a start so it will be easily searchable by everyone
<COUNTRY NAME> :: <TYPE - JSON/CSV/... etc...> :: <KIND OF DATA - COUNT/NETWORK INFO>
<URL/s>
The " COVID-19 " pandemic has been an unprecedented situation with rapid research and development taking place. There has been a lot of data that is being generated like the total number of patients infected, Active case, Recovered, Deceased.
Data can be obtained from different sources, Example in India " https://www.mohfw.gov.in/ " is the official website of Govt. of India, " https://www.covid19india.org/ " is a website by a group of dedicated volunteers, then we have " https://www.worldometers.info/coronavirus/ " which is worldometer maintained by Dadax
Is there any data validation model or method to verify the data that has been put out?
There is a possibility of overstating or understating the numbers. There can be a discrepancy between sources.
Has there been a solution or discussion about this in the research community? If so, what is it?
The CTU-UHB Intrapartum CTG database consists of CTG records and clinical information. The data in the database have been extracted from the OB TraceVue system to an open format using software. I need the raw data from this database. It contains .dat and .hea files, but I can't find a way to extract the raw data from them. Can anyone help me?
I want to do a sample project with SVM, but I cannot retrieve data from the ZINC database.
I need sample data from this database for classification. My classification method is SVM.
Thanks
I want to study the relationship between different epigenetic factors and the different types of cancer using existing records in epigenetic and/or oncological databases. However, as a bioinformatician, I have never worked with epigenetic data, so I do not know what format they are available in, what type of preprocessing they require, or what tools I can use to analyze them.
I would really appreciate if someone gave me some basic indications of how I should start, or if someone recommended me a paper or tutorial about how to work with epigenetic data in cancer bioinformatics.
Hello,
I am looking for databases of quadruplex structures (both C-based i-Motifs and G-based G-quadruplexes (G4s)).
I have only found the databases:
- G4RNA, http://scottgroup.med.usherbrooke.ca/G4RNA/ (RNA G4s)
- G4Hunter supplementary material database (mostly all DNA G4s)
Does anybody know whether any other sources of quadruplex structures exist?
Thanks in advance.
Hello everyone
I am currently a master's student at Nevşehir Hacı Bektaş Veli University, Department of Geography, studying Physical Geography. In addition to geographical analysis, my main areas of expertise are plant geography, data mining, MapReduce and Hadoop systems, land planning and plant taxonomy. I have also been working on social media analytics, social media applications, and social media in geography education, in both scientific and technical terms. My main focus, however, is on "Data Mining and Plant Geography modeling". I have a technical research article on this topic available as full text on my ResearchGate account, titled "Creation of Plant Geography Databases With The Map Reduce Modeling in The Clustering of the Large Geographical Data Sets". In addition to MapReduce and Hadoop systems and algorithms, the study draws on data mining, GIS, plant geography research methods and techniques, various international plant databases, and biological databases and vegetation studies carried out in Turkey and around the world. With this latest work I tried to develop a new database model that will contribute to these fields. In conclusion, I would especially like to hear and benefit from the ideas and opinions of my colleagues and teachers working on data mining and geography, plant geography or vegetation, especially geographers. Thanks in advance to everyone who contributes. Sincerely.
In my investigation of the triangular relationship between international sales, international ownership and the riskiness of a stock I am looking for a database that could provide the % of foreign ownership of shares from DAX and MDAX companies between 2003-2019.
I have a list of reference SNPs IDs (rsid) and I need to retrieve the associated diseases ... what are the suitable bioinformatics tools or databases?
I'd like to publish some ideas about a taxonomic database. This is possible in Biodiversity Data Journal or PeerJ, but I would need to pay for the article. Does anybody know of similar journals without an article processing charge?
For my research, I am trying to find ecological, geographic, hydrologic, social, economic and political spatial/GIS data that is preferably free and easily available. I am especially interested in layers associated with protected areas, distribution of Adivasi populations, Adivasi owned / managed lands, watersheds, dams/roads/mines/embankments, land ownership, or any other data within these fields. I would greatly appreciate some inputs/recommendations/tips as the government data.gov.in website has been difficult to navigate and almost impossible to find data on and the bhuvan website also doesn't allow data downloads.
Hello, can anyone help me?
When I open a CIF file with File > Open, the software (CrystalExplorer 17.5) shows an error while processing the CIF:
Error in TEXTFILE:open_new_file_for_write ... error opening new file.
I checked the CIF file with CCDC checkCIF and it validates correctly. I also opened a CIF file generated by Mercury from my original CIF file and got the same error. I tried CIF files from the CCDC database as well, without success. Could anyone give me a hand, please?
I am looking for the minimal inhibitory concentrations of different antimicrobial agents (such as chloramphenicol, ketoconazole, nitrofurantoin, etc.). Is there a database where one can find MICs against different bacteria and fungi (S. aureus, E. coli, P. aeruginosa, K. pneumoniae, C. albicans, etc.)?
It could be series of images too but not single image for emotion
I am currently trying to search for data on RLFS (R-loop forming sequences) for the human genome on UCSC Table Browser but I cannot find anything. Does anyone know if these data exist?
Currently I am trying to generate it myself using QmRLFS-finder, but it would be great if I could find other sources.
Thank you in advance,
Ana
I have a large amount of environmental data as independent variables, so I used a PCA to make them easier to work with. However, I am having problems extracting the data from the principal components so that they can be used as variables in different GLMs. I'm working in R. Can anyone help me?
Thank you,
Ferran
We are interested in developing a method for predicting siRNA, so we need a large set of siRNAs for developing models. I would highly appreciate it if you could suggest the best database or databases of siRNA. This will help us create a large dataset that covers all experimentally characterized siRNAs. Please also suggest the best (latest) siRNA prediction method. Do you think there is a possibility of developing a better prediction method, or is this field already saturated?
Can you help me find databases for neuroscience, for example an fMRI database, a database that shows the underlying mechanisms of the brain or the connection between brain and behavior, a psychiatry database, or other brain-related resources? In genetics we have, for example, Reactome, KEGG, STRING and other databases that show many pathways and cell connections. I wonder whether there is something like that in neuroscience: a big database that helps us better understand the brain.
Hi everybody!
About 6 months ago I started collecting data through online questionnaires sent by email. I got almost 2000 people at baseline, 60% of whom agreed to be contacted again for the follow-up (an online questionnaire of about 10 minutes) in order to investigate associations over time.
In your view, considering the size of the sample and the online recruitment, what response rate should I realistically aim for to have a "strong" dataset for my analysis?
Below which response rate should I give up the idea of using the longitudinal information?
Is there such a threshold?
I would love to hear your idea and your experience with this matter.
Thanks in advance for sharing!
I have XRD spectra of biomass samples and need to identify the mineral phases that may be present. How do I proceed?
We are looking to implement a web-based lab notebook as well as a tracking system to upload various assay results for several analogs of a parent compound. We will need to keep very close track of lot numbers, dates received, the chemists who synthesized them, etc. Does anyone use a service that would be helpful?
Hello, I'm trying to find the specific gene expression in various types of cancers and cancer cell lines, Is there any database in this regard?
Hello all,
I'm trying to test out a predictive model but I'm having a very hard time trying to find hourly precipitation data. I've looked at NOAA and on the new data repository (NCDC) but I can't find hourly or 15 minute interval data past 2014. Am I missing something here? Is there an alternative source I don't know about? If it helps typically this is the weather station I have pulled from in the past: USW00014819.
Any help is appreciated. Thanks.
Can anyone recommend a database that contains raw multispectral images with the different bands and in the same database the NDVI and NDWI index to compare the results obtained? Also, I am looking to see if I use my own multispectral images how I can compare between the vegetation and water of real plants and the NDVI and NDWI indices.
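For the second part of the question above (computing the indices from one's own band images), the standard definitions are NDVI = (NIR - Red) / (NIR + Red) and, in McFeeters' formulation, NDWI = (Green - NIR) / (Green + NIR). The Python sketch below applies them to single pixel reflectance values; the function names are illustrative, and returning NaN where both bands are zero is an assumed convention. Note that another index also called NDWI (Gao's) uses NIR and SWIR bands instead.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    if total == 0:
        return float("nan")
    return (a - b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (McFeeters) from green and NIR."""
    return normalized_difference(green, nir)


if __name__ == "__main__":
    # Hypothetical reflectance values for one vegetated pixel.
    print(ndvi(nir=0.5, red=0.1))
    print(ndwi(green=0.1, nir=0.5))
```

Dense vegetation tends to give high NDVI and negative NDWI, while open water tends to give the opposite, so computing both per pixel is one simple way to compare plant and water areas in the same scene.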
I am looking for databases that contain microRNA-drug interactions. Any suggestions or recommendations?
I've spent a lot of time but still could not find a quality yet public/free data for causal inference (binary treatment; e.g., A=0 or 1, non-DCD vs DCD donations) with survival data. I know it's quite specific requirement, but need one for my master degree thesis.. Of course, one might recommend the one from 'survival' package in R.
But I really want to find good, real (if possible, not too old) data.
For example, the one from this journal looks perfect (but cannot access) .
Can I get some advice or recommendation for such data?
Any comment is appreciated !
I am processing a 16S rRNA next-generation sequencing data set and trying to compare, across my samples, the effects on organisms involved in the nitrogen cycle. I am wondering if there is some sort of database, or even a good paper, that covers all of the known organisms in the nitrogen cycle. The more detail the better, but I would settle for just a simple list.
I'm interested in automated algae identification using neural networks. I need to compile a substantial photomicrograph dataset of algal genera (most importantly of Cyanobacteria, Chlorophyta and Bacillariophyta).
Thank you in advance!
I have been working on a project to collate species occurrence data from unpublished student theses into an integrated database (currently published in GBIF), and I am still working on a systematic protocol for data validation. Expert review is quite subjective, and I found many studies reporting that "expert" estimates are not always more consistent than those of amateurs, students or even public enthusiasts (feel free to message me for the papers I collected on this), so my team is still struggling to find a way forward. Our current method is simply to check the scientific names independently against taxonomic checklists and to validate the geographic distribution against published literature on the distribution of each species. We occasionally ask experts, but since we are working on many understudied taxa and geographical areas, there are not many around.
I am looking for company-level data on R&D expenses, because I would map it to M&A data in order to assess the impact of (cross-border) M&A on R&D intensity. Thank you.
I am looking for free online database of Brain Hemorrhage CT images for my research work.
Do we have a open-source standardized database of TB microscopic sputum smear images?
I want to download the complete KEGG bacterial data. Please guide me on how I can do that.
Alternatively, if anyone has this data, kindly send it to me. It is not freely available, even for academic users.
Dear Community,
For my Master thesis I need the data on stock prices of all firms listed on the NYSE, NASDAQ and AMEX. The database I use is CRSP. For my analysis, I need the data in time-series format rather than panel-data format. In other words, I require the data to be such that every column corresponds with 1 firm rather than all firms in 1 column beneath each other. Is it possible to obtain the data in this format using CRSP? It would help me a lot! Of course, if there is another database which could help me, I am open for suggestions!
Yours truly,
Niek van der Schaaf
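Whatever the download format, a long (panel) extract with one row per firm and date can be reshaped afterwards into one column per firm. The Python sketch below shows the idea on (date, firm identifier, value) rows; the identifiers and prices are made up, and the assumption that dates are ISO-formatted strings is what makes plain sorting put them in chronological order.

```python
from collections import defaultdict


def long_to_wide(rows):
    """Reshape (date, firm_id, value) rows into one row per date.

    Returns (columns, table): `columns` is the sorted list of firm ids
    and each table row is [date, value_for_first_firm, ...]. Missing
    firm/date combinations are filled with None.
    """
    by_date = defaultdict(dict)
    firms = set()
    for date, firm, value in rows:
        by_date[date][firm] = value
        firms.add(firm)
    columns = sorted(firms)
    table = [
        [date] + [by_date[date].get(firm) for firm in columns]
        for date in sorted(by_date)
    ]
    return columns, table


if __name__ == "__main__":
    # Hypothetical panel rows: (date, firm identifier, price).
    panel = [
        ("2020-01-02", 10001, 5.0),
        ("2020-01-02", 10002, 7.0),
        ("2020-01-03", 10001, 5.5),
    ]
    columns, table = long_to_wide(panel)
    print(["date"] + columns)
    for row in table:
        print(row)
```

Firms that list or delist during the sample simply get None for the dates on which they have no observation, which keeps every column the same length.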
I'm trying to find a dataset in the form of Excel - with as much data as possible - that include SAT scores of individuals and whether they play a musical instrument or not. Where could I find it?
Or maybe something similar like IQ and musical instruments.
Deep eye, a novel automatic data visualization system leverages ML techniques as black-boxes and expert specified rules with promising results.
What is your thought on automatic data visualization?
Dear community,
I am currently investigating the effect of cultural differences on the short-term firm performance regarding cross-border M&As (cumulative abnormal returns). I retrieved M&A data from the SDC Platinum database. But in order to estimate the CARs, I need to estimate the normal returns in the estimation window (-200,-30). In order to do this, I need stock data for every individual ACQuiring and TARget firm which consist of approximately 2500 firms.
Could databases like Datastream, WRDS, Compustat or maybe even SDC Platinum help me overcome this problem? In other words, do I really need to gather this data manually by entering 2500 firms into Datastream and changing the dates by hand?
I hope someone could help me.
Kind regards,
Bas
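Once the return series are available, the calculation described above can be scripted per firm rather than done by hand. The Python sketch below assumes the market model (stock return = alpha + beta x market return) as the normal-return benchmark, which is only one common choice and is not stated in the question; the function names and the numbers are illustrative.

```python
def fit_market_model(stock_returns, market_returns):
    """Estimate alpha and beta by ordinary least squares over the
    estimation window, e.g. trading days -200 to -30."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (s - mean_s)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


def cumulative_abnormal_return(alpha, beta, stock_event, market_event):
    """Sum of (actual return - model-implied normal return) over the event window."""
    return sum(s - (alpha + beta * m)
               for s, m in zip(stock_event, market_event))


if __name__ == "__main__":
    # Hypothetical daily returns; a real estimation window has about 170 days.
    market_est = [0.01, -0.02, 0.015, 0.0, -0.005]
    stock_est = [0.001 + 1.5 * m for m in market_est]
    alpha, beta = fit_market_model(stock_est, market_est)

    market_evt = [0.002, -0.001, 0.004]
    stock_evt = [alpha + beta * m + 0.02 for m in market_evt]
    print(cumulative_abnormal_return(alpha, beta, stock_evt, market_evt))
```

Looping these two functions over a table of acquirer and target identifiers with their announcement dates would avoid any manual per-firm work once the returns have been downloaded in bulk.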
Which do you prefer for data visualization: Tableau, Qlik Sense or Power BI? And why?
I would very much appreciate any info on any available alternatives for the KLD database for the purpose of robustness analysis? The sample is U.S publicly traded firms so any alternative that covers companies that only operate in other contexts is excluded. I am mainly looking for an alternative for the KLD that covers companies listed in the US.
Thanks
These categories are not mentioned at the "special groups" of FAOSTAT.
How to describe the process of analyzing statistical data carried out with the help of Business Intelligence in Big Data database systems?
How to describe the main models of statistical data analysis carried out with the help of computerized Business Intelligence tools used for the analysis of large data sets processed in the cloud and analyzed multi-criteria in Big Data database systems?
Please reply
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:
I invite you to discussion and cooperation.
Best wishes

Apparently, some countries are establishing banks of large collections of information on the achievements of human civilization, stored on digital data carriers, usually somewhere underground in specially built bunkers capable of surviving climatic and other disasters.
These are properly secured Big Data database systems, data warehouses, underground information banks, digitally recorded.
The underground bunkers themselves can survive various climatic and other calamities for perhaps hundreds or thousands of years.
But how long will the large collections of information survive in these Big Data systems and data warehouses stored on digital media?
Perhaps a better solution would be to record this data in analogue form on specially created discs?
Already in the 1970s, some information about human civilization was placed on probes sent into space: Pioneer 10 carried an engraved gold-anodized plaque, and the Voyager probes carried gold-plated records. These probes will keep travelling through interstellar space for many thousands of years.
The data about the achievements of human civilization sent into space at that time was recorded on gold-plated discs.
Is there a better form of data storage at the moment when this data should last thousands of years?
Please reply
Best wishes

Dear colleagues,
The Dolos list team is extracting domain names from all predatory journals and publishers added to the list. The resulting database will be made available to institutional and private email box providers so that the email advertisements sent by parasitic publishers can be automatically classified as spam. Such filtering does not yet work reliably. This service will be regularly updated and will be completely free for institutions and email box providers.
Best regards,
Alexandre.
The National Oceanic and Atmospheric Administration (NOAA) website, which provides extensive data on CO2 parameters (pH, DIC, TA, fCO2, pCO2), has been shut down. I hope to receive suggestions about other platforms and databases from other scholars.
For instance: FDI in Agriculture in Zambia? Or across SSA as a whole?
UNCTAD datasets cover regions (as well as net cross-border M&A by sector) but do not provide data at the individual country level (nor do they combine these data).
ITC, OECD databases only really have developed country coverage. And there appear to be two data-sets compiled by Alfaro et al. (2003) and Aykut & Sayek (2007) which seem to be quite out of date.
Any suggestions?
In which areas of enterprise operation, and in what types of economic activities and services, is an analyst in the position of Big Data Scientist employed for advanced processing of large information collections?
In which branches of the economy is an analyst employed in the position of Big Data Scientist?
Apparently, this is one of the occupations of the future for which demand will grow, because a growing number of companies will be interested in building their own Big Data database systems or in using advanced processing of large data sets as part of economic analysis and Business Intelligence.
In what areas of business, economy, operation of companies and corporations will there be a growing need in the future to hire an analyst as Big Data Scientist?
Please reply
Best wishes

Dear colleagues!
Could you be so kind as to suggest any Neisseria gonorrhoeae comparative genomics database like the human 1000 Genomes browser (https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/) or Variation Viewer (https://www.ncbi.nlm.nih.gov/variation/view/)?
Dear All
I am beginning a project to understand the impact of climate change on fish diversity and the implications for coral reefs in the Western Indian Ocean (WIO) region. I am targeting top predators of commercial importance (groupers) and I would like to work on grouper traits. Can anyone suggest a comprehensive database?
Best
I have tens of thousands of individual scans in proprietary file formats, and I want to make these public. I need a format that is free and open, or to make my own.
Our proprietary software offers a CSV option, but doesn't export all useful data to the file. In addition, the CSV file it creates is more like two spreadsheets, with the second half having the per-channel photon counts.
I've considered using XML because it is both machine and human readable. My only concern is that XML is bloated. XML has the added benefit of being readable in a web browser, and it can be quickly converted to almost any other format, including JSON.
The Microsoft INI format is also machine and human readable, but INI has largely been phased out. Software writers still have full access to INI functions though, so I wonder if this is still a viable format. INI also converts well to object notation.
Both INI and XML could represent two spreadsheets' worth of differently typed content in a single file better than a CSV could.
What are your thoughts?
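For illustration only, the idea of holding "two spreadsheets" in one open file can be sketched in JSON, which the question already mentions as a conversion target. This is a minimal sketch, not a proposed standard: the field names (`metadata`, `channels`, `photon_count`) and the sample values are hypothetical, and a real export would use whatever fields the instrument actually records.

```python
import json


def scan_to_json(metadata, channel_counts):
    """Serialize one scan as a single JSON document.

    metadata: dict of per-scan settings (hypothetical field names).
    channel_counts: list of photon counts, one entry per channel.
    """
    record = {
        "metadata": metadata,
        "channels": [
            {"channel": index, "photon_count": count}
            for index, count in enumerate(channel_counts)
        ],
    }
    return json.dumps(record, indent=2)


def json_to_scan(text):
    """Read the document back into (metadata, channel_counts)."""
    record = json.loads(text)
    counts = [row["photon_count"] for row in record["channels"]]
    return record["metadata"], counts


# Made-up example values, used only to show the round trip.
example_text = scan_to_json({"sample_id": "demo-001", "exposure_s": 2.5}, [12, 0, 7])
example_metadata, example_counts = json_to_scan(example_text)
```

The same two-part layout (one block of scan-level fields, one list of per-channel rows) maps directly onto XML elements if XML is preferred.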
When we perform a keyword search in databases like PubMed, does the order of the keywords/MeSH terms have an impact on the total number of hits?
Thanks in advance.
I am working with FactSage databases and I need to include some data from tdb files. Does anyone know of a program to convert tdb files to FactSage format?
With the government shutdown, the NIST database no longer seems to be accessible, and we have a deadline soon. So, is there an alternative?
In order to analyze sentiment in data downloaded from social media portals (such as Facebook, Twitter and LinkedIn, but also YouTube, Instagram ...) and aggregated in Big Data database systems, it is necessary to use specialized software for the extraction and analysis of these data.
The quality of the data transferred to, for example, Excel sheets depends on the quality of the extraction process carried out with the help of specialized software.
In turn, the result obtained, that is, the answer to the question posed to the collected, initially unstructured data in the Big Data database system, depends on the quality of the data analysis carried out in Excel sheets or on computerized analytical platforms.
In the future, artificial intelligence may be used for this purpose. The whole process of purposeful analysis of collected data would then be more effective and automated, less error-prone, cheaper, and much faster, even on much larger collections of information than at present.
In view of the above, the current question is: In what directions will the analytical processes used for sentiment analysis of data collected in Big Data database systems develop in the future?
Please answer and comment. I invite you to the discussion.

Generally, stock price indices are classified into two categories: global indices and national indices. The national indices, which are more commonly quoted, represent the performance of the stock market of a given country, such as Brazil's BOVESPA, India's NSE or BSE, Hong Kong's Hang Seng or China's Shanghai SE Composite Index. These indices are provided by the local financial authorities. The global equity indices, on the other hand, are calculated and provided by international agencies such as Thomson Datastream, Standard & Poor's and Morgan Stanley. These agencies also offer stock price indices at the country level. The methodology used to calculate a stock price index may differ from one agency to another, which may affect the measured return and volatility. For instance, the Datastream market indices cover 53 countries, and each index covers at least 75-80% of the market capitalization of the publicly listed companies in the country. Standard & Poor's has its own indices, known as the Broad Market Index (BMI), which cover most developed and emerging countries. The method Standard & Poor's uses to calculate its indices is called the float-adjusted or free-float methodology, which according to Investopedia is the best measure of stock price movements. The same methodology is used by other providers such as Morgan Stanley (MSCI) and the Financial Times Stock Exchange (FTSE). Investors frequently use these indices as benchmarks for their equity portfolios.
As previously mentioned, these global agencies maintain records of stock price indices for many countries, sometimes even for periods longer than those covered by the national indices.
So, are these global indices suitable for academic research and papers? Are they used in academic research, especially in research concerned with stock price volatility? Or can they only be used as benchmarks for investors' portfolios?
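To make the point about methodology concrete, one way to check whether two providers' indices for the same country tell the same story is to compute returns and volatility from each series and compare them. The sketch below assumes daily closing levels are already available as plain lists; the function names are illustrative, and it uses log returns and the sample standard deviation, which is one common convention rather than the only one.

```python
import math
import statistics


def log_returns(levels):
    """Log returns between consecutive index levels."""
    return [math.log(later / earlier) for earlier, later in zip(levels, levels[1:])]


def volatility(levels):
    """Sample standard deviation of the log returns (not annualized)."""
    return statistics.stdev(log_returns(levels))


def volatility_gap(levels_a, levels_b):
    """Difference in volatility between two index series for the same market."""
    return volatility(levels_a) - volatility(levels_b)
```

Running `volatility_gap` on a national index and on a global provider's country index over the same dates gives a first indication of how much the choice of index could matter for a volatility study.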
We are looking for a large database (200+) of pictures of human faces with neutral facial expressions, in order to conduct an experiment on nonverbal learning mechanisms. We have difficulties finding appropriate pictures because we need the people in the pictures to be Caucasian, aged 20 to 40, with a neutral facial expression, and on a neutral background. Also, it would be very good if the database were free to use for research purposes. Can somebody please suggest an existing database?
Thank you in advance,
Jovana
Advanced technologies for the digitalization and automation of data processing first find their application in business. They can then also be introduced in public institutions, including in the field of e-governance. This also applies to Big Data database technologies, which are applicable in various sectors of the economy. Due to the high investment costs of implementing these technologies in business processes, so far only large corporations and larger enterprises have been able to afford them. However, in the future, these investment costs should decrease, and data collection and processing technologies in Big Data database systems should also become available to smaller companies, including business entities in the SME sector.
I invite you to the discussion.
Hi,
I am looking for free speech databases for speaker recognition (with more than 50 speakers). Do you have any suggestions?
There are methods for cleaning or preprocessing text in Python that work on a sample string. Is there a method for applying the same preprocessing (cleaning) to text stored in a database of tweets? Cleaning the text is necessary for sentiment analysis of the tweets stored in the database.
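One possible approach is to wrap the string-level cleaning in a function and apply it to every row read from the database. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` and `re` modules, and it assumes a hypothetical table named `tweets` with columns `id`, `text` and `clean_text`. The cleaning rules (lowercasing, removing URLs and @mentions, keeping only letters) are one common choice and should be adapted to the analysis at hand.

```python
import re
import sqlite3

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Clean a single tweet string for sentiment analysis."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, emoji
    return " ".join(text.split())        # collapse repeated whitespace


def clean_tweets_in_db(conn):
    """Apply clean_tweet to every row of the (hypothetical) tweets table.

    Assumes columns id, text and clean_text already exist.
    Returns the number of rows processed.
    """
    rows = conn.execute("SELECT id, text FROM tweets").fetchall()
    updates = [(clean_tweet(text), tweet_id) for tweet_id, text in rows]
    conn.executemany("UPDATE tweets SET clean_text = ? WHERE id = ?", updates)
    conn.commit()
    return len(rows)
```

For a database other than SQLite, the same pattern should carry over with the corresponding Python driver: read the rows, clean each string, and write the result to a separate column so the raw text is preserved.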
Seeking co-authors/collaborators to participate with me in database studies involving NSQIP, NIS, SEER, NAMC, etc. This is a good opportunity for post-docs and research fellows who can work remotely. Must have a basic understanding of biostats, surgical/medical outcomes, and the aforementioned databases. Send me a message and let's see if you are a good fit.
For several years, there have been commercial companies that collect data from, for example, social media portals.
This data contains information collected from posts, entries, comments, recordings, etc. posted by millions of users of social media portals.
Data is collected and processed in Big Data database systems. Sentiment analysis carried out on these data allows you to generate reports that are used in business, for example in marketing.
From these reports, the clients of the above-mentioned technology companies learn, for example, about how the recognition of their brand changes over time, what opinions about the products and services offered, etc., dominate.
But if the Big Data database resources analyzed in this way are mainly information collected from social media portals, do the generated reports have the advantages of objectivity?
Considering the current resources of the Internet, where are the majority of comments on products, services, companies, institutions, etc. being posted at the moment? Are they mainly posted on social media portals?
I found multiple publications referencing this dataset but have been unable to find a link to request access to the data.