Science topic

Database Analysis - Science topic

Explore the latest questions and answers in Database Analysis, and find Database Analysis experts.
Questions related to Database Analysis
  • asked a question related to Database Analysis
Question
3 answers
Bi0.8Sm0.2FeО3
Space group: P n m a or P b a m
Relevant answer
Answer
Hello.
The CIF file of the orthorhombic structure of BiFeО3 can be found in the ICSD entry 168321 from the CCDC database (https://www.ccdc.cam.ac.uk/structures/Search?Ccdcid=168321&DatabaseToSearch=ICSD).
This CIF file could be edited with Notepad or Notepad++ by duplicating the line with the bismuth atom and changing the label and atom type to samarium in the new line.
The occupancy factors could then be added when importing the structure into Rietveld refinement software.
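A minimal Python sketch of the text edit described above (the "Bi" label match and the column layout are assumptions; check which lines get duplicated against the actual CIF before use):
# Hedged sketch: duplicate the bismuth atom-site line of a BiFeO3 CIF as a Sm site.
# Assumes whitespace-separated columns and a Bi label such as "Bi1"; verify the
# duplicated line and adjust the atom-type symbol by hand if needed.
with open("BiFeO3.cif") as f:
    lines = f.readlines()

out = []
for line in lines:
    out.append(line)
    cols = line.split()
    if cols and cols[0].startswith("Bi"):              # candidate bismuth row
        sm_cols = cols[:]
        sm_cols[0] = "Sm1"                              # new label for the samarium site
        sm_cols = ["Sm" if c == "Bi" else c for c in sm_cols]
        out.append(" ".join(sm_cols) + "\n")            # Sm row at the same position

with open("Bi0.8Sm0.2FeO3_start.cif", "w") as f:
    f.writelines(out)
# The occupancies (Bi 0.8 / Sm 0.2) can then be set when importing the structure
# into the Rietveld refinement software, as suggested above.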
Attached are the requested files.
Best regards
  • asked a question related to Database Analysis
Question
4 answers
Hello dear scientists
Ideas for good articles or databases on cystic fibrosis and its common or potentially pathogenic variants are most welcome. What strategies would you suggest for carrier screening for cystic fibrosis?
Relevant answer
Answer
There are various scientific databases and resources.
1. Cystic Fibrosis Mutation Database (CFTR1): The CFTR1 database is a widely used and authoritative resource that provides detailed information on cystic fibrosis mutations. It includes information on the specific mutations, their exon locations, and associated clinical features. You can access the CFTR1 database at: http://www.genet.sickkids.on.ca/Home.html
2. Human Gene Mutation Database (HGMD): HGMD is a comprehensive database of human gene mutations, including those associated with cystic fibrosis. It provides information on the genetic basis of diseases, including the specific mutations and their exon locations. HGMD requires a subscription for full access, but it may be available through academic institutions or libraries. The HGMD website can be found at: http://www.hgmd.cf.ac.uk/
3. ClinVar: ClinVar is a public database maintained by the National Center for Biotechnology Information (NCBI). It contains information on genetic variants and their clinical significance. While it may not provide exon-specific information for all mutations, it can be a useful resource to search for known cystic fibrosis mutations and associated clinical data. You can access ClinVar at: https://www.ncbi.nlm.nih.gov/clinvar/
The field of genetics and genomics is continually evolving, and new mutations and exon locations associated with cystic fibrosis may be discovered over time. It is advisable to consult up-to-date and authoritative sources for the most current information.
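If programmatic access is useful, ClinVar can also be queried through the NCBI E-utilities; a minimal Biopython sketch (the e-mail address and search term are placeholders):
# Hedged sketch: count ClinVar records for CFTR via NCBI E-utilities (Biopython).
from Bio import Entrez

Entrez.email = "you@example.org"                 # required by NCBI; use your own address
handle = Entrez.esearch(db="clinvar", term="CFTR[gene]")
result = Entrez.read(handle)
print(result["Count"], "ClinVar records matched for CFTR")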
Good luck
Credit: AI tool usage.
  • asked a question related to Database Analysis
Question
1 answer
Can I use the SRA Run Selector in the GEO database to compare candidate genes? If yes, how? I have tried analyzing with GEO2R but I am not getting the information I need.
Relevant answer
Answer
I do not think so; the SRA Run Selector is more for selecting datasets. You could try to compare candidate genes after running several GEO series separately, or run a meta-analysis using several GEO series at the same time. GEO2R only runs on one GEO series and gives you a DEG list, so you need to get several lists and perform an intersection or another comparison.
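As a rough illustration of that last step, a minimal pandas sketch for intersecting DEG lists exported from separate GEO2R runs (file names are placeholders; column names such as "Gene.symbol" and "adj.P.Val" follow the usual GEO2R top-table export but should be checked against your files):
# Hedged sketch: intersect DEG lists from several separate GEO2R exports.
import pandas as pd

files = ["GSE_A_geo2r.tsv", "GSE_B_geo2r.tsv", "GSE_C_geo2r.tsv"]   # placeholder paths

deg_sets = []
for path in files:
    tab = pd.read_csv(path, sep="\t")
    sig = tab[tab["adj.P.Val"] < 0.05]              # keep significant genes only
    deg_sets.append(set(sig["Gene.symbol"].dropna()))

common = set.intersection(*deg_sets)                # genes significant in every dataset
print(len(common), "genes shared across all GEO2R lists")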
  • asked a question related to Database Analysis
Question
2 answers
Dear All, I would like to ask whether it is possible to obtain data from databases or websites about sexual behavior in different countries of Europe or the world? Thank you! Best regards, Stefan
Relevant answer
Answer
Hi Štefan,
I recommend that you contact the ISSM and the European Federation of Sexology (EFS) for more accurate information and data.
In the rest of the world, you can contact sexology academies and similar organizations.
I hope you obtain the necessary information.
Kind Regards,
  • asked a question related to Database Analysis
Question
1 answer
Tl;dr: I’m trying to convert gene IDs of an obscure MRSA strain from Ensembl Bacteria to KEGG.
Hello,
I’m trying to do a pathway enrichment analysis of MRSA strain 107 using GSEA. I have gene expression data that are associated with the gene IDs from Ensembl Bacteria. I plan to use KEGG as my pathway database.
GSEA requires a .gmt file of the gene IDs/enrichment data (of which the gene IDs are from Ensembl), then requires a pathway file (from KEGG). If I try to do the analysis with both of these files, the gene IDs don’t match up, so GSEA can’t do it.
My question is whether there’s a way to convert these gene IDs specifically with these strains of MRSA from Ensembl Bacteria to a site like KEGG. Here are the resources I’ve already tried:
DAVID
Dbtodb
Syngoportal
G:convert
MetaScape
BioMart from Ensembl
Annotationdbi
All these are tools that work, but they don’t include my strain. How should I convert these Ensembl Bacteria gene IDs? Is there another option I don’t know about?
PS. I don’t need to use KEGG; if a different pathway database works, that would also be acceptable.
Relevant answer
Answer
If you're having an issue finding an exact ID match, you can try this method.
Collect all protein sequences of the strain and use BlastKOALA/GhostKOALA (tools available on the KEGG website) to annotate them. They will provide you with KEGG KO identifiers (K numbers), which can then be used for pathway analysis.
Thank you
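A small, hedged Python sketch of how the KOALA assignments might then be used to move a gene-level ranking onto KO identifiers for pathway analysis (file names and column layout are assumptions about your data):
# Hedged sketch: use BlastKOALA/GhostKOALA output (gene_id <TAB> KO, blank if none)
# to collapse a gene-level ranking onto KEGG KO identifiers.
import pandas as pd

ko_map = pd.read_csv("ghostkoala_user_ko.txt", sep="\t", header=None,
                     names=["gene_id", "ko"]).dropna(subset=["ko"])

expr = pd.read_csv("ranked_genes.tsv", sep="\t")     # assumed columns: gene_id, stat
merged = expr.merge(ko_map, on="gene_id", how="inner")

# Several genes can share one KO; collapse them (here the largest |stat| wins).
ko_rank = (merged.assign(abs_stat=merged["stat"].abs())
                 .sort_values("abs_stat", ascending=False)
                 .drop_duplicates("ko")[["ko", "stat"]])
ko_rank.to_csv("ranked_KOs.rnk", sep="\t", index=False, header=False)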
  • asked a question related to Database Analysis
Question
2 answers
Gene Ontology provides many genes annotated as taking part in various morphogenesis processes but I want to get a list of all morphogen coding genes specifically. Uniprot does not have "morphogen" keyword.
I need some gene annotation database that has genes labeled as "morphogen".
Relevant answer
Answer
Fahmida, thank you for your answer. Did you mean some database in particular? Could you direct me to the site?
  • asked a question related to Database Analysis
Question
14 answers
Below are some issues related to Big Data database technologies that can be developed scientifically:
- Application of data processing technology in Big Data database systems for modern education 4.0,
- Improvement of forecasting of natural, climatic, economic, financial, social and other phenomena based on the analysis of large data sets,
- Analysis of the sentiment and opinions of citizens and Internet users regarding companies' brand recognition, customer reviews of specific services and products, views on various topics, and citizens' worldviews, based on the analysis of large collections of information downloaded from various websites and from comments collected from social media portals,
- Analysis of the information and marketing services of commercially operating companies that carry out specific analyses of sentiment and of the opinions of citizens and Internet users regarding brand recognition, customer reviews of specific services and products, etc., on behalf of other companies that purchase such analytical reports,
- Analysis of the possibilities of cooperation, synergy, correlation and interdisciplinary research connecting Big Data database systems with other information technologies typical of the current fourth technological revolution, known as Industry 4.0, such as cloud computing, machine learning, the Internet of Things, artificial intelligence, etc.
In what other areas are the technologies of processing and analysis of information in Big Data database systems used?
Please answer
Best wishes
Dear Colleagues and Friends from RG
The issues of the use of information contained in Big Data database systems for the purposes of conducting Business Intelligence analyzes are described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Dear Srdjan Atanasijevic,
Yes. You pointed to the important conditions for the development of big data analysis technology with the use of Big Data Analytics.
Thank you, Regards,
Dariusz Prokopowicz
  • asked a question related to Database Analysis
Question
7 answers
I guess there must be some data collected regarding Covid and related to the field of psychology/psychiatry, considering its psychological impact. It might be gathered from the patients, family members or the society at large, either a public or private collection. Does anybody have any idea on how to access such data for research purposes?
Relevant answer
Answer
Dear Prof. Farhad Montazeri ,
For example, Lancet and Nature offer e-mail alerts: once you register, they send updated data and research on the topics you choose to your registered e-mail address. Useful starting points include:
- The Lancet COVID-19 Resource Centre
- The Nature Briefing newsletter (briefing@nature.com)
- LitCovid, a curated literature hub for tracking up-to-date scientific information about the 2019 novel coronavirus
- Elsevier's Novel Coronavirus Information Center, with free health and medical research on the novel coronavirus (SARS-CoV-2) and COVID-19
- "COVID-19: Epidemiology, virology, and prevention"
Hope I understand you correctly, dear Prof.
  • asked a question related to Database Analysis
Question
5 answers
Do any of you use sentiment analysis in research conducted on data downloaded from the Internet and analyzed in Big Data database systems?
If so, please let me know in which issues and research topics you use sentiment analysis.
Is sentiment analysis helpful in forecasting economic and financial processes?
Please reply
Best wishes
Relevant answer
Answer
Dear Venkatesh Gauri Shankar,
Thanks for the given example and description of building a forecasting model of economic processes based on the use of Python libraries.
Thank you very much,
Regards,
Dariusz Prokopowicz
  • asked a question related to Database Analysis
Question
7 answers
The improvement of specific risk management systems is particularly important in many areas of functioning of commercial business entities, financial institutions, public institutions as well as conducting investment, research and other projects.
How important this is was shown, for example, by the global financial crisis that emerged in mid-September 2008, when financial, investment and credit risk management systems had not been properly improved, the procedures of investment and lending activity and of customer service were not carried out reliably, and business ethics were violated in the investment banks operating at the time and in many other types of financial institutions and business entities.
please reply
Dear Colleagues and Friends from RG
The key aspects and determinants of applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Thank you very much
Best wishes
Relevant answer
Answer
Dear Milan B. Vemić,
Thanks for the information on the webinar on big data collected on Big Data platforms and the processing of information collected in Big Data database systems. Thanks for the link to the YouTube video that addresses this issue.
Thank you very much,
Regards,
Dariusz Prokopowicz
  • asked a question related to Database Analysis
Question
3 answers
Hello advisors. I am glad to be here in this Community. If you allow me, I would ask if anyone of the members here know about Databases of cutting tools. Nowadays, I am working on a project and the main goal is to research about lifetime of cutting tools using the Cox proportional hazards model. At the moment, I just have one Database that contains cutting speed, feed rate, tool failure time and depth of cut. My research question is: Which of these variables are the most representative that give us the time of failure? Any kind of help, comments and questions are welcome.
Relevant answer
Answer
The attached paper solves a problem similar to yours. I would suggest that you get a copy of Jared Lander, R for everyone available from the z-library. If you need programs,
or have questions, please contact me. Best wishes, David Booth
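If Python is also an option, a minimal sketch of the Cox proportional hazards fit with the lifelines package (the column names simply mirror the variables listed in the question and are assumptions about your file):
# Hedged sketch: Cox proportional hazards model for cutting-tool lifetime.
# "failure_time" is the observed tool life and "failed" a 0/1 event indicator.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("cutting_tools.csv")   # cutting_speed, feed_rate, depth_of_cut, failure_time, failed

cph = CoxPHFitter()
cph.fit(df[["cutting_speed", "feed_rate", "depth_of_cut", "failure_time", "failed"]],
        duration_col="failure_time", event_col="failed")
cph.print_summary()                      # hazard ratios indicate which variables matter most
The hazard ratios and p-values in the summary suggest which of the recorded variables are most strongly associated with tool failure time.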
  • asked a question related to Database Analysis
Question
4 answers
In my opinion, analytics based on the processing of economic information accumulated in Big Data database systems will be used for researching and analyzing determinants of current economic processes, for risk management and for forecasting economic processes.
Please reply
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
In my opinion, the main determinants of the development of analytics based on Big Data Analytics technology include the development of ICT and Industry 4.0 information technologies and the possibility of applying these technologies to the processes of improving both retrospective and prospective analyzes carried out on large, multi-faceted sets of data and information.
Best regards,
Dariusz Prokopowicz
  • asked a question related to Database Analysis
Question
31 answers
What kind of scientific research dominates in the field of Big Data database systems?
Please provide your suggestions for a question, problem or research thesis concerning Big Data database systems.
Please reply. I invite you to the discussion
Dear Colleagues and Friends from RG
Some of the currently developing aspects and determinants of the applications of data processing technologies in Big Data database systems are described in the following publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Informative question
  • asked a question related to Database Analysis
Question
4 answers
Does anyone know if an endomyocardial biopsy database exists and can be downloaded?
Relevant answer
Answer
You can search GitHub for a relevant database.
  • asked a question related to Database Analysis
Question
11 answers
Will the development of technology for processing the data accumulated in banks' Big Data database systems improve the credit risk management process, or will it contribute to the development of Shadow Banking and the use of unethical practices for the surveillance of potential borrowers?
Large commercial banks generate high financial surpluses allowing for the implementation of modern integrated teleinformatic internet banking systems, Business Intelligence data analysis systems, data processing platforms in Big Data database systems, etc.
There were already situations of unethical use of modern ICT solutions, analysis of comments on social media portals, during which the bank verified the customer's data entered into the loan application by also scanning information that the potential borrower types in social media portals.
This informal verification took place without the knowledge of a potential borrower and could then be the basis for suing the bank.
However, the bank's client is not always aware that he or she can be surveilled in this way by an institution of public trust, which a bank should be.
Of course, such cases, which we know about from the media, are supposedly a margin of banking as a whole, but they can be one of the categories of a new type of unethical practice typical of so-called Shadow Banking.
However, only part of this type of information gets to the media.
Maybe this is just so-called "the tip of the iceberg" of this problem.
The situation is similar in the situation of cybercriminals' attack on bank IT systems or electronic banking platforms.
If it is possible to keep this type of events secret, then customers do not find out about it.
This is because media only receive information about some of these types of events.
Does any of you conduct research in this area?
If so, I invite you to cooperation.
I am asking for comments
Relevant answer
Answer
Big Data Analytics database and analytical technology can be helpful in the process of collecting and processing large sets of information and data on potential bank customers, including borrowers and enterprises that are bank customers, and data describing the economic and financial situation of enterprises and the factors in their market, industry and competitive environment, etc. In this way, Big Data Analytics technology can be helpful in improving the processes of analyzing the creditworthiness of potential borrowers and in improving the credit risk management process.
Best regards,
Dariusz Prokopowicz
  • asked a question related to Database Analysis
Question
12 answers
Dear RG members
I frequently use the Web of Science. What‘s your choice? Thank you so much!
Relevant answer
Answer
In my opinion WoS and Scopus.
  • asked a question related to Database Analysis
Question
22 answers
Is it possible now or in the future to create an artificial intelligence that will draw knowledge directly from the analysis of Internet resources and learn this knowledge?
Please reply
I am conducting research in this area. Based on the findings, I conclude that the rapid development of artificial intelligence (AI), as seen in the increasingly popular chatbots, raises questions about its capacity for self-improvement and autonomous learning. These chatbots, trained on huge data sets and improving through interactions, still operate within algorithms created by humans. Although they can learn and process information, their ability to self-improve is limited. However, advances in technology are moving us in a direction where autonomous learning AI is becoming more and more feasible, although it still requires overcoming technological and ethical challenges.
My research and observations show that artificial intelligence technology has been rapidly developing and finding new applications in recent years, there are new opportunities but also threats. The main determinants, including potential opportunities and threats to the development of artificial intelligence technology are described in my article below:
OPPORTUNITIES AND THREATS TO THE DEVELOPMENT OF ARTIFICIAL INTELLIGENCE APPLICATIONS AND THE NEED FOR NORMATIVE REGULATION OF THIS DEVELOPMENT
Please write what you think about this issue? Do you see rather threats or opportunities associated with the development of artificial intelligence technology?
What is your opinion on this issue?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
I would like to invite you to scientific cooperation,
Dariusz Prokopowicz
Relevant answer
Answer
The possibility of a self-improving artificial intelligence system that gains autonomy, escapes human control and, by downloading data from the Internet, becomes an ever more powerful and threatening system for humans has appeared in science-fiction literature and film since the 1990s. With each subsequent year, artificial intelligence technology is developed and improved. So, can the scenario presented above come true in the future? What do you think about this? Please reply.
Thank you, Regards,
Dariusz Prokopowicz
  • asked a question related to Database Analysis
Question
66 answers
Yes, in my country the Scopus indexing database is considered one of the most important. The Scopus database is recognized as the main scientific database for the indexation of highly cited scientific publications. However, on a global scale there are at least a dozen or so indexing databases for scientific publications, recognized in various countries by various centers and scientific institutions. These various indexing databases are usually not fully comparable; they are functionally differentiated and, thanks to that, only partially substitutable, but more often complementary. The question of their complementarity should be developed, as it would then serve the development of scientific research and the international cooperation of scientific communities.
In view of the above, I am asking you the following question: is the Scopus database recognized in your country as the main database for the indexation of scientific publications?
And if not the Scopus database, which other database of publications and scientific journals is considered the most important in your country?
Do you agree with me on the above matter?
What do you think about this topic?
Please reply.
I invite you to discussion and scientific cooperation.
Thank you very much.
Best wishes.
Dariusz Prokopowicz
Relevant answer
Answer
Dear Dr. Dariusz Prokopowicz , I hold similar views - Dr. Avishag Gordon.
In India, the university where I work and many other universities have started seriously considering the Scopus index for Ph.D. thesis submission (2 articles in Scopus-indexed journals), promotion (computing scores), research grants (to assess the quality of research undertaken so far) and Ph.D. guideship (at least 2 in Scopus, SCI or SSCI).
So the point is that Scopus is gaining acceptance and importance, but there are other indexes too that are considered equally important.
Warm regards Yoganandan G
  • asked a question related to Database Analysis
Question
15 answers
Is the security of information collected in social media portals databases currently one of the key determinants of the development of new online media?
The security of social media portals is currently one of the most important topics concerning social media and other new internet media and information services. Scientists at various universities are therefore involved in researching this issue, and security tools for the information collected in social media portal databases and data security systems on the Internet are being developed. In companies and key public institutions, systems for managing the risk of information systems and information transfer on the Internet are also being developed.
Do you agree with me on the above matter?
In the context of the above issues, the following question is valid:
Is the security of information collected in social media portals databases currently one of the key determinants of the development of new online media?
Please reply
I invite you to the discussion
Thank you very much
I also conduct research in this matter. I am researching the security of social media portals in connection with Big Data database technology. Below are links to my publications:
I invite you to discussion and cooperation.
Thank you very much
Best wishes
Relevant answer
Answer
Social media, in recent times, has witnessed an explosion of data, with so many social media platforms available to interact and express opinions freely. This has led to easy access to the private information of social media users, which raises broader security concerns … Sharma, S., & Jain, A. (2020). Role of sentiment analysis in social media security and analytics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1366.
  • asked a question related to Database Analysis
Question
4 answers
Greetings from my side.
I am facing a problem getting the annual reports of banks throughout Asian and Euro-Asian countries for my content analysis. I need suggestions regarding any source where I can obtain these data easily.
Relevant answer
Answer
Hermann Gruenwald Thank you very much Sir.
  • asked a question related to Database Analysis
Question
15 answers
Considering the specifics of increasingly common IT systems and advanced computerized data processing in Internet information systems, database systems connected to the Internet, data processing in the cloud, the increasingly common use of the Internet of Things, etc., the following question arises:
What do you think about the security of information processing in Big Data database systems?
Please reply
Best wishes
Relevant answer
Answer
The risk could take two forms. One, which you have already mentioned, is security, a vital risk that needs to be addressed by collective efforts on a war footing.
Secondly, the size of the data itself, and how integration takes place among hardware, software, internet service providers, the cloud, etc. across the globe, is also a risk.
  • asked a question related to Database Analysis
Question
9 answers
After some research, I found nothing satisfactory. The data do not allow for strong linkages.
Thanks for your help !
Best regards
Relevant answer
Answer
Hi,
Some time later, is there any news on this issue of electoral targeting datasets or databases?
Are you aware of any other recent case studies on electoral micro-targeting?
The Internet Policy Review has released a very interesting special issue on Data-driven elections (2019, DOI: 10.14763/2019.4.1433), but still no comparative database/dataset (of any kind).
Best regards,
  • asked a question related to Database Analysis
Question
3 answers
Where can I find a free global database for the Arab Maghreb Union (AMU), covering classic and Islamic banks, with information about the profitability, liquidity, capital adequacy, asset quality, etc. of Islamic banks worldwide?
Relevant answer
Answer
Bloomberg, FitchConnect, Orbis, Datastream
  • asked a question related to Database Analysis
Question
23 answers
Given your specific discipline: have you ever irretrievably lost data from an ongoing research project? How did you handle it? Thanks in advance.
(Also, this is my story. A couple of years ago, in a study that included the collection, preservation, identification and weighing of soil invertebrates, after an unfortunate event in the laboratory, the notebook that contained the weight notes for one of 10 sets of collected organisms, which belonged to the control group, was lost, as were the preserved organisms. I'm tagging this with an entomology label, so in case you're familiar with this topic: would you consider trying some method of reconstructing the weight data, or is there just nothing to do? There is no way to recover the notebooks, nor the preserved organisms.)
Thank you.
Relevant answer
Answer
I have not lost the data of any ongoing research project, because in parallel I work at home and do independent archiving. Experiments have been suspended, which is a waste of time.
  • asked a question related to Database Analysis
Question
7 answers
Bi0.8Sm0.2FeО3
Crystal system: Orthorhombic
Space group: P n m a
Space group number: 62
Relevant answer
Answer
Dear R. M. Emirov,
Here are the CIF files for BiFeO3 and Bi0.8Sm0.2FeO3:
Crystal system: Orthorhombic
Space-group Pnma (#62)
Best regards!
  • asked a question related to Database Analysis
Question
9 answers
The current Legionella pneumophila SBT database ( http://bioinformatics.phe.org.uk/legionella/legionella_sbt/php/sbt_homepage.php / http://www.ewgli.org/ ), maintained by Public Health England's Bioinformatics Unit, has been down for the past 2 months, and there has been no reply from their listed email address.
Is there any other database I can refer to? I am stuck in the middle of my study.
Thanks
Relevant answer
Answer
The website is not active, but you can still access it from archives. Download any useful data promptly.
  • asked a question related to Database Analysis
Question
4 answers
Hi!
My name is Catarina and I'm looking for some help in Innovation Marketing- Product Innovation.
I'm going to join a project which aims to analyse the consequences of implementing an improved product in a certain store. The emphasis is on the fact that this is about incremental innovation, which means improving an already existing product rather than developing a new one.
What I need to know is:
  • How to perform a statistical analysis on changes in the market after the implementation of this case of innovation marketing, specifically product innovation, in order to assess the consequences of this implementation.
  • What methods should I use to conclude what the clients consider to be an innovation?
  • What statistical analyses are usually performed in the evaluation of the implementation of innovation marketing?
  • How to measure the weight of positive / negative aspects?
  • What are the stages of this study before implementation (forecast) and after (evaluation)?
All of these in a perspective of statistical analysis.
Also, if there is any documentation on this subject that could be helpful please let me know.
Thank you :
Relevant answer
Answer
Thank you so much!
  • asked a question related to Database Analysis
Question
3 answers
Do you know any database for searching about endemic species, their genomes, differences, DNA barcodes etc.?
Relevant answer
Answer
Exactly; since endemic species are not universal but vary based on geography, there cannot be a standard database holding the information in one place.
In your original question you did not mention which species you are interested in (animals, plants, etc.), thus referring you to any particular database is not logical. Also, most of the databases contain genomic data and metadata about environmental parameters, genome content, etc., but I have not seen any database which contains barcode/primer information and secondary metabolites.
Of course, you can use universal primers to target the species in question, and there are many out there on the internet or in published papers.
I don't think you can find everything you want in one place as of today. However, it would be a good database to make, holding all such information in one place. Maybe you can make one and publish it.
  • asked a question related to Database Analysis
Question
5 answers
We are looking to implement a web-based lab notebook as well as a tracking system to upload various assay results for several analogs of a parent compound. We will need to keep very close track of lot numbers, dates received, the chemists who synthesized them, etc. Does anyone use a service which would be helpful?
Relevant answer
Answer
Hey
SCINOTE
  • Intuitive and easy to use
  • Inventory management and MS Office integration
  • Automatically generates reports & manuscript drafts
  • Exports all data in a readable format and API
  • Free account option
  • asked a question related to Database Analysis
Question
4 answers
We want to ascertain a reliable and effective database for the analysis of LC-QTOF-MS/MS data. The database should be a freely available one.
Relevant answer
Answer
In this case you have to prepare data sets of molecular formulas and compare them with different sorts of data.
  • asked a question related to Database Analysis
Question
5 answers
I'm looking for a database, which contains financial statements of companies but presented quarterly not yearly. Amadeus database presents data only yearly and only for last 10 years. Do you know some other sources?
Relevant answer
Answer
You can get quarterly financial information on companies from Yahoo Finance, for example:
Also, they have a premium version for historic data.
For a complete dataset I recommend:
they have an API as well:
And check quandl.com; they have various datasets.
Hope this helps. Good luck!
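For example, the unofficial yfinance Python package exposes quarterly statements from Yahoo Finance (coverage is typically limited to the most recent quarters; the ticker is just an example):
# Hedged sketch: quarterly statements via the unofficial yfinance package.
import yfinance as yf

t = yf.Ticker("AAPL")
print(t.quarterly_financials)        # quarterly income statement
print(t.quarterly_balance_sheet)     # quarterly balance sheet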
  • asked a question related to Database Analysis
Question
3 answers
I am looking for company-level data on R&D expenses, because I would like to map it to M&A data in order to assess the impact of (cross-border) M&A on R&D intensity. Thank you.
Relevant answer
Answer
C K Gomathy & Stephanie Tonn Goulart Moura - thank you very much for coming back to me
  • asked a question related to Database Analysis
Question
4 answers
Hi there,
Could anyone suggest any database that could be used to find protein-binding peptides?
Thanks,
Chun
Relevant answer
Answer
Amanda Lovecraft, I am looking for the sequence of a peptide epitope. In the meantime, the antibody that binds to the epitope is needed as well. If you know of this kind of database, that would be great.
Chun
  • asked a question related to Database Analysis
Question
18 answers
Currently, various data security tools are used in Big Data database systems. The basic principle is the parallel use of several types of IT security and compliance with specific procedures for analyzing and securing systems against the potential materialization of operational risks, including technical risks associated with the computer hardware and database technologies used, and personnel risks associated with the employees who support these systems.
A key issue is also whether the database systems are connected to the Internet online or are not permanently connected, with data from the Internet added to the Big Data databases from time to time only after analysis by anti-virus software that detects malware such as worms, keyloggers and other malicious software created by cybercriminals to steal information from data warehouses and Big Data database systems.
When Big Data database systems, or other systems where important information is collected, are connected to the Internet online, the information sent should be encrypted, and the system gateways connecting the Big Data database with the Internet should be equipped with a good firewall and other filters for incoming information. If the employees operating the Big Data database system use e-mail boxes, these should be company mailboxes only, verified for the security of data transfer on the Internet. The company should have strict security procedures for using e-mail boxes, because in recent years cybercriminals have sent ransomware hidden in e-mail attachments, used to encrypt the hard disks of company computers and database servers.
Do you agree with me on the above matter?
In the context of the above issues, I am asking you the following question:
How should Big Data database systems be protected against the activities of cybercriminals? What types of programs and systems for securing Big Data databases against cybercrime are currently used? What other types of security instruments for Big Data database systems are currently used?
Please reply
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
How should Big Data database systems be protected against the activities of cybercriminals?
Possible ways include:
  1. ensure access rights are only given to the right users, continuously monitor their access & revoke their access once the campaign / project is completed.
  2. data at rest encryption.
  3. data access / transfer encryption.
  4. layers of network security with different types of firewalls & only the right port(s) are turned on.
  5. backup data frequently to ensure 3-2-1 rule applies - this is to ensure your data availability when they are encrypted by ransomware.
  • asked a question related to Database Analysis
Question
25 answers
What are the important issues for you related to the collection and processing of large information sets in Big Data database systems?
The current technological revolution known as Industry 4.0 is motivated by the development of the following factors:
- Big Data database technologies,
- cloud computing,
- machine learning,
- Internet of Things,
- artificial intelligence.
On the basis of the development of the new technological solutions mentioned above, innovatively organized analyses of large collections of information gathered in Big Data database systems have been developing dynamically in recent years.
In my opinion, the fastest-growing business projects are primarily those that are the subject of innovative startups developing dynamically for a minimum period of several years. Startups develop innovative business projects in such areas as: information technology, ICT, Internet, biotechnology, energy, ecology, environmental protection, medicine, agribusiness, etc. In addition, a number of innovative technologies in construction, material, process and marketing innovations have recently been created in the field of smart city, life science, cleantech, medical intelligence and others that are used by companies and corporations operating in various sectors of the national or international economy.
In addition, innovative business projects are also developed in various fields of information services, advanced data processing, business analytics and teleinformation technologies, which together are the pillars of a knowledge-based economy. It is anticipated that in the coming years many large startups will be created in these fields of science and technology, developed effectively on the basis of innovative business projects related to the topics mentioned above.
In each of these areas, many specific design topics can be distinguished, in which business startups develop and reach the minimum level of a medium-sized company or large corporation in a situation of spectacular business success based on a well-designed business and an effectively implemented innovative business project.
What other technological improvements, innovative organizational, technical and IT solutions will be developed in the future based on the development of the above-mentioned factors?
Will the development of data mining, machine learning, artificial intelligence, Big Data analysis, etc. create new branches of the knowledge-based economy, or will these technologies only be used in already existing branches and sectors of currently developing economies?
In view of the above, I am asking you: What are the important issues for you related to the collection and processing of large information sets in Big Data database systems?
Please reply
This issue is described in the following publication:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
Safeguarding Big Data systems against possible intrusive information sabotage is, at least to me, the most significant requirement. In other words, security and confidentiality are top-priority requirements.
  • asked a question related to Database Analysis
Question
3 answers
I'm working on NrCAM gene mutation. I'm searching for any known mutations recorded in the NrCAM gene, but I'm not finding any databases or articles regarding that. Can anyone suggest a mutation database or links to learn about the mutations of the NrCAM gene in humans?
Relevant answer
Answer
You can use gnomAD (https://gnomad.broadinstitute.org/) to find variants and mutations found in a large dataset of whole exome and whole genome sequencing.
You can search in ClinVar for any variants/mutations with clinical significance (https://www.ncbi.nlm.nih.gov/clinvar/?term=nrcam%5Bgene%5D)
  • asked a question related to Database Analysis
Question
20 answers
Big Data database systems can significantly facilitate the analytical processes of advanced processing and testing of large data sets for the needs of statistical surveys.
The current technological revolution, known as Industry 4.0, is determined by the development of the following technologies of advanced information processing: Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies. All these advanced data processing and analysis technologies can significantly change and facilitate the analysis of large statistical datasets in the future.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
Will analytics based on data processing in Big Data database systems facilitate the analysis of statistical data?
Please reply
I invite you to discussion and scientific cooperation
Best wishes
Relevant answer
Answer
I am with Alexander & James on this. First, nearly all so-called big data is "it-is-what-it-is" data (i.e. the analyst has no control over how the data are collected; under these circumstances the selection process for the sample is non-random and the selection probabilities are typically unknowable, and one is often not even certain whether the sample comes from one unique population). Under such circumstances biases can arise and their effects are not measurable. For something like sales, big data can be highly applicable and population-specific because it represents a sample from one's particular population: your customers. Yet if it represents health records, one has to wonder whether the source for these records misses some parts of the population or over-represents sub-populations. I think solid "statistical thinking" can help big-data analysis more than the other way around.
  • asked a question related to Database Analysis
Question
1 answer
I am looking for a face (male and female) which has been shown to be sympathetic and is free to use. Does anyone have a suggestion?
Relevant answer
Answer
You can try Wikimedia and check the rights. https://commons.wikimedia.org/wiki/Category:Images
  • asked a question related to Database Analysis
Question
3 answers
While doing analysis on the NRD database with SAS software, I defined the IndexEvent using the primary diagnosis variable (DX1) and calculated 30-day readmission rates for the IndexEvent.
I want to find out the primary diagnosis for the readmission. I want to know whether the reason for readmission is the same as the IndexEvent or a different one. Which variable should I use to look for the readmission primary diagnosis? Is there any specific SAS code I should use?
Anyone familiar with this please guide.
Thank you in advance!
Relevant answer
Answer
It can be very challenging at times to know which was the diagnosis for the admission and which for the readmission with the NRD, IMHO.
For procedures it is much more straightforward and easy.
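Not SAS, but a small pandas sketch of the linkage logic usually involved; the same joins can be written in SAS with PROC SQL. The variable names (NRD_VisitLink, NRD_DaysToEvent, LOS, DX1, KEY_NRD) follow the usual HCUP NRD core-file layout (use I10_DX1 for ICD-10 years), and the index-event codes and file path are placeholders:
# Hedged sketch: find the primary diagnosis (DX1) of each 30-day readmission
# and compare it with the DX1 of its index event.
import pandas as pd

INDEX_DX_CODES = {"I214"}                     # placeholder index-event diagnosis codes

nrd = pd.read_csv("nrd_core_extract.csv")     # placeholder extract of the core file
index_events = nrd[nrd["DX1"].isin(INDEX_DX_CODES)]

pairs = nrd.merge(
    index_events[["KEY_NRD", "NRD_VisitLink", "NRD_DaysToEvent", "LOS", "DX1"]],
    on="NRD_VisitLink", suffixes=("", "_idx"))

# A 30-day readmission begins within 30 days of the index discharge date.
gap = pairs["NRD_DaysToEvent"] - (pairs["NRD_DaysToEvent_idx"] + pairs["LOS_idx"])
readmits = pairs[(gap > 0) & (gap <= 30)]

# DX1 of the readmission record is its primary diagnosis; compare with the index DX1.
readmits = readmits.assign(same_dx_as_index=readmits["DX1"] == readmits["DX1_idx"])
print(readmits[["KEY_NRD", "DX1", "DX1_idx", "same_dx_as_index"]].head())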
  • asked a question related to Database Analysis
Question
16 answers
Future of Big Data Analytics
In my opinion, Big Data database technologies are finding more and more applications in business analytics: in Business Intelligence, in marketing analysis, in consumer preference research, and in sentiment analysis based on comments on online portals, including social media. However, what will the development of Big Data analytics look like in the future? It is determined by many factors, including the security of the transfer and processing of data on the Internet, technological progress in data processing, and the information and marketing policies of online technology companies, including the companies that manage the leading social media portals. Added to this are the analytical capabilities revealed by research on the development of Big Data technology, the potential for using Big Data for industrial espionage and cybercrime, and its use by national and supranational security services to maintain information security and to combat cybercrime, international money-laundering transactions, transfers of money to tax havens, terrorism, attempts to destabilize capital markets, etc. In this regard, an "arms race" is shaping the development of Big Data technology: on one side, legally operating companies, financial institutions, including banks, and security services; on the other, hackers employed by criminal organizations, who will continue to break into the information systems of companies, banks and government agencies by creating new cybercrime techniques. This "arms race" is endless and is probably one of the key determinants of the technological progress taking place on the Internet, including in Big Data technologies.
I am currently conducting research in this area and I invite you to cooperation.
Relevant answer
Answer
What will be the directions of development of the analytical processes used in sentiment analyses of data collected in Big Data database systems in the future?
In order to analyze sentiment on data downloaded from social media portals (such as Facebook, Twitter, LinkedIn, but also YouTube, Instagram, ...) and aggregated in Big Data database systems, it is necessary to use specialized software for the extraction and analysis of these data.
The quality of the data transferred to, for example, Excel sheets depends on the quality of the extraction process carried out with the help of this specialized software.
The result obtained, i.e. the answer to the question posed to the collected, initially unstructured data in the Big Data database system, then depends on the quality of the analysis software, whether Excel sheets or computerized analytical platforms.
In the future, artificial intelligence may be used for this purpose, and the whole process of purposeful analysis of the collected data will proceed in a much more effective, automated manner, with fewer errors, at lower cost, and much faster, even on much larger collections of information than at present.
In view of the above, the current question is: what will be the directions of development of the analytical processes used in sentiment analyses of data collected in Big Data database systems in the future?
Please, answer, comments. I invite you to the discussion.
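As a minimal illustration of the automated scoring step mentioned above, a hedged Python sketch using the open-source VADER lexicon in NLTK (the example comments are invented):
# Minimal sketch of automated sentiment scoring of collected comments.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

comments = [
    "I love this brand, great customer service!",
    "The product broke after two days, very disappointed.",
]
for text in comments:
    print(sia.polarity_scores(text)["compound"], text)   # -1 (negative) to +1 (positive)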
  • asked a question related to Database Analysis
Question
3 answers
I read an interesting question this week about a researcher wanting to know how to sell a patient's database... Then, the related question emerged in my brain: is there a set formula that is used to value the database ?
Effectively, a database owner may ask what price per client record should be used to value the database when looking to sell a clinic or a research database.
An interesting concept don't you think?
The idea being that if you have 1000 clients on your clinic database, when you come to sell the business you simply multiply the number of clients by a set dollar amount - say $10 per contact - meaning that your database would be worth $10,000 of your sale price.
However - as I a sure you can appreciate - this simple calculation method is not that easy - or accurate for that matter.
There are many things that can impact on the value of your database - so a simple "dollar amount per record" would not work in many cases.
One of the variables that impact on the value of your database is how responsive this list of people are to your marketing messages.
If you have 1000 contacts in your database - who - when you email them an offer - you get a 50% email opening rate - and 20% of them take you up on your offer - then this is a valuable list.
If your 1000 contacts - have an email opening rate of 10% and 1% respond to your offer - then this list is not as valuable.
Another variable in this valuation process is the lifetime value of the contacts on your list.
If your 1000 contacts are all high income clients who regularly attend your clinic and buy lots of extra products and services- then they are much more valuable than a list of people who attended for a single "lead generation" low price offer - and never return.
Another value determining factor in health business databases is the concept of "Recency" - that being - when was the last time these clients actually attended your clinic for a paid service.
If your business has been established for many years - there may be a large number of past clients who no longer live in your area , have swapped to a new health provider - or sadly - may have even died.
So of your original 1000 contacts - there may only be 500 who have actually visited your clinic in the past 5 years.
Again - this lack of recent buying activity will also impact on the value of your database when it comes time to sell your clinic.
So what is the take home message for you as a current or future health business owner?
If you are looking to buy a clinic - ask lots of questions about email opening rates, age of the database, responsiveness to marketing messages (assuming any marketing messages are even being sent by the current owner) , buying habits, purchase frequency and recency of last visit.
If you are selling a business and want to get maximum value for it - make sure you are in regular contact with your client list and can demonstrate a solid and recent buying history for these clients.
I have personally seen practice databases with over 10,000 clients details but due to poor marketing and lack of follow up - the list is largely worthless to a potential buyer.
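Purely as a toy illustration of combining the factors discussed above (engagement, recency, lifetime value) into a per-record price, one could sketch something like the following in Python; the base price and weights are invented and are not an established valuation formula:
# Toy sketch only: an invented per-record weighting built from the factors above.
def database_value(n_contacts, open_rate, response_rate,
                   share_recent, avg_lifetime_value, base_price=10.0):
    """Scale a base dollar-per-contact price by engagement, recency and client value."""
    engagement = (open_rate + response_rate) / 2          # 0..1
    value_factor = min(avg_lifetime_value / 1000.0, 2.0)  # cap the lifetime-value boost
    per_record = base_price * engagement * share_recent * (1 + value_factor)
    return n_contacts * per_record

# The two lists contrasted in the text: responsive vs. unresponsive contacts.
print(database_value(1000, open_rate=0.50, response_rate=0.20,
                     share_recent=0.8, avg_lifetime_value=1500))
print(database_value(1000, open_rate=0.10, response_rate=0.01,
                     share_recent=0.5, avg_lifetime_value=200))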
Do you have any interesting research database for selling? Do you need help to estimate a good VALUE and CLIENTS for it?
Relevant answer
Answer
Dear colleagues, Michal Svoboda and Muhammad Farooq , thank you very much for your kind and valious contribution. I fully agree. Thank you for your comments!
  • asked a question related to Database Analysis
Question
5 answers
In recent years, the field of research and business applications in the field of obtaining, archiving, analyzing and processing data in Big Data database systems has been developing strongly.
In many companies, especially in large corporations, integrated risk management systems are built and improved.
Integrated risk management systems combine risk management processes in various areas of a company, institution or other organization.
One of the areas of risk management whose importance is growing in many companies is risk management in the area of obtaining, archiving, analyzing and processing data in Big Data database systems.
In view of the above, I am asking you: are risk management instruments and models for the acquisition, archiving, analysis and processing of data in Big Data systems already being developed? Are you already familiar with examples of this type of risk management system concerning the acquisition, archiving, analysis and processing of data in Big Data systems? Please reply.
Relevant answer
Answer
If I understand you correctly, you are asking about risk management tools used in the field of Big Data analytics. In practice, I see two main aspects of this.
  1. Managing security risks regarding the information in storage and transit.
  2. Ensuring functional integrity of the tools used in acquisition, processing, storage and delivery of information.
There exist technical protocols and best practices to ensure information security. These are enforced both physically and logically. For example, we follow the SOC 2 standards, which are audited once a year by a third-party evaluation and certification body.
There are automated intrusion detection systems that periodically test information systems for security weak points. These alert the administrators to possible security loopholes before a hacker would discover them.
Before software is released to production, there are several layers of thorough testing, both automated and manual, to verify functional integrity as well as security. Software is released to the production environment only after passing these simulations in a separate virtual environment.
Sometimes, the data in a Big Data information system are not all contained in a single database, but span multiple storage systems and servers. All these information stores require both storage-level encryption and access control.
Access control is another aspect of information-security risk management. Nowadays many information systems offload user authentication to third-party authentication providers. This is implemented as Single Sign-On (SSO).
Software agents processing data in large enterprise applications also have an identity, called a service account. These are also authenticated and authorized to access various services for information.
  • asked a question related to Database Analysis
Question
14 answers
Advanced technologies for the digitalization and automation of data processing find their first applications in business. They can then also be introduced in public institutions, including in the field of e-governance. This also applies to Big Data database technologies, which are applicable in various sectors of the economy; however, due to the high investment costs of implementing this technology in business processes, so far only large corporations and larger enterprises can afford such technologies. In the future, the investment costs of implementing these technologies in business processes should decrease, and technologies for data processing and collection in Big Data database systems should become available to smaller companies as well, including business entities in the SME sector.
I invite you to the discussion.
Relevant answer
Answer
The Big Data database technology is finding more and more applications in business.
Multi-criteria processing of huge data sets collected in Big Data database systems allows preparing reports in a relatively short time according to given criteria.
The report development time depends mainly on the computing power of Big Data servers.
Complex economic and financial analyses, risk management and similar processes for determining the economic and financial situation of business entities are increasingly carried out on computerized analytical platforms of the Business Intelligence type.
Perhaps in the future, artificial intelligence will also be involved in this field of analytics.
In some countries, IT companies have been operating for several years, developing the Big Data database technology for commercial and business purposes.
It is only a matter of time before these various analytical and database technologies are combined in cloud computing.
In view of the above, the current question is: in which sectors of the economy, and in which types of companies and corporations, will technologies for analyzing large collections of information in Big Data database systems develop most dynamically?
Please, answer, comments. I invite you to the discussion.
  • asked a question related to Database Analysis
Question
6 answers
I'm searching for databases to install on my laptop to identify XRD patterns and phases. If you could please post a link to a free database, it would be extremely helpful.
Relevant answer
Answer
Qualx2 (http://www.ba.ic.cnr.it/softwareic/qualxweb/) is free software for search-matching, and it has a COD database. Powcod database is free too, the link for download is http://www.ba.ic.cnr.it/softwareic/qualx/powcod-download/.
  • asked a question related to Database Analysis
Question
3 answers
I am using fuzzy logic to help decision-makers reach quick and reliable decisions. I have collected many projects and I am interested in 12 parameters. So finally my system has 12 inputs, each with 3 membership functions (MFs), and one output with 4 MFs.
But I have a problem regarding the inference system, i.e. the If-Then rules. My parameters have a solid relationship between them, so I think I don't have to consider all possible combinations (3^12), but I have to make sure that my system gets good training so that I can then trust it.
Before talking about testing the system with a small dataset, let's talk about the way I can consider all possible If-Then rules. Should I go back to each project and, for each one, see which membership it belongs to, or should I simply use the parameter ranges that I have already established and build my If-Then rules according to that?
One more question relates to unsuccessful projects. How could I use this information (for example: when a was Low and b was Med and ... then project X was unsuccessful)? Can I use it like this:
If a is Low and b is Med and ... then output is not X?
I will be more than happy if you share your experience with me so I can adapt it to my problem and act accordingly.
Thank you, I am waiting for your interactions.
Kind Regards,
Relevant answer
Answer
> Should I go back to each project and see which membership each one belongs to, or simply use the parameter ranges that I have already established and build my If-Then rules according to that?
If you are using MATLAB, then when you view the output it already highlights the execution of the rules and the values on the membership functions, which depend on the data value you pass and on how your rules have executed. The design of the rules and membership functions depends on the range of your data, the results obtained from existing techniques, and expert opinion.
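A small, hedged Python/NumPy sketch of the data-driven alternative to enumerating all 3^12 combinations: one rule is derived per observed project by taking, for each parameter, the membership function with the highest degree (ranges and example values are invented for illustration only):
# Hedged sketch: derive If-Then rules from observed projects instead of enumerating 3**12 rules.
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership degree of x for the triangle (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

LABELS = ["Low", "Med", "High"]

def mf_params(lo, hi):
    """Three overlapping triangles spanning [lo, hi]."""
    mid = (lo + hi) / 2.0
    return [(lo, lo, mid), (lo, mid, hi), (mid, hi, hi)]

ranges = [(0.0, 10.0)] * 12          # invented min/max range for each of the 12 inputs

def antecedent_of(project_values):
    """Map one project's 12 measured values to their best-matching linguistic labels."""
    terms = []
    for value, (lo, hi) in zip(project_values, ranges):
        degrees = [trimf(value, *p) for p in mf_params(lo, hi)]
        terms.append(LABELS[int(np.argmax(degrees))])
    return tuple(terms)

# Example: one successful and one unsuccessful project (invented numbers); the
# unsuccessful one maps to a low output level rather than to "not X".
rules = {
    antecedent_of([8, 7, 9, 6, 8, 7, 9, 8, 7, 6, 8, 9]): "Success_High",
    antecedent_of([2, 3, 1, 4, 2, 3, 2, 1, 3, 4, 2, 1]): "Success_Low",
}
for ant, out in rules.items():
    print("IF", " AND ".join(f"x{i+1} is {t}" for i, t in enumerate(ant)),
          "THEN output is", out)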
  • asked a question related to Database Analysis
Question
1 answer
It's a problem I have to solve, but I don't have much knowledge about distributed databases. Can anyone help me?
Relevant answer
Answer
Your question was not clear enough for you to get useful help.
However, I recommend this publication of mine, which you might find useful.
  • asked a question related to Database Analysis
Question
4 answers
I have developed a bioinformatics tool (link: geltowgs.uofk.edu) that compares PFGE gel image analysis results to mathematical models of band sizes derived from WGS (FASTA files). I have suggested a new algorithm to count DNA fragments that co-migrate across each lane. We are about to publish our work (ResearchGate DOI: 10.13140/RG.2.2.32752.76806). The attached file illustrates our method.
Thank you
Relevant answer
Answer
Sorry, the uploaded file was an old draft. The attached document is the complete one.
  • asked a question related to Database Analysis
Question
7 answers
Hello
Does anyone know how to add an institution to SciVal so that it has its own performance profile and analysis? All researchers belonging to the institution have Scopus-indexed Author IDs.
Thanks
Relevant answer
Answer
Thanks a lot for asking this question. Please follow this link and give me your opinion:
Regards,
Emad
  • asked a question related to Database Analysis
Question
3 answers
I am working on financial inclusion status, both as a cross-country analysis and within Bangladesh. It would be helpful to get some recent publications related to this and a useful database.
  • asked a question related to Database Analysis
Question
23 answers
Hello,
I am conducting some preliminary concentration-fixing steps on a pesticide used for house fly larvae. My initial data from 4 initial concentrations came out like this: highest conc. - 12/20 dead, second highest - 10/20 dead, third highest - 7/20 dead, fourth highest - 3/20 dead, control 4/20 dead. When I do Abbott's correction on the fourth highest, I get a negative value. Do I simply not use Abbott's correction at all for any of these concentrations, or do I count the fourth highest response as 0/20? Thanks in advance! 
Relevant answer
Answer
I duly agree with Prof. Udikeri.
According to the WHO:
If the control mortality is above 20%, the tests must be discarded. When control mortality is greater than 5% but less than 20%, the observed mortality has to be corrected using Abbott's formula. If the control mortality is below 5%, it can be ignored and no correction of the test is necessary, whereas control mortality of ≥5% requires correction.
For more information, you can follow the link below.
Therefore, in my view, you should repeat the procedure to obtain a valid result.
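For reference, Abbott's corrected mortality is (observed% - control%) / (100 - control%) x 100. The minimal Python sketch below simply plugs in the counts quoted in the question and shows why the lowest concentration comes out negative (15% observed against 20% control mortality); in practice such negative values are usually either reported as 0% or taken as a sign that the control mortality is too high to correct, in line with the WHO guidance in the answer above.

# Abbott's correction: a minimal sketch using the counts from the question.
def abbott(observed_pct, control_pct):
    """Corrected mortality (%) = (observed - control) / (100 - control) * 100."""
    return (observed_pct - control_pct) / (100.0 - control_pct) * 100.0

control = 4 / 20 * 100            # 20% control mortality
doses = {"highest": 12, "second": 10, "third": 7, "fourth": 3}

for name, dead in doses.items():
    observed = dead / 20 * 100
    corrected = abbott(observed, control)
    # A negative value (as for the fourth concentration) is usually reported
    # as 0% or taken to mean the control mortality is too high to correct.
    print(f"{name}: observed {observed:.0f}%, corrected {corrected:.1f}%")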
  • asked a question related to Database Analysis
Question
2 answers
Hi all,
for my current literature review I would like to try something new. I would like to search papers graphically, by highlighting the relations between papers in one "conversation", i.e. which are linked through a common factor, like a keyword. 
Ideally the output should look something like the attached picture, with each dot representing one journal article and the links representing the citations between the articles. 
Does anyone know a database or a software that is able to do so?
Thanks for your help!
Best,
Jens
Relevant answer
Answer
Hello 
Web of Science has a similar tool, but it may not be as comprehensive. I have added the link.
Bests
  • asked a question related to Database Analysis
Question
4 answers
I need some kind of database covering both a few years ago and the present.
Relevant answer
Answer
Dr. Xiao has given you a good database. Please check it out.
  • asked a question related to Database Analysis
Question
1 answer
Hi everyone. I have a database of operating conditions in a power system for different kinds of faults (1,000 operating conditions, each one re-evaluated for single line outages, so the total number of samples is: number of branches * 1,000). The problem I'm facing right now is the tremendous number of samples in my database, because for each line outage I'll have 1,000 operating conditions, which gives me very many samples to analyze.
To reduce the samples, I decided to make use of the ReliefF algorithm. I reshaped my database matrix into a form in which the rows are my operating conditions and the columns represent the outage of each line. The entries of the data matrix are the stability index for each operating condition under single line outages. Using k-means, I grouped similar operating conditions and then, using ReliefF, I found the best features (here, branch outages). Now, using the branches that are the more important attributes, I decided to delete the operating conditions related to the less important branch outages.
So my question: does the whole approach sound logical?
If not, is there any other idea I can use? I want to reduce my samples.
Thank you for helping me.
Relevant answer
Answer
Hi Sill, 
Basically, for such a purpose we use the ReliefF algorithm in conjunction with a ranker method, and this is a very sound process. By doing so, you can reduce the dimensionality of the data based on the ranking. However, bear in mind that k-means is a clustering algorithm (not a classifier).
HTH.
Samer
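As an illustration of the pipeline discussed above (cluster the operating conditions, then rank the branch-outage features), here is a minimal Python sketch. It uses scikit-learn's KMeans together with a deliberately simplified Relief-style weight update (nearest hit versus nearest miss), not the full ReliefF algorithm; the data shape, number of clusters, and random data are assumptions for demonstration only.

# Cluster operating conditions, then rank features with a simplified
# Relief-style weighting (nearest hit vs. nearest miss per sample).
# Illustrative sketch, not the full ReliefF algorithm.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((1000, 40))   # 1000 operating conditions x 40 branch outages (assumed shape)

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

def relief_weights(X, y):
    """Reward features that separate the nearest miss from the nearest hit."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(other, dist, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n

weights = relief_weights(X, labels)
ranking = np.argsort(weights)[::-1]
print("Most informative branch outages (feature indices):", ranking[:5])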
  • asked a question related to Database Analysis
Question
3 answers
I want to study which arginine or lysine residues in target proteins are preferentially modified by glycating agents like methylglyoxal. Is there a database or any other resource which lists the modified proteins with specific data on the amino acid position and the nature of the modification (e. g., MG-H1 at position Arg xy in protein z)?
I would appreciate your help.
Thanks and best regards, Christian
Relevant answer
Answer
Dear Christian,
I couldn't find a database of glycated proteins. But I found some databases which might be useful for your work. 
InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. http://www.ebi.ac.uk/interpro/
GlycoSiteAlign selectively aligns amino acid sequences surrounding glycosylation sites (by default, 20 positions on each side of the glycosylated residue) depending on structural properties of the glycan attached to the site. http://glycoproteome.expasy.org/glycositealign/
NetGlycate 1.0 server predicts glycation of ε amino groups of lysines in mammalian proteins. http://www.cbs.dtu.dk/services/NetGlycate/
The NetOglyc server produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins. http://www.cbs.dtu.dk/services/NetOGlyc/
The NetNglyc server predicts N-Glycosylation sites in human proteins using artificial neural networks that examine the sequence context of Asn-Xaa-Ser/Thr sequons. http://www.cbs.dtu.dk/services/NetNGlyc/
  • asked a question related to Database Analysis
Question
4 answers
Dear Researcher.
I have only one meteorological station inside the study area; the others are outside but near the study area. The only information available for these stations is rainfall and temperature data. Given this information, kindly advise: is it recommended to use SWAT for streamflow (runoff/discharge) simulation?
Relevant answer
Answer
Dear Muhammad,
Your question is rather cryptic, and it is difficult to provide feedback on such a generic question. The following points may help you proceed in your project:
The first point to clarify is to identify exactly the purpose of your investigation: what are you trying to achieve? Even assuming you want to estimate a "flow" of water, which flow are you talking about: evaporation, transpiration, surface runoff, infiltration, water table recharge, underground runoff? Maybe a simple "bucket model" could be sufficient? Your ultimate goal should guide your choice of a particular model.
Assuming you want to use a "Soil and Water Assessment Tool (SWAT)" to investigate one of those hydrologic issues, the next point is to indicate which model you are using or planning to use, as different models may take advantage of different numbers and types of input data. The following site offers a number of SWAT models and associated tools, as well as documentation and information on related events, though a search for SWAT models on the Web will generate hundreds of thousands of related links:
Then, anyone of these models can be used with synthetic, "made-up" input data: if you are interested in learning about the tools or exploring the sensitivity of the model to various inputs, you do not need specific or even very realistic input data: you can drive the model with artificial inputs to explore its limits of applicability. On the other hand, if you want to simulate a real, practical situation, you will need to drive the model with actual measurements. Again, the list of required inputs will depend on the model chosen, and those that are not available (in your case, anything else than temperature and rainfall) will have to be provided artificially.
The next question you must address is the level of accuracy you need on the output product: if you don't have any such requirement, then any input and any model will work! This also implies you must have a method to assess the performance of your model: How will you determine that the results are sufficiently good for your purpose? What is the maximum tolerable error or uncertainty on the outcome?
Once these various points have been clarified, you can run your model with the available inputs and conduct sensitivity analyses by modifying those inputs that you created artificially, to determine whether their influence on the outcome is such that the errors exceed your requirements. In that case, you must find a way to specify those inputs in a realistic manner. Otherwise, your inputs will be sufficient for your purpose.
I hope this may help you plan your investigation and in particular determine whether the limited input data you do have will be adequate or not to reach your goal. Cheers, Michel.
  • asked a question related to Database Analysis
Question
4 answers
I have the Pavia dataset at the given web location. There, the classes and samples are defined. I want to know how to define these classes, and with what parameters, in MATLAB.
Relevant answer
Answer
Dear Shrish
Your question is not clear, but as far as I understand it, you can load the ground truth of the Pavia dataset in MATLAB and see the location of each class.
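If a non-MATLAB route is also acceptable, the same .mat ground-truth file can be inspected from Python with SciPy, as a complement to the MATLAB suggestion above. The sketch below is illustrative only: the file name PaviaU_gt.mat and the variable name paviaU_gt are assumptions based on the common distribution of the dataset and may differ in your copy, so check the whosmat output first.

# Inspect the Pavia ground-truth labels from the .mat file (sketch).
# File and variable names are assumptions; check scipy.io.whosmat() output first.
import numpy as np
from scipy.io import loadmat, whosmat

print(whosmat("PaviaU_gt.mat"))              # list the variables stored in the file
gt = loadmat("PaviaU_gt.mat")["paviaU_gt"]   # 2-D array of class labels per pixel

classes, counts = np.unique(gt, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {c}: {n} pixels")          # class 0 is usually the unlabeled background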
  • asked a question related to Database Analysis
Question
5 answers
The result of a disease can be in the form of: cured / not cured / better / improved,
or any set of binary outcomes that reflect the condition of the patient in terms of the result...
Relevant answer
Answer
  • asked a question related to Database Analysis
Question
2 answers
I want to analyze classification datasets based on their characteristics, such as features, attributes, data types, data complexity, dataset size, number of instances, etc.
If anyone has already performed such an analysis or a related one, kindly guide me or suggest techniques and ideas for effective analysis.
Thank you.
Relevant answer
Answer
Aleksandre gives you a good outlook.
I investigated the discrimination of microarray data and resolved it completely.
Many researchers have approached it with the wrong strategy. Download my papers from RG.
  • asked a question related to Database Analysis
Question
2 answers
I collect fingerprints using a sensor. It produces two kinds of data: 1) raw data and 2) base data. The raw data are obtained when the sensor is touched, while the base data are obtained without touching. I can see the difference manually, but it is difficult to differentiate them in a large database, so is there any algorithm to find the difference between these two kinds of data?
Relevant answer
Answer
Dear Sir,
I can obtain the fingerprint by subtracting the raw data from the base data. But for this I have to check every single record to see whether it is raw or base data. How can I automatically tell whether a record is raw or base data after assigning some threshold value?
Thank you.
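One simple way to automate the raw-versus-base decision described above is to compute an activity statistic per capture (for example, the mean absolute difference from the sensor's resting response) and compare it to a threshold calibrated on captures known to be base data. The Python sketch below is illustrative; the array shapes, simulated sensor values, and threshold are assumptions, not values from the actual sensor.

# Classify a capture as "raw" (finger present) or "base" (no touch) by
# thresholding a simple activity statistic. Illustrative sketch; the
# threshold must be calibrated on captures known to be base data.
import numpy as np

def is_raw(frame, baseline, threshold):
    """frame, baseline: 2-D arrays of sensor values; True if a finger is present."""
    activity = np.mean(np.abs(frame.astype(float) - baseline.astype(float)))
    return activity > threshold

rng = np.random.default_rng(1)
baseline = rng.normal(100, 2, (64, 64))            # assumed untouched sensor response
touch = baseline + rng.normal(30, 10, (64, 64))    # assumed touched response

threshold = 10.0                                   # calibrate from known base captures
print("touch frame ->", "raw" if is_raw(touch, baseline, threshold) else "base")
print("base frame  ->", "raw" if is_raw(baseline, baseline, threshold) else "base")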
  • asked a question related to Database Analysis
Question
2 answers
What are the applications of ERD and normalization?
Relevant answer
Answer
An ERD (Entity-Relationship Diagram) is a conceptual model. It represents the relationships between entities. An ERD is converted into a relational model (tables) using the concepts of normalization (rules for preparing the tables that store the data). Any application that requires data to be stored in a database (tables) could be considered an application of ERDs and normalization: for example, hospital management systems, financial accounting systems, library management systems, hotel management systems, and online systems to place orders or book tickets.
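To make the ERD-to-tables step concrete, here is a small, illustrative sketch using Python's built-in sqlite3 module: two entities from a hypothetical library system (Member and Loan, a one-to-many relationship) become normalized tables linked by a foreign key, instead of one wide table repeating member details on every loan. All table and column names are invented for the example.

# Illustrative only: an ERD with entities Member and Loan (1-to-many)
# mapped to two normalized tables linked by a foreign key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE loan (
    loan_id   INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES member(member_id),
    book      TEXT NOT NULL,
    due_date  TEXT NOT NULL
);
""")
conn.execute("INSERT INTO member VALUES (1, 'A. Reader')")
conn.execute("INSERT INTO loan VALUES (10, 1, 'Database Systems', '2015-01-31')")

# Joining reconstructs the relationship captured in the ERD.
for row in conn.execute("""
    SELECT m.name, l.book, l.due_date
    FROM loan AS l JOIN member AS m ON m.member_id = l.member_id
"""):
    print(row)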
  • asked a question related to Database Analysis
Question
9 answers
I need large datasets (>100,000) for some query questions using natural language and their corresponding answers
Relevant answer
Yes, you are right. What I need is something like that:
For example (based on the famous Northwind database):
Natural Language Query:
Get the last name, job title, city and phone number of all employees who live in either Seattle or Kirkland, have a phone number that starts with (206), and work as Sales Representatives.
Corresponding SQL Statement is:
SELECT LastName , Title , City , HomePhone FROM Employees WHERE City IN ('Seattle','Kirkland') AND HomePhone LIKE '(206)%' AND Title = 'Sales Representative'
  • asked a question related to Database Analysis
Question
2 answers
We have developed some novel algorithms in our group for the early prediction of SCA using the MIT-BIH database. We now need some other datasets to validate our algorithms.
Relevant answer
Answer
Dear Brunett Parra
Thanks for your valuable information.
Regards
M Murugappan
  • asked a question related to Database Analysis
Question
9 answers
I'm working on modeling DSM in smart grid. I need the electricity usage of different appliances in different homes.
Where can I find such data?
Relevant answer
Answer
DSM in the smart grid is mainly based on appliance scheduling using some optimization technique. For this purpose you only need the wattage and the number of operating hours for each appliance. To prove the effectiveness of your model, assuming values for these two parameters is justified.
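As a toy illustration of the scheduling idea above, the Python sketch below shifts each appliance's operating hours into the cheapest slots of a 24-hour price profile, given only its wattage and run length. The appliance list, the tariff, and the greedy strategy are assumptions for demonstration, not a full DSM optimization model.

# Toy DSM sketch: place each appliance's run in the cheapest hours of the day.
# Appliances are described only by wattage (kW) and run length (hours).
prices = [0.10] * 7 + [0.25] * 12 + [0.15] * 5      # assumed tariff per hour (24 values)
appliances = {"washing machine": (0.5, 2), "dishwasher": (1.2, 1), "EV charger": (3.3, 4)}

def cheapest_window(prices, hours):
    """Return the start hour of the contiguous window with the lowest total price."""
    costs = [sum(prices[t:t + hours]) for t in range(len(prices) - hours + 1)]
    return costs.index(min(costs))

total_cost = 0.0
for name, (kw, hours) in appliances.items():
    start = cheapest_window(prices, hours)
    cost = kw * sum(prices[start:start + hours])
    total_cost += cost
    print(f"{name}: run {start}:00-{start + hours}:00, cost {cost:.2f}")
print(f"total cost: {total_cost:.2f}")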
  • asked a question related to Database Analysis
Question
2 answers
There exists a new governance and social hierarchy based on databases. We live inside a network of relationships that produce value, and its measurement, through algorithms. (Perhaps the same risk exists here too.)
Does anyone know a critical organized?
Relevant answer
Answer
Any sense of value used in any algorithm, ranking or rating is (or should be) the product of a decision making task, process or effort, carried out by humans. This is the subject of the wide scientific area of Decision Making Theory (within Operations research, mathematics, etc).
So the sense of value comes from human values. There is an excellent book about this that you might want to consider, called "Value focused thinking" from Ralph Keeney (it already became a classic). 
I did not understand what you mean with "critical organized".
Good luck.
  • asked a question related to Database Analysis
Question
1 answer
The genome is a big database; how can we use its potential resources? Are there good methods or approaches in addition to initial mapping? I think the study of methods and tools for making use of sequenced genomes is more important, for example statistical approaches.
Relevant answer
Answer
Dear Zhansheng Li, I am an expert in information retrieval, and from the GDB's four-step retrieval structure (Step 1: define the region; Step 2: specify what you want to retrieve, plus two optional steps; Step 3: restrict the search to polymorphisms; Step 4: restrict the search based on date) I can tell you that it is quite limited for novice users. Given that, I suggest you read some journal articles based on GDB statistics, in which researchers describe how they handled the GDB. I have a method for information retrieval efficiency posted in an article here on RG, called Linguistic Storm. I am sure it will give you a more concrete basis for your purpose.
  • asked a question related to Database Analysis
Question
10 answers
How do I determine that a database is cluster-friendly, and therefore that it's possible to be confident in using an algorithm such as k-means (for example) to discover the structure of the database?
Note: the question is not related to the idea that the database can easily be distributed across lots of machines.
Relevant answer
Answer
Dear Richard,
First of all, thanks for your long answer.
My opinion, based on the paragraphs of your answer:
1) Yes, I can try, but as you said in 5) I do not want to waste my time ;-). My question is more about whether we can "imagine" a criterion (maybe several, as in Basu, 'Data Complexity in Pattern Recognition', Springer, 2006, for classification problems) to get an idea of the answer...
2) Visualization is not within the scope of my question; I do not work on toy problems :-) where the number of explanatory variables is small.
3) and 4) Yes, of course, I think the answer to my question may be algorithm-dependent; maybe we have to think of a "tree of answers".
5) Yes, this idea could be the beginning of one criterion among a set of criteria.
Thanks again.
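One concrete criterion that does not rely on visualization is to sweep several values of k and compare an internal validity index such as the silhouette score (a cluster-tendency statistic such as Hopkins would be another option). Below is a minimal scikit-learn sketch of the silhouette sweep; the synthetic data and the range of k are assumptions, and a uniformly low, flat silhouette profile would suggest the database is not cluster-friendly.

# Assess cluster tendency by sweeping k and comparing silhouette scores.
# Illustrative sketch: synthetic data stands in for the real database.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Data with some genuine structure: three shifted Gaussian blobs.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 10)) for c in (0.0, 3.0, 6.0)])

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    print(f"k={k}: silhouette={score:.2f}")
# Uniform random data of the same shape would give scores near 0 for every k,
# which is one sign that a k-means structure should not be trusted.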
  • asked a question related to Database Analysis
Question
4 answers
I want to use PHP to pull data from a PostgreSQL database into my application. I wanted to know whether this approach is secure, given that my various users will have to interact with the data.
Relevant answer
Answer
Use a RESTful API.
  • asked a question related to Database Analysis
Question
2 answers
I need to run a federated SPARQL query (containing SPARQL Service Clause) via Sesame 2.7. Any help/examples would be highly appreciated. Thanks
  • asked a question related to Database Analysis
Question
3 answers
Hello, I am new to PROC GENMOD and I am trying to calculate the RR with a modified Poisson approach using robust error variance (by using the REPEATED statement and the subject identifier).
I have two questions:
1.) My output does not show the result of the EXP option on the ESTIMATE statement completely, and not in the way it appeared in the other examples I am following. The exponentiated estimate is missing; however, the L'Beta estimate, alpha, etc. are given in that row, and I have no idea why. Can anyone help?
2.) Secondly, in other examples I found two ways to write the estimate statement, "estimate 'beta' smoking 1 / exp" and "estimate 'beta' smoking 1 -1 / exp". Where exactly is the difference?
I will add my model below; unfortunately, the output cannot be displayed in a readable manner.
Thank you in advance!!
Relevant answer
Answer
(1) Since the actual statistical results (e.g. the p-value) are the same for beta and L'beta, SAS only shows them on one line.
(2) In most SAS procedures that allow the
estimate 'smoking' smoking 1 / exp;
syntax, you only need the single 1. The use of 1 -1 comes from situations where the whole contrast matrix was required (like what you see when you use a CLASS variable and SAS prints out the contrast matrix). I personally find the matrix version confusing (too hard to count the zeros when you have a lot of categories) and prefer the syntax I used above, which is clear. Also, I think SAS will even react to the 1 -1 syntax as you show it with a warning that there is extra material in the ESTIMATE statement.
  • asked a question related to Database Analysis
Question
14 answers
What is the most popular database model used in business? I would be grateful if I could have some evidence for the answer.
Relevant answer
Answer
There's little question that the relational database model is the most popular and will be for some time. Relational databases include Oracle's database, Microsoft's SQL Server, MySQL, and Postgres.
  • asked a question related to Database Analysis
Question
5 answers
Microsoft Access is a software example of a relational database. I need more examples of relational databases, and also some examples of object-oriented databases and XML databases.
Relevant answer
Answer
Caché, ConceptBase, Db4o, GemStone/S, NeoDatis ODB, ObjectDatabase++, ObjectDB, Objectivity/DB, ObjectStore, ODABA, OpenAccess, OpenLink Virtuoso, Perst, Picolisp, siaqodb, Twig, Versant Object Database, WakandaDB, Zope Object Database.
  • asked a question related to Database Analysis
Question
4 answers
I would like to use this paper as my reference in evaluating databases. Please help me. 
Relevant answer
Answer
Sure, if their work suits yours best. You may also mention in your paper the reason you use their benchmark.
  • asked a question related to Database Analysis
Question
7 answers
While working with date variables in Stata, I encountered an interesting problem. There is a date variable and there are hundreds of other variables. I have to find the initial and ending date for each variable based on that single date variable. For example, if a variable has all missing observations up to 1 December but at least one respondent responds on 2 December, then the initial date for that variable would be 2 December, and so on. I would like some clues about how this can be done for all the variables through a Stata command (or any other software), and not by looking through the actual data, which is gruelling for a large dataset.
Relevant answer
Answer
Easy enough:
foreach v of varlist V1-V999 {
    di "Results for `v'"
    // X is the date variable; the min and max of this summary give the
    // initial and ending dates for `v'
    summarize X if `v' < .
}
Or, if you use variable labels:
foreach v of varlist V1-V999 {
    local l : var label `v'
    if "`l'" == "" {
        local l = "`v'"    // fall back to the variable name if there is no label
    }
    di "Results for `l'"
    summarize X if `v' < .
}
  • asked a question related to Database Analysis
Question
4 answers
In a genome database (nucleotide sequence) located in my computer, I would like to identify a given conserved domain, e.g., a given PSSM or a given Pfam.
For this purpose, I looked at PSI-BLAST and DELTA-BLAST, but they are protein-protein search tools, while I need to search a nucleotide database. Similarly, http://pfam.xfam.org/ allows only protein-against-protein searches.
Is there any tool suitable for me to be used locally on my computer?
Relevant answer
Answer
We also do not have a tradition of using paid software, and as there are good old versions available in some places online, freeware is sometimes possible :) It didn't work only for PEAKS; we lost it in Russia, as it is not only impossible to manage the license key, it even insists on sitting on only one computer.
  • asked a question related to Database Analysis
Question
1 answer
Hello Everyone,
I want to know how to get the DBLP and SIGMOD query sets. If you know the links, could you please share them with me? If the query sets cannot be obtained from links, were the test queries created by you yourselves when the queries were processed? Please share. Thank you all.
Relevant answer
Answer
I am not sure about the DBLP dataset. But if you explore it, the following link is useful for getting good datasets for typical analytical problems. I hope this may be of use.
  • asked a question related to Database Analysis
Question
6 answers
I just have a basic question: after applying various machine learning and data science methods, how can we know that we have got the most from our data? I know data are always important and do not become outdated, now or in the future. But I want to know, at present, after all the processing, how to tell that we have exhausted the types of analysis possible on that data.
Relevant answer
Answer
The simple answer would be: whenever you have answered all your initial questions and/or you have proved or disproved your hypotheses.
  • asked a question related to Database Analysis
Question
3 answers
CBR is considered a methodology, not a technology. Different applications and techniques can be used to find similarities and make use of the objects/cases within the case library you have,
such as CBR using fuzzy logic, rough sets, similarity measures, and perhaps k-nearest neighbours. What about CBR using database technology?
Relevant answer
Answer
But the problem with using DB technology within CBR is the retrieval: in a DB, the SQL statements may be a bit harder to write for some cases (I mean the use of patterns and wildcards), so in that case the use of AI and similarity measures could be more efficient in the retrieval process.
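To make the similarity-based retrieval alternative concrete, here is a minimal Python sketch that retrieves the k most similar cases from a numeric case library using scikit-learn's NearestNeighbors. The case features, outcomes, and query are invented for illustration; a real CBR system would also apply attribute weights and adapt the retrieved solutions.

# Minimal similarity-based case retrieval (the k-NN part of a CBR cycle).
# Case features and the query are illustrative assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Case library: each row is a stored case described by numeric attributes.
case_library = np.array([
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.5],
    [0.3, 0.7, 0.2],
    [0.9, 0.2, 0.6],
])
outcomes = ["solution A", "solution B", "solution A", "solution C"]

retriever = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(case_library)

query = np.array([[0.25, 0.8, 0.15]])          # the new problem to solve
distances, indices = retriever.kneighbors(query)
for d, i in zip(distances[0], indices[0]):
    print(f"case {i} (distance {d:.2f}) -> {outcomes[i]}")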
  • asked a question related to Database Analysis
Question
4 answers
If we have changed the source data, do we have to follow the same steps for finding/generating the rules, or change the method?
Relevant answer
Answer
It depends on the type of data. For sequential data, you might prefer sequential pattern mining algorithms. You might also transform your data, ignore the timestamps, and apply an itemset mining algorithm.
If your data is not static, there are several cases, including:
- incremental (not too frequent) updates: then you should apply an incremental algorithm (itemsets or sequential patterns, depending on your choice).
- streaming data (updates at a high rate): then apply a streaming solution. It might be based on a different model for the "observation window", such as sampling, batches, jumping windows, or a sliding window (a small sketch of the sliding-window case follows).
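As a toy illustration of the sliding-window option mentioned above, the Python sketch below keeps itemset counts over only the most recent transactions, discarding counts as transactions fall out of the window. The transactions, window size, and support threshold are invented, only pairs are counted to keep the sketch short, and real streaming miners use much more compact summaries.

# Toy sliding-window frequent-itemset counting for a transaction stream.
from collections import Counter, deque
from itertools import combinations

WINDOW = 4            # number of most recent transactions kept (assumption)
MIN_SUPPORT = 2       # minimum count within the window (assumption)

window = deque()
counts = Counter()

def add_transaction(items):
    """Add a transaction; evict the oldest one if the window is full."""
    if len(window) == WINDOW:
        old = window.popleft()
        for pair in combinations(sorted(old), 2):
            counts[pair] -= 1
    window.append(items)
    for pair in combinations(sorted(items), 2):
        counts[pair] += 1

stream = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "d"}]
for t in stream:
    add_transaction(t)

frequent = {pair: c for pair, c in counts.items() if c >= MIN_SUPPORT}
print("frequent pairs in current window:", frequent)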
  • asked a question related to Database Analysis
Question
18 answers
I have read a couple of articles which try to sell the idea that an organization should basically choose between either implementing Hadoop (a powerful tool when it comes to unstructured and complex datasets) or implementing a Data Warehouse (a powerful tool when it comes to structured datasets). But my question is: can't they actually go together, since Big Data is about both structured and unstructured data?
Relevant answer
Answer
It's very hard to answer this question in general without taking into considerations what your specific needs are. Also, "Data Warehouse" is a pretty general term which basically can mean any kind of technology where you put in your data for later analysis. It can be a classical SQL database, Hadoop (yes, Hadoop can be a Data Warehouse, too), or anything else. Hadoop is a general Map Reduce framework you can also use for a lot of different tasks, including Data Warehousing, but also many other things. You also have to bear in mind that Hadoop itself is a piece of infrastructure which will require a significant amount of coding on your part to do anything useful. You might want to look into projects like Pig or Hive which build on Hadoop and provide a higher level query language to actually do something with your data.
Ultimately you have to ask yourself what existing infrastructure is already in place, how much data you have, what the kind of questions are you want to extract from your data and so on, and then use something which fits your needs.
  • asked a question related to Database Analysis
Question
5 answers
For writing a review paper, the data are not currently available.
Relevant answer
Answer
That time formal ( key availability of related research papers
  • asked a question related to Database Analysis
Question
3 answers
Sometimes it is very hard to find a lectotypification of old taxa. This is very time-consuming and frustrating. A database compiling all designated lectotypes could be helpful. On the other hand, the Code requires that lectotypification be made by scientists who know the taxon and the working methods of the author of the taxon for which the lectotypification is done. If it is known that a taxon lacks a specific type, this could lead to the automatic lectotypifications described in the Code.
So, in the end, I would like to get an impression of whether, in your opinion, the disadvantages outweigh the benefits of such a database.
Relevant answer
Answer
A database holding records of previous lectotypifications would indeed be useful. Although more and more publications become searchable using tools like Biodiversity Heritage Library, it is still a hard job to find a lectotypification, and if you find one, to be sure it is the first lectotypification for that name (quite some names have been lectotypified more than once).
Of course such a database should NOT be the place itself of the lectotypification, and it should also not be encouraged to start mass-typifying any name that does not seem to have a clear lectotype.
In general, the best person to lectotypify a name would be someone revising the group the taxon belongs to. He or she is gathering worldwide material of that taxon, has the best overview of all material available, and should also be capable of deciding to which taxa those specimens belong and of picking the specimen that best serves the stability of the name. As soon as nomenclatural issues might be involved, it is always good to make that decision in such a way that stability is served. However, one might even argue that in case all syntypes or original material seem to belong to the same taxon, one might refrain from choosing a type, because at this point it doesn't matter for stability since all syntypes belong to the taxon anyway; and if a future project were to find out that several taxa are involved after all, the best choice should be made at that point, and one should not be hindered by a previous (perhaps disadvantageous) lectotypification.
  • asked a question related to Database Analysis
Question
6 answers
I want to test a new approach for periodicity extraction from real and synthetic images, and I need a universal database to do it.
Relevant answer
Answer
It should be the "Near-Regular Texture Database", which originated at CMU and has continued growing at PSU...
  • asked a question related to Database Analysis
Question
1 answer
I need different heuristics for query optimization.
Relevant answer
Answer
  • asked a question related to Database Analysis
Question
3 answers
I have been teaching RDBMS in an undergraduate database class. I would like to teach Big Data as well, and I want to make a smooth transition from RDBMS to Big Data. Can you suggest a textbook or good material that would give me the chance to compare RDBMS and Big Data? Please let me know.
Relevant answer
Answer
I happened to find an interesting YouTube video:
"Big Data for Relational Practitioners - EPC Group"
  • asked a question related to Database Analysis
Question
9 answers
I want to analyze the relationships between many genes/molecules in humans, so I need pathway information.
Relevant answer
Answer
I hope this page can help you :)
  • asked a question related to Database Analysis
Question
10 answers
To get a first hint about the tissue distribution of a particular protein, I usually refer to GeneCards. There you can find at least three different datasets, from BioGPS, RNA-Seq (Illumina Body Map) and SAGE (Serial Analysis of Gene Expression). The results sometimes do not overlap: i.e. BioGPS and RNA-Seq show a signal, let's say, in the brain, whereas SAGE shows no expression in the brain at all. How should I interpret this, and why is there such a drastic difference?
Relevant answer
Answer
Unlike genomic sequence data, transcriptomic data points are highly contextual and require a lot of detailed spatiotemporal annotation to realize their full value. Unfortunately such annotation is inconsistent at best and this example serves to illustrate that. The brain is the most complex organ in the body, with expression levels of factors varying widely across different regions and cell types, and even on a circadian level. Without knowing exactly what region of the brain was being assayed, it's difficult to compare two expression levels in two different databases. There's a wider issue here concerning the need for greater granularity of data annotation in transcriptomic datasets but until such times as standards are properly observed, the safest approach is to use these resources as a guide while bearing the above caveats in mind.
  • asked a question related to Database Analysis
Question
4 answers
Which tool/database/algorithm might ease the task of searching for the best gene(s) to characterize a bacterial species?
Relevant answer
Answer
Yes, I know what orthologous genes are.
That's the point: I do not know the gene(s). Initially I don't care about their function either; it's about data mining. I want to find unique gene sequences in a given genome, i.e., sequences with neither BLAST hits nor orthologous sequences in other related taxa. In bacteria, LGT is not uncommon, and hence this task is not straightforward. For instance, sequences in humans that are absent in other primates are most likely also missing in other mammals. However, in bacteria that is not necessarily so, because xenologous sequences are quite common.
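One practical way to shortlist such candidate unique sequences is to BLAST all genes of the genome against related taxa with tabular output and keep the queries that never appear in the hit table. Below is a minimal Python sketch of that filtering step; the file names, and the assumption of BLAST output format 6 with the query ID in the first column, are illustrative only.

# List query genes with no BLAST hits (candidate "unique" sequences).
# Assumes a FASTA of query genes and a BLAST tabular report (-outfmt 6),
# whose first column is the query ID. File names are illustrative.

def fasta_ids(path):
    """Collect sequence IDs (first token after '>') from a FASTA file."""
    with open(path) as fh:
        return {line[1:].split()[0] for line in fh if line.startswith(">")}

def queries_with_hits(path):
    """Collect query IDs that appear in a BLAST -outfmt 6 table."""
    with open(path) as fh:
        return {line.split("\t")[0] for line in fh if line.strip()}

genes = fasta_ids("genome_genes.fasta")               # illustrative file name
hit_queries = queries_with_hits("blast_vs_related_taxa.tsv")

unique_candidates = sorted(genes - hit_queries)
print(f"{len(unique_candidates)} genes with no hits in related taxa")
for gene_id in unique_candidates[:10]:
    print(gene_id)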
  • asked a question related to Database Analysis
Question
2 answers
Currently, all the giant companies are using NoSQL databases to serve their customers, such as HBase, Cassandra, DynamoDB, MongoDB, and Google's Bigtable. Most are open source and capable of handling Big Data requests, but all of these are still evolving, so is there a need for standardization to maintain the ACID properties and rules for data access?
Relevant answer
Answer
On the basis of my experience with MongoDb, I agree with Martin Holzhauer's observation that standardization runs counter to the special-purpose use of NoSql databases.
Also, the notion that standardization would be needed to maintain ACID properties seems not to apply to NoSQL -- currently these properties are enforced by programmers as needed.