
Data - Science topic

Explore the latest questions and answers in Data, and find Data experts.
Questions related to Data
  • asked a question related to Data
Question
3 answers
In your opinion, could a future generation of generative artificial intelligence produce a highly sophisticated language model capable of simulating the consciousness of a specific person, answering questions on the basis of knowledge derived from that person's publications, documented statements and previously given interviews?
For example, suppose that in a few years it becomes possible to build a new generation of artificial intelligence equipped with artificial thought processes and artificial consciousness, and to integrate a language model with a database of knowledge derived from a specific person's publications, documented statements and previously given interviews. It might then be possible to converse with a kind of artificial consciousness simulating a person who died long ago, such as Albert Einstein, with the system answering questions as that person would. Language models of this kind, available on the Internet and equipped with the knowledge of a specific person, could let Internet users hold conversations with simulations of well-known figures from the past. If such highly intelligent tools were offered as a service, they could well become a very popular new Internet service. However, questions of ethics remain, as do possible copyright issues concerning the works, publications and books written by the simulated person, whose knowledge, data and information would be used by the generative artificial intelligence answering questions and taking part in discussions with Internet users. Beyond that, such an online service would carry a specific category of disinformation risk: the artificial intelligence might give answers containing content, information, data, wording, phrases or suggestions that the simulated person would never have uttered. The level of this disinformation risk would be inversely proportional to the sophistication of the system, that is, to the quality of the artificial thought processes and artificial consciousness, the completeness of the integrated database of the person's publications, statements and interviews, and the quality of the learning system through which the generative artificial intelligence refines its answers and its ability to participate actively in discussions.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
In your opinion, can a new generation of generative artificial intelligence be created in the future, in which a highly advanced language model is capable of simulating the consciousness of a specific person and answering questions based on knowledge derived from that person's publications, documented statements and previously given interviews?
Could an artificial intelligence be created in the future that is capable of simulating the consciousness of a specific person?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Counting on your opinions, on getting to know your personal views, and on an honest approach to discussing scientific issues rather than ready-made answers generated by ChatGPT, I have deliberately used the phrase "in your opinion" in the question.
The above text is entirely my own work written by me on the basis of my research.
I have not used other sources or automatic text generation systems such as ChatGPT in writing this text.
Copyright by Dariusz Prokopowicz
Best wishes,
Dariusz Prokopowicz
  • asked a question related to Data
Question
3 answers
I am trying to run a spatio-temporal autoregressive model (STAR). For this I need to create a spatial weight matrix W with N × T rows and N × T columns to weight country interdependencies based on yearly trade data. Could someone please tell me how to create such a matrix in R or Stata?
Relevant answer
Answer
Dear Jan,
OK, I see, you do indeed need to create the spatial weight matrix!
There are many possibilities in R:
I strongly advise working with sf, because it is so much easier now,
but spdep may still be well adapted to your context.
This is one of the definitive books on the subject in R:
There are other references, but they are more geospatial (point-process) oriented.
Here you should use one of those packages; the nb2mat function from the spdep package might do the trick (see the sketch below).
All the best.
Franck.
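A minimal sketch of what this could look like in R, assuming the spdep package and a hypothetical shapefile of the N countries (for trade-based weights you would replace the contiguity matrix with a row-standardised N x N matrix of trade shares):
library(sf)
library(spdep)
# read the N country polygons (file name is hypothetical)
countries <- st_read("countries.shp")
# contiguity neighbours and an N x N row-standardised weight matrix
nb <- poly2nb(countries)
W_N <- nb2mat(nb, style = "W", zero.policy = TRUE)
# for a STAR model, repeat W for each of the T years as a block-diagonal NT x NT matrix
T_years <- 20
W_NT <- kronecker(diag(T_years), W_N)
dim(W_NT) # (N*T) x (N*T)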
  • asked a question related to Data
Question
3 answers
Is analytics based on Big Data and artificial intelligence already capable of predicting what we will think about tomorrow, that we need something, that we should perhaps buy something we believe we need?
Can an AI-equipped Internet robot, using the results of research carried out by advanced Big Data socio-economic analytics systems and employed in the call-centre department of a company or institution, already forecast in real time the consumption and purchase needs of a specific Internet user on the basis of a conversation with that potential customer, and on this basis offer them products or services that they themselves would probably soon conclude they need?
On the basis of analytics of a bank customer's purchases of products and services, of online payments and settlements and of bank card payments, will banks refine their models of customers' preferences for specific banking products and financial services? For example, will the purchase of a certain type of product or service trigger an offer of, say, a specific insurance policy or bank loan to a specific customer of the bank?
Will this become an important part of the automation of processes carried out within computerised customer-relationship systems in the context of the development of banking in the years to come?
For years, in databases, data warehouses and Big Data platforms, Internet technology companies have been collecting information on citizens, Internet users, customers using their online information services.
Continuous technological progress increases the possibilities of obtaining, collecting and processing data on citizens in their role as potential customers and consumers of Internet offers, other media, online information services, product and service offers and advertising campaigns, which also influence the general social awareness of citizens and the choices people make in various aspects of their lives. The new Industry 4.0 technologies currently being developed, including Big Data Analytics, cloud computing, the Internet of Things, Blockchain, cyber security, digital twins, augmented reality, virtual reality, as well as machine learning, deep learning, neural networks and artificial intelligence, will drive rapid technological progress and the development of applications of these technologies in online marketing in the years to come. The robots being developed, which collect information on specific content from various websites and web pages, are able to pinpoint information written by Internet users on their social media profiles. In this way it is possible to obtain a large amount of information describing a specific Internet user, to build a highly accurate characterisation of that user, and to create multi-faceted characteristics of customer segments for specific product and service offers. Digital avatars of individual Internet users are thus built in the Big Data databases of Internet technology companies, large e-commerce platforms and social media portals. The descriptive characteristics of such avatars are so detailed and contain so much information that most of the people concerned do not even know how much information these companies and platforms hold about them.
Geolocation, added to 5G high-speed broadband, information technology and Industry 4.0, has on the one hand made it possible to develop analytics for identifying Internet users' shopping preferences and topics of interest depending on where, geographically, they are at a given moment with the smartphone on which they use certain online information services. On the other hand, combining these technologies in the applications installed on smartphones has increased both the scale of data collection on Internet users and the efficiency of processing those data and using them in the marketing activities of companies and institutions, with these operations increasingly carried out in real time in cloud computing and their results presented on Internet of Things devices.
It is becoming increasingly common to experience situations in which, while walking with a smartphone past a physical shop, bank, company or institution offering certain services, we receive an SMS, a banner or a message on the Internet portal we have just used, informing us of a new promotional offer of products or services from that particular shop, company or institution.
In view of the above, I would like to address the following question to the esteemed community of scientists and researchers:
Is analytics based on Big Data and artificial intelligence, conducted in the field of market research, market analysis, the creation of characteristics of target customer segments, already able to forecast what we will think about tomorrow, that we need something, that we might need to buy something that we consider necessary?
Is analytics based on Big Data and artificial intelligence already capable of predicting what we will think about tomorrow?
The text above is my own, written by me on the basis of my research.
In writing this text, I did not use other sources or automatic text generation systems such as ChatGPT.
Copyright by Dariusz Prokopowicz
What do you think about this topic?
What is your opinion on this subject?
Please answer,
I invite you all to discuss,
Thank you very much,
Best regards,
Dariusz Prokopowicz
Relevant answer
Answer
Predicting individual thoughts and opinions is a complex task due to the inherent complexity and subjectivity of human cognition. Thoughts and opinions are influenced by a myriad of factors such as personal experiences, values, beliefs, cultural background, and individual idiosyncrasies. These factors create a highly nuanced and dynamic landscape that is challenging to capture accurately through data analysis alone.
While analytics based on Big Data and AI can provide valuable insights into general trends and patterns, predicting individual thoughts requires a deep understanding of the context and personal factors that shape an individual's thinking. AI algorithms typically analyze historical data to identify correlations and patterns, which can be useful in predicting collective behavior or trends at a broader level. For example, analyzing social media data can help identify sentiments about a particular topic within a given population.
However, predicting individual thoughts requires accounting for unique and specific circumstances that can significantly impact an individual's perspectives. These circumstances may not be adequately captured in the available data sources or may change rapidly over time. Furthermore, individual thoughts and opinions are not solely influenced by external factors but are also shaped by internal cognitive processes that can be highly subjective and difficult to quantify.
Another challenge lies in the interpretability of AI algorithms. While AI can make predictions based on complex models, explaining how those predictions were generated can be challenging. This lack of interpretability makes it difficult to gain a deep understanding of the underlying factors influencing individual thoughts and opinions, limiting the reliability and trustworthiness of such predictions.
It is important to note that the field of AI is rapidly advancing, and new techniques and approaches are continually emerging. Researchers are working on developing more sophisticated models that can better capture and understand human cognition. However, the ability to predict individual thoughts with complete accuracy still remains a significant challenge.
In summary, while analytics based on Big Data and AI can provide valuable insights and predictions at a collective level, accurately predicting individual thoughts and opinions is a complex task due to the multifaceted nature of human cognition and the limitations of available data sources. While advancements are being made, predicting individual thoughts with certainty remains beyond the current capabilities of AI.
  • asked a question related to Data
Question
4 answers
My team and I are trying to open a dialogue about designing a Continuum of Realism for synthetic data. We want to develop a meaningful way to talk about data in terms of the degree of realism that is necessary for a particular task. We feel the way to do this is by defining a continuum that shows that as data becomes more realistic, the analytic value increases, but so does the cost and risk of disclosure. Everyone seems to be interested in generating the most realistic data, but let's be honest, sometimes that's not the level of realism that we actually need. It is expensive and carries a high reidentification risk when working with PII. Sometimes we just need data to test our code, and we can't justify using this level of realism when the risk is so high. Have you also encountered this issue? Are you interested in helping us fulfill our mission? Ultimately we are trying to save money and protect consumer privacy. We would love to hear your thoughts!
Relevant answer
Answer
Yes, there is a continuum of realism for synthetic data. At one end of the continuum, we have completely synthetic data that is generated based on mathematical models or simulations. This type of data can be useful for testing hypotheses, exploring different scenarios, and evaluating methods without the constraints and biases of real-world data. However, it may not reflect the complexity and diversity of real-world data, and may not be useful for certain applications, such as training machine learning models.
At the other end of the continuum, we have real-world data that is collected directly from sources such as surveys, medical records, or social media platforms. This type of data can provide a rich and diverse representation of the phenomena of interest but may be limited by factors such as sample size, data quality, and ethical considerations.
Between these two extremes, we have various levels of realism that can be achieved through the use of synthetic data. For example, data may be generated based on real-world data using methods such as data augmentation or data synthesis, which can create new data points that are similar to the real data but with some degree of randomness or variability. Alternatively, data may be generated based on simulations or generative models that incorporate known properties of the real-world data, such as distributional properties or relationships between variables.
As for your second question, as an AI language model, I am always ready to provide help and guidance on topics related to synthetic data and statistics. Please let me know if there is anything specific that I can assist you with.
  • asked a question related to Data
Question
4 answers
Hi everyone,
I need to convert a standard error (SE) into a standard deviation (SD). The formula for that is
SD = SE × √(sample size)
By 'sample size', does this mean the total sample size or the sample sizes of the individual groups? For example, the intervention group has 40 participants while the control group has 39 (so the total sample size is 79). When calculating the SD for the intervention group, do I use 40 as the sample size, or 79?
Thank you!
Relevant answer
Answer
7.7.3.2 Obtaining standard deviations from standard errors (cochrane.org)
Also, there is a useful calculator in the attached Excel file from Cochrane.
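To illustrate: the Cochrane formula referenced above uses the sample size of the group itself, so the intervention arm's n of 40 rather than the pooled 79. A tiny R illustration (the SE value is made up):
# illustrative only; se_intervention is a hypothetical value
se_intervention <- 0.8
n_intervention <- 40 # the intervention group's own n, not the pooled 79
sd_intervention <- se_intervention * sqrt(n_intervention)
sd_intervention # = 0.8 * sqrt(40), approximately 5.06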
  • asked a question related to Data
Question
2 answers
If ChatGPT is merged into search engines developed by internet technology companies, will search results be shaped by algorithms to a greater extent than before, and what risks might be involved?
Leading Internet technology companies that also develop search engines as part of their range of online information services are working on technological solutions to implement ChatGPT-type artificial intelligence in these search engines. There are currently discussions about the social and ethical implications of such a combination of technologies and of offering this solution in open access on the Internet. The considerations concern the possible level of risk of manipulation of the information message in the new media, potential disinformation resulting from a specific algorithm model, disinformation affecting the overall social consciousness of globalised societies, the possibility of deliberately shaping public opinion, and so on. This raises a further issue: the legitimacy of creating a control institution that would continuously monitor the objectivity, independence and ethics of the algorithms used in solutions that implement ChatGPT-type artificial intelligence in Internet search engines, including the search engines that top the rankings of tools Internet users rely on to search for specific information ever more precisely and efficiently. If such a system of institutional control on the part of the state is not established, or if a control system involving the companies developing these solutions does not function effectively or does not keep pace with technological progress, there may be serious negative consequences in the form of an increase in the scale of disinformation in the new Internet media. How important this may become is evident from what is currently happening with the social media portal TikTok. On the one hand, it has been the fastest-growing new social medium in recent months, with more than 1 billion users worldwide. On the other hand, an increasing number of countries are imposing restrictions or bans on the use of TikTok on computers, laptops and smartphones used for professional purposes by employees of public institutions and/or commercial entities. It cannot be ruled out that new types of social media will emerge in which the above-mentioned solutions implementing ChatGPT-type artificial intelligence in search engines will find application: search engines operated on the basis of intuitive feedback and automated profiling of the search engine to a specific user, or on the basis of multi-option, multi-criteria searches controlled by the Internet user for precisely specified information and data. New opportunities may also arise when the artificial intelligence implemented in a search engine is applied to multi-criteria searches for specific content, publications, persons, companies, institutions, etc. on social media sites, on web-based publication-indexing sites and in web-based knowledge bases.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
If ChatGPT is merged into search engines developed by online technology companies, will search results be shaped by algorithms to a greater extent than before, and what risks might be associated with this?
What is your opinion on the subject?
What do you think about this topic?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
Machine learning has been integrated into search engines for a while (see, for example, references [1-3]). The most prevalent use that I know of is query expansion, which provides suggestions for searches. Prior to ML implementations, Levenshtein edit distance or dictionaries were used to correct misspellings and provide alternative query words.
There has been a worry for some time about possible preference nudging [4] by AI, machine learning [5] and even search engines themselves. This leads to influence on user behaviour that can be amplified by technologies such as ChatGPT. The concern is greater if such technology is implemented in search engines such as Google Scholar and similar scientific-literature tools, where the results are dictated not by Boolean search but by biased (in both the technical and the common sense of the term) algorithms.
Besides nudging, the factual accuracy of ChatGPT is a concern (but that is covered in your other posts). Taken together, nudging plus inaccuracy can lead to undesired outcomes where individuals are guided towards incorrect decisions without being offered a choice.
References
[1] Teofili, T. (2019). Deep Learning for Search.
[2] Imani, A., Vakili, A., Montazer, A., & Shakery, A. (2019). Deep neural networks for query expansion using word embeddings. In Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part II (pp. 203-210). Springer International Publishing.
[3] Diaz, F., Mitra, B., & Craswell, N. (2016). Query expansion with locally-trained word embeddings. arXiv preprint arXiv:1605.07891.
[4] de Ridder, D., Kroese, F., & van Gestel, L. (2022). Nudgeability: Mapping conditions of susceptibility to nudge influence. Perspectives on Psychological Science, 17(2), 346-359.
[5] Lembcke, T. B., Engelbrecht, N., Brendel, A. B., & Kolbe, L. (2019). To nudge or not to nudge: Ethical considerations of digital nudging based on its behavioral economics roots.
  • asked a question related to Data
Question
2 answers
How can one build a Big Data Analytics system based on artificial intelligence that is more advanced than ChatGPT and that learns only real, verified information and data?
How can one build a Big Data Analytics system that analyses information taken from the Internet, is based on artificial intelligence, conducts analytics in real time and is integrated with an Internet search engine, but is more advanced than ChatGPT, so that through discussion with Internet users it improves data verification and learns only real information and data?
ChatGPT is not perfect at self-learning new content and perfecting its answers: it sometimes confirms information or data that is not factually correct in the question formulated by the Internet user. In this way, ChatGPT can, in the course of such 'discussions', learn not only new but also false information and fictitious data. Various technology companies are currently planning to create, develop and implement computerised analytical systems based on artificial intelligence technology similar to ChatGPT, which will find application in various fields of Big Data analytics, business and research work, and in business entities and institutions operating in different sectors and industries of the economy. One direction of development for this kind of technology is the plan to build a system for analysing large data sets: a Big Data Analytics system analysing information taken from the Internet, based on artificial intelligence, conducting analytics in real time and integrated with an Internet search engine, but more advanced than ChatGPT, which through discussion with Internet users would improve data verification and learn only real information and data. Some technology companies are already working on such solutions. Presumably, many technology start-ups that plan to create, develop and implement business-specific technological innovations based on this generation of artificial intelligence are also considering research in this area, perhaps building a start-up around a business concept in which Industry 4.0 technological innovation, including the aforementioned artificial intelligence technologies, is the key determinant.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can one build a Big Data Analytics system that analyses information taken from the Internet, is based on artificial intelligence, conducts analytics in real time and is integrated with an Internet search engine, but is more advanced than ChatGPT, so that through discussion with Internet users it improves data verification and learns only real information and data?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz
Relevant answer
Answer
In recent years, more and more managers in organizations have come to rely on a wide range of information systems that provide them with analytics methods and features to support their decisions and planning activities. With the growing automation of business processes and the consequently huge amount of data stored in databases, big data analytics methods can now take full advantage of this trend to support decision makers in their decision-making process.
Regards,
Shafagat
  • asked a question related to Data
Question
5 answers
How can artificial intelligence such as ChatGPT and Big Data Analytics be used to analyse the level of innovativeness of the new economic projects that start-ups plan to develop when implementing innovative business solutions, technological innovations, environmental innovations, energy innovations and other types of innovation?
The economic development of a country is determined by a number of factors, including the level of innovativeness of economic processes, the creation of new technological solutions in research and development centres, research institutes, university laboratories and business entities, and their implementation in the economic processes of companies and enterprises. In the modern economy, the level of innovativeness is also shaped by the effectiveness of innovation policy, which influences the formation of innovative start-ups and their effective development. The activity of innovative start-ups carries high investment risk, and for the institutions financing their development this generates high credit risk. As a result, many banks do not finance business ventures led by innovative start-ups. Within systemic programmes for financing start-up development from national public funds or international innovation-support funds, grants are organised that can be provided as non-refundable financial assistance if a start-up successfully develops the business venture according to the plan set out in its application for external funding. Non-refundable grant programmes can thus stimulate the development of innovative business ventures in specific areas, sectors and industries of the economy, including, for example, innovative green business ventures that pursue sustainable development goals and form part of the green economic transformation. Institutions distributing non-refundable grants should constantly improve their systems for analysing the level of innovativeness of the business ventures described as innovative in funding applications. In improving systems for verifying the innovativeness of business ventures and the fulfilment of specific goals, e.g. sustainable development goals or green-transformation goals, the new Industry 4.0 technologies implemented in Business Intelligence analytical platforms can be used, including machine learning, deep learning, artificial intelligence (e.g. ChatGPT), Business Intelligence platforms with Big Data Analytics, cloud computing and multi-criteria simulation models. Given suitable IT equipment, including computers with new-generation processors of high computing power, it is therefore possible to use artificial intelligence such as ChatGPT, Big Data Analytics and other Industry 4.0 technologies to analyse the level of innovativeness of the new economic projects that start-ups plan to develop when implementing innovative business, technological, ecological, energy and other types of innovation.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can artificial intelligence such as ChatGPT and Big Data Analytics be used to analyse the level of innovativeness of the new economic projects that start-ups plan to develop when implementing innovative business solutions, technological innovations, ecological innovations, energy innovations and other types of innovation?
What do you think?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
Enhancements to Tableau for Slack focus on sharing, search and insights, with automated workflows for tools like Accelerator. The goal: empower decision makers and CRM teams to put big data to work...
The changes also presage what’s coming next: integration of recently announced generative AI model Einstein GPT, the fruit of Salesforce’s collaboration with ChatGPT maker OpenAI, with natural language-enabled interfaces to make wrangling big data a low-code/no-code operation...
  • asked a question related to Data
Question
3 answers
Dear researchers,
I am working on a project related to the solar wind. I want to download 1-minute resolution data from the BepiColombo spacecraft, but I am struggling with that. Do you know any websites from which the data can be downloaded? Or, if you could provide me with BepiColombo data for just a few days, that would be very helpful. I look forward to your valuable comments.
Many thanks.
Relevant answer
Answer
Gotcha.
Umph. And Ulysses data are too old and Cluster's are in the wrong place...
I'd drop Dr. Heyner a line:
  • asked a question related to Data
Question
3 answers
Does analytics based on sentiment analysis of changes in Internet users' opinions using Big Data Analytics help detect fake news spread as part of the deliberate dissemination of disinformation on social media?
The spread of disinformation on social media, carried out by setting up fake profiles and spreading fake news through these media, is becoming increasingly dangerous for the security not only of specific companies and institutions but also of the state. The various social media, including those dominating this segment of new online media, differ considerably in this respect. The problem is more acute on those social media that are among the most popular and are used mainly by young people, whose world view can more easily be influenced by fake news and other disinformation techniques used on the Internet. Currently, among children and young people, the most popular social media include TikTok, Instagram and YouTube. Consequently, in recent months the governments of some countries have begun restricting the development of some social media sites such as TikTok by banning the installation and use of the application on smartphones, laptops and other devices used for official purposes by employees of public institutions. These governments justify such actions by the need to maintain a certain level of cyber security and to reduce the risk of surveillance and the theft of sensitive, strategic and security-critical data and information belonging to individual institutions, companies and the state. In addition, there have already been numerous data leaks at other social media portals, telecoms, public institutions, local authorities and others, based on hacking into the databases of specific institutions and companies. In Poland, however, the opposite is true: not only does the ruling PIS party not restrict the use of TikTok by employees of public institutions, it also encourages its politicians to use the portal to publish videos as part of the ongoing electoral campaign, in order to increase its chances of winning the parliamentary elections for a third time in autumn 2023. According to analysts researching the problem of growing disinformation on the Internet, in highly developed countries it is enough to create 100,000 avatars, that is, non-existent fictitious persons who appear to function on the Internet through fake profiles created on social media portals, to seriously influence the world view and general social awareness of Internet users, who usually constitute the majority of citizens in a country. In third-world countries and in countries with undemocratic systems of power, about 1,000 such avatars suffice, with back-stories modelled, for example, on famous people, such as a well-known singer in Poland claiming that there is no pandemic and that vaccines are an instrument for increasing state control over citizens. The analysis of changes in Internet users' world views, of trends in social opinion on specific issues, of evaluations of specific product and service offers, and of the brand recognition of companies and institutions can be conducted on the basis of sentiment analysis of changes in Internet users' opinions using Big Data Analytics. Consequently, this type of analytics can be applied, and can be of great help, in detecting fake news disseminated as part of the deliberate spread of disinformation on social media.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Does analytics based on sentiment analysis of changes in the opinions of Internet users using Big Data Analytics help in detecting fake news spread as part of the deliberate spread of disinformation on social media?
What is your opinion on this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz
Relevant answer
Answer
Yes, sentiment analysis based on Big Data Analytics can help in detecting fake news spread as part of the deliberate spread of disinformation on social media. Sentiment analysis involves the use of natural language processing and machine learning techniques to analyze large amounts of textual data, such as social media posts, to identify the sentiment expressed in the text. By analyzing changes in the sentiment of Internet users towards a particular topic or event, it is possible to identify patterns of misinformation and disinformation.
For example, if there is a sudden surge in negative sentiment towards a particular politician or political party, it could be an indication of a disinformation campaign aimed at spreading negative propaganda. Similarly, if there is a sudden increase in positive sentiment towards a particular product or service, it could be an indication of a paid promotion or marketing campaign.
However, it is important to note that sentiment analysis alone may not be enough to detect fake news and disinformation. It is also important to consider other factors such as the source of the information, the credibility of the information, and the context in which the information is being shared. Therefore, a comprehensive approach involving multiple techniques and tools may be necessary to effectively detect and combat fake news and disinformation on social media.
  • asked a question related to Data
Question
6 answers
I am struggling with the problem below; it would be a pleasure to have your ideas.
I've written the same program in two languages, Python and R, but each produces a completely different result. Before jumping to a conclusion, I declare that:
- Every part of the code in the two languages has been checked multiple times, is correct, and represents the same thing.
- The packages used in the two languages are the same version.
So, what do you think?
The code is about applying deep neural networks for time series data.
Relevant answer
Answer
Good morning. Without the code it is difficult to know where the difference comes from. I do not use Python, I work in R, but maybe the difference is due to the data-splitting stage: did you try setting the same seed for the random number generator, for example seed(1234)? (If I remember correctly, this function also exists in Python.) Were your results and evaluation metrics totally different? In that case, maybe there is a reliability issue in your model, and you should check your data preparation and feature selection.
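As an illustration of the seeding point above, a minimal R sketch (the data frame name and split ratio are placeholders):
# fix the random seed before the train/test split so repeated runs see the same partition
set.seed(1234) # roughly analogous to numpy.random.seed(1234) in Python
n <- nrow(mydata) # 'mydata' is a placeholder data frame
idx <- sample(seq_len(n), size = round(0.8 * n)) # 80% training indices
train <- mydata[idx, ]
test <- mydata[-idx, ]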
  • asked a question related to Data
Question
3 answers
Where can I get the original paper by Ackoff from 1989 titled "From data to wisdom" published in Journal of applied systems analysis?
The file that is publicly available under this title is a 2-page article from 1999.
Relevant answer
Answer
See enclosed file. I think this is what you are looking for.
Best regards.
  • asked a question related to Data
Question
5 answers
Where can I get global data (freely accessible) for small and micro-enterprises across countries and time?
  • asked a question related to Data
Question
13 answers
I am currently using Fisher's exact test, as some of my cell counts are <5. I have done this for lots of data in the same dataset, which have generally been 2x3 or larger (so I reported Cramer's V as well); however, now that I am running 2x2 tables, the Fisher's output is blank and I can't figure out why! I have attached an example of the output - any help would be gratefully received!
Relevant answer
Answer
Daniel Wright Sal Mangiafico thank you both for your help! Think I was too immersed in what I was doing that I hadn’t considered I would only be reporting the p-values for Fishers! The previous outputs with values on the Fishers row had thrown me and my common sense!! Really appreciate your help
  • asked a question related to Data
Question
10 answers
Since statistics is the golden key to interpreting data in almost all scientific & social branches!
Relevant answer
Answer
I am in support of your proposal Sinan Ibaguner, but such a course needs the orienting framework of scientific methodology and, of course, data science. Running such a course, defining a textbook and administering examinations is more an administrative task; I stay with Gauss: we need to understand the underlying mathematical idea of a measurement before we execute the mechanical operation of interpreting data.
  • asked a question related to Data
Question
3 answers
I would like to fit some data sets. I read that LEVMW is the best program. Does anyone have this program or a link to it?
Or are there other programs better than LEVMW?
Relevant answer
Answer
Here you can find the website for LEVM/LEVMW.
  • asked a question related to Data
Question
5 answers
Greetings, respected ResearchGate community. I encountered some issues while gathering data from the World Bank Database, so I would like to know whether there are alternatives or other websites like the World Bank Database from which raw data can be gathered.
The website can contain any form of indicators, such as development, governance, competitiveness, economics, the financial sector, etc. Thank you in advance for your assistance.
Relevant answer
Answer
There are many websites and databases that provide access to raw data on a variety of indicators, such as those related to development, governance, competitiveness, economics, and the financial sector. Some examples of websites that provide such data include:
United Nations Development Programme: https://data.undp.org/
International Monetary Fund: https://www.imf.org/en/Data
United Nations Statistics Division: https://unstats.un.org/unsd/default.htm
World Health Organization: https://www.who.int/data/gho
Organisation for Economic Co-operation and Development: https://data.oecd.org/
These websites provide access to a wide range of indicators and data on various topics, and can be a useful resource for those looking for raw data for their research or analysis.
  • asked a question related to Data
Question
6 answers
It is a .dta file, which cannot be read with read.csv, so I used haven to read it. However, the second row contains the real column names (variable labels), such as Checking Year and Checking Month. How can I extract them?
Relevant answer
Answer
Jochen Wilhelm Sorry for the misunderstanding about my question. I want to extract all the labels of those columns and use them as column names. For example, we want to extract the label names Individual ID, Household ID, Community ID, Checking Year, Checking Season, Checking Date, Checking Day, and set those names as column names.
Please contact me if more information is needed :)
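For what it is worth, haven stores each Stata variable label in a "label" attribute, so one possible sketch (the file name is hypothetical):
library(haven)
df <- read_dta("survey_wave.dta") # hypothetical .dta file
# pull the variable label of each column (NA where a column has no label)
labs <- vapply(df, function(x) {
  lab <- attr(x, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))
# use the labels as column names, keeping the original name where no label exists
names(df) <- ifelse(is.na(labs), names(df), labs)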
  • asked a question related to Data
Question
1 answer
I had the same problem with the data collected from a Thermo Scientific XPS. Here is the solution I found; it may help other researchers.
Step 1: Open Avantage (on the XPS measurement computer), then go to "file"->"save as", a new window pops out.
(see the picture below)
Step 2: Name your .vgp file and click save. A new folder will be created, like Data Grid 2. DATA
Step 3: Open the software "DataSpace_BatchDump.exe". It can be found on the same computer with Avantage installed, in "C:\Program Files (x86)\Thermo\Avantage\Bin".
Step 4: Open the folder with .vgp/.vgd files in DataSpace_BatchDump, and click OK. Then, a new window pops out, find a location to export the files, like “C:\export”, then click ok twice. New .avg files will be saved in that location.
(see the picture below)
Step 5: Open CasaXPS, click "Convert", and find "C:\export". Type ".dth" and click OK. Then the .vms files will be created.
(see the picture below)
Relevant answer
Answer
Weixin,thank you! Your detailed explanation helped me a lot.
  • asked a question related to Data
Question
4 answers
Can artificial intelligence help optimize remote communication and information flow in a corporation, in a large company characterized by a multi-level, complex organizational structure?
Are there any examples of artificial intelligence applications in this area of large company operations?
In large corporations characterised by a complex, multi-level organisational structure, the flow of information can be difficult. New ICT and Industry 4.0 information technologies are proving helpful in this regard, improving the efficiency of the flow of information between departments and divisions in the corporation. One of the Industry 4.0 technologies that has recently found various new applications is artificial intelligence. The implementation of artificial intelligence, machine learning and other Industry 4.0 technologies in various business areas of companies, enterprises and financial institutions is associated with the increasing digitisation and automation of the processes they carry out. For several decades, in order to refine and improve the flow of information in corporations with complex organisational structures, integrated information systems have been implemented that connect the applications and programs operating within specific departments, divisions and plants of a large enterprise. Nowadays, artificial intelligence is a technology that can help optimise remote communication and the flow of information and data within a corporation's intranet.
Besides, Industry 4.0 technologies, including artificial intelligence, can help improve the cyber-security of data transfer, including that carried out in e-mail communications.
In view of the above, I address the following question to the esteemed community of researchers and scientists:
Can artificial intelligence help optimize remote communication and information flow in a corporation, in a large company characterized by a multi-level, complex organizational structure?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz
Relevant answer
Answer
AI as technology has tremendous capabilities. However, any practical result is implementation talent dependent. A great poet and a mediocre poet may use similar words/language, but the results would be very different.
  • asked a question related to Data
Question
3 answers
Hello, I'm looking for a reliable insect pest database that shares information about the occurrence, geographic distribution and hosts of all insect pests over the world. I did a little research on my own, but in my opinion the results from the databases I found (GBIF, CABI, etc.) aren't quite reliable. I also believe that many technical reports on the occurrence of those insects are published by governmental research centres in every country, but they aren't accessible online. Is there a way to get access to those reports?
Relevant answer
Answer
Dear Jane Dalley
Thank you for the reply.
I am indeed interested in agricultural pests; however, I am looking for an online database that allows me to get information about those insects (taxonomy, feeding mode, distribution, origin, ...)
  • asked a question related to Data
Question
4 answers
Hi Researchers,
I am looking for journals that publish scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. Please let me know if you have any recommendations.
FYI, the dataset we are curating is bioinformatics image data.
Relevant answer
Answer
Have a look at Scientific Data: https://www.nature.com/sdata/
  • asked a question related to Data
Question
13 answers
Greetings everyone! Could anyone kindly tell me where I can get data concerning the stock and bond markets?
Relevant answer
Answer
Yahoo Finance, Bloomberg.
  • asked a question related to Data
Question
3 answers
I am looking for email datasets from B2B sales. Are any data sources available?
Relevant answer
Answer
Shafagat Mahmudova, Fredrick Ishengoma: what I am looking for is mail conversations among B2B sales, not an email list. Thanks for this.
  • asked a question related to Data
Question
1 answer
I'm new to fsQCA and would like to conduct a fsQCA analysis in one of my dissertation studies. The majority of fsQCA methods begin with data collection, followed by the preparation of the data matrix and the truth table. Conditions and outcomes are represented in the data matrix. Conditions, as we know, are the responses to surveys or interviews. I'm curious where the "outcomes" come from. Do we ask participants to rate the outcomes on a scale, as they did in the conditions?
For example, in the data matrix attached, there are five conditions (LPI, TAB, WPP, PAP, and NR) and outcomes are indicated by PubINF.
I acknowledge this is a very basic question, but I look forward to receiving your response.
Relevant answer
Answer
Nayan Kadam Fs/QCA (fuzzy-set qualitative comparative analysis) is a social science approach that combines case-oriented and variable-oriented quantitative research. It began with the development of qualitative comparative analysis, and Fs/QCA was later developed using fuzzy-set theory.
  • asked a question related to Data
Question
8 answers
I tried using Gigasheet, but it does not have many of the features that are available in Excel. Could you suggest some freely available tools where I can load my ~1.7 million rows and do some calculations, like sorting by multiple columns and removing duplicates?
TIA
Relevant answer
Answer
I'd do this in R. It's free and sufficiently powerful to handle such tables with ease.
This could be an example work-flow (assuming you have a file named "my_huge_file.txt" which is a tab-delimited text file with many rows and a header row that contains the texts "ID", "value", "name" and "amount" [just for example!] as column names):
# read the file into a data.frame:
df <- read.delim("my_huge_file.txt")
# sort the rows by values in the column "name", then "value":
df <- df[order(df$name, df$value), ]
# remove all rows with duplicate entries in the column "ID":
df <- df[!duplicated(df$ID), ]
# get the 5 rows with the largest values in the column "amount":
o <- order(df$amount, decreasing = TRUE)[1:5]
df <- df[o, ]
# getting the mean of the values in the column "value"
# by each value in the column "name":
tapply(df$value, INDEX = df$name, FUN = mean)
  • asked a question related to Data
Question
4 answers
Hello,
Does anyone know how to change the time length to 2 or 3s in Ansys Static Structural -> Model -> Static Structural -> Pressure -> Magnitude -> Function -> Tabular data? The current setup only allows me to go up to 1s.
Thanks!
Relevant answer
Answer
You must start entering from the higher end towards the start. For example, enter the 30th-second step end-time data first and then proceed to the 1st-second step end-time data; that way the step controls are entered smoothly.
  • asked a question related to Data
Question
5 answers
Hi all,
I'm having trouble converting one particular variable in my dataset from string to numeric. I've tried manually transforming/recoding into a different variable and automatic recoding. I've also tried writing syntax (see below). The same syntax has worked for every other variable I needed to convert but this one. For all methods (manual recode, automatic recode, and writing a syntax), I end up with missing data.
recode variablename ('Occurred 0 times' = 0) ('Occurred 1 time' = 1) ('Occurred 2 times' = 2) ('Occurred 3+ times' = 3) into Nvariablename.
execute.
VALUE LABELS
Nvariablename
0 'Occurred 0 times'
1 'Occurred 1 time'
2 'Occurred 2 times'
3 'Occurred 3+ times'.
EXECUTE.
Thank you in advance for your help!
Relevant answer
Answer
Konstantinos Mastrothanasis, by introducing manual copying & pasting etc., you make reproducibility much more difficult. IMO, anything that can be done via command syntax ought to be done via command syntax. The basic code Ange H. posted will work for the particular values she showed in her post--see the example below. If it is not working, that suggests there are other values present in the dataset other than the ones she has shown us. But we are still waiting for her to upload a small file including the problematic cases.
Meanwhile, here is the aforementioned example that works.
* Read in the values Angela showed in her post.
NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST / svar(A20).
BEGIN DATA
'Occurred 0 times'
'Occurred 1 time'
'Occurred 2 times'
'Occurred 3+ times'
END DATA.
LIST.
* Recode svar to nvar.
RECODE svar
('Occurred 0 times' = 0)
('Occurred 1 time' = 1)
('Occurred 2 times' = 2)
('Occurred 3+ times' = 3) into nvar.
FORMATS nvar (F1).
VALUE LABELS nvar
0 'Occurred 0 times'
1 'Occurred 1 time'
2 'Occurred 2 times'
3 'Occurred 3+ times'
.
CROSSTABS svar BY nvar.
  • asked a question related to Data
Question
6 answers
Dear researchers, we tried to download AOD data from AERONET for Nepal stations; however, most of the data are missing. Are there any other appropriate websites from which to download AOD data? We need your suggestions, thanks. :)
Relevant answer
Answer
You can switch to satellite data sets or reanalysis datasets to get aerosol optical depth!
  • asked a question related to Data
Question
3 answers
My lab wants to try to do as much of our pre-processing, processing, and analysis in R as possible, for ease of workflow and replicability. We use a lot of psychophysiological measures and have historically used MATLAB for our workflow with this type of data. We want to know if anyone has been successful in using R for these types of tasks.
Any good R packages?
Relevant answer
Answer
Begin by consulting the very useful book by Jared Lander, R for Everyone, to get started; it's available from the z-library and contains much useful research-grade code. To find useful packages in R, Google what you want to do alongside "R package". Example attached. Best wishes, David Booth
  • asked a question related to Data
Question
6 answers
I wanted to know the purity of MgSO4, so I went for XRF analysis. I received the data in the following format:
[Quantitative Result]
---------------------------------------------------------------------------------
Analyte Result Proc-Calc Line Net Int. BG Int.
---------------------------------------------------------------------------------
S 61.3709 % Quant.-FP S Ka 1165.666 3.578
Mg 37.9584 % Quant.-FP MgKa 158.225 0.918
Ca 0.5466 % Quant.-FP CaKa 4.244 1.142
Si 0.1241 % Quant.-FP SiKa 0.862 0.148
Can anyone please help me by explaining how to find the purity of MgSO4 from this?
Relevant answer
Answer
Oxygen is tough for XRF because of its very low K-alpha energy.
So it does not show up in the list.
However there seems to be something odd, as Gustavo Henrique de Magalhães Gomes mentioned above.
For MgSO4 you will have Mg:S = 1:1 in atomic ratio.
This will also hold for MgS.
But your XRF data say Mg:S = 38:61 ~ 1:2 (very roughly).
You should check your XRF system with certified materials...
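For readers who want the atomic rather than the weight ratio, a quick back-of-envelope conversion in R (assuming the quoted Result column is weight % with oxygen excluded):
# convert weight % to relative moles and normalise to Mg
wt <- c(Mg = 37.9584, S = 61.3709) # weight % from the question
M <- c(Mg = 24.305, S = 32.06) # molar masses in g/mol
mol <- wt / M
mol / mol["Mg"] # Mg : S comes out at roughly 1 : 1.23, still off the 1 : 1 expected for MgSO4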
  • asked a question related to Data
Question
9 answers
I welcome Answers and opinions.
Relevant answer
Answer
Citing it under References is enough, but if you want to publish any data table you need to obtain permission from the publisher.
  • asked a question related to Data
Question
10 answers
Based on the literature review we get ideas for moderators. But what if we want to introduce a new moderator into the literature?
1) What are the criteria for a new moderator?
2) How can a moderating variable be supported theoretically?
3) Is it necessary to adopt the new moderating variable from the same theory?
  • asked a question related to Data
Question
3 answers
Is there any special procedure to follow to get data?
  • asked a question related to Data
Question
2 answers
For RWE studies it is important to find the correct RWD sources; therefore I am looking for other sources of Japanese drug codes, diagnosis codes, etc., other than JMDC. It would be helpful if any of you could help me.
  • asked a question related to Data
Question
7 answers
Hello;
We have two twenty-year data sets, one for a historical time span and one for a future prediction. For both, statistical distributions were fitted to five-year intervals, and for the historical and the predicted data the same statistical distributions (Johnson SB, Gen. Pareto and Wakeby) have been selected as the most appropriate.
Similar statistical distributions were obtained for all five-year intervals and for the entire twenty-year time series. We want to know what this similarity in the data analysis means.
Best
Saeideh
Relevant answer
Answer
Thanks for the plots, they look pretty good to me. Best wishes, David Booth
  • asked a question related to Data
Question
2 answers
Hello,
I have a huge amount of data and I need to calculate categorical statistical indices (e.g., POD, FAR, CSI, ETS) using Python or R. I will be thankful for any kind of help.
Regards,
Relevant answer
Answer
Binary forecast verification is included within SwirlsPy, an open-source Python library:
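If you prefer to compute the indices directly rather than through a library, a minimal R sketch using the standard dichotomous-verification formulas (the counts of hits, misses, false alarms and correct negatives below are purely illustrative) would be:
# Entries of the 2x2 contingency table (illustrative counts)
hits <- 42; false_alarms <- 13; misses <- 9; correct_negatives <- 136
n <- hits + false_alarms + misses + correct_negatives
pod <- hits / (hits + misses)                        # Probability of Detection
far <- false_alarms / (hits + false_alarms)          # False Alarm Ratio
csi <- hits / (hits + misses + false_alarms)         # Critical Success Index
hits_random <- (hits + false_alarms) * (hits + misses) / n
ets <- (hits - hits_random) /
       (hits + misses + false_alarms - hits_random)  # Equitable Threat Score
round(c(POD = pod, FAR = far, CSI = csi, ETS = ets), 3)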
  • asked a question related to Data
Question
3 answers
Dear colleagues
I have a CSV file (5 thousand lines) with information about year, country, distance, product, quantity, etc.
I can also open the file in Notepad++.
Could you please tell me how to construct a graph in Excel or RStudio of the quantity, i.e. how to aggregate every quantity that corresponds to the respective country?
Thank you very much
Relevant answer
Answer
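If you go the R/RStudio route, a minimal sketch (assuming the CSV really has columns named country and quantity, and that the file is called mydata.csv; both names are placeholders to adapt to your data) would be:
# Read the file and sum the quantity per country
trade <- read.csv("mydata.csv", stringsAsFactors = FALSE)
by_country <- aggregate(quantity ~ country, data = trade, FUN = sum)
# Simple bar chart of total quantity per country
barplot(by_country$quantity,
        names.arg = by_country$country,
        las = 2, cex.names = 0.7,
        ylab = "Total quantity")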
  • asked a question related to Data
Question
4 answers
If I want the annual average of the country's oil production for 2019 and I have 25 stations:
1- should I take the sum (of 12 months) for each station individually, so that I get the annual sum for each station, and then divide by 25 to calculate the country annual average, or
2- should I take the sum of January for the 25 stations, then February, etc., and then divide by 12 (the number of months) to get the annual average for the country?
Relevant answer
Answer
These are 2 different averages. The numerator is the same for both 1 and 2 - it is the sum of the production of the 25 stations over 12 months, i.e. the total annual production of all 25 stations.
But dividing this numerator by 25 gives you the annual average production per station.
Dividing by 12 gives you the average monthly production of all 25 stations.
There is no single correct average. The average depends on how you define it and what you want to characterize - production per station or production per month.
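A small R sketch may make the distinction concrete (the monthly production matrix is simulated purely for illustration):
# Simulated monthly production for 25 stations over 12 months
set.seed(1)
production <- matrix(runif(25 * 12, min = 80, max = 120),
                     nrow = 25, ncol = 12)   # rows = stations, cols = months
total_annual <- sum(production)   # the common numerator
total_annual / 25                 # option 1: average annual production per station
total_annual / 12                 # option 2: average monthly production of all 25 stations together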
  • asked a question related to Data
Question
8 answers
I have a model that needs calibration, but I am afraid that if I calibrate using too many model parameters, I will overfit to my data, or the calibration will not be well-done.
Can anyone suggest a method to determine the maximum number of parameters I should use?
Relevant answer
Answer
I think it is better to guess the functional form instead of using a simple polynomial fit. Of course, in all cases we need to keep the outliers in mind. The advantage of using a function is that the parameters will have some physical significance according to your model. Also, from the chi-square/ndf of the different fits you can compare the goodness of fit.
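As an illustration of comparing goodness of fit across models with different numbers of parameters, here is a minimal R sketch with simulated data and an assumed exponential model; the residual sum of squares per degree of freedom stands in for chi-square/ndf when the measurement errors are roughly equal:
set.seed(42)
x <- seq(1, 10, length.out = 40)
y <- 5 * exp(-0.4 * x) + rnorm(40, sd = 0.15)   # simulated observations
# Candidate 1: physically motivated 2-parameter exponential
fit_exp  <- nls(y ~ a * exp(-b * x), start = list(a = 4, b = 0.3))
# Candidate 2: 5-parameter polynomial
fit_poly <- lm(y ~ poly(x, 4))
# Residual sum of squares per degree of freedom for each fit
reduced_chisq <- function(fit) sum(residuals(fit)^2) / df.residual(fit)
c(exponential = reduced_chisq(fit_exp), polynomial = reduced_chisq(fit_poly))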
  • asked a question related to Data
Question
3 answers
I was exploring differential privacy (DP) which is an excellent technique to preserve the privacy of the data. However, I am wondering what will be the performance metrics to prove this between schemes with DP and schemes without DP.
Are there any performance metrics in which a comparison can be made between scheme with DP and scheme without DP?
Thanks in advance.
Relevant answer
Answer
Dear Anik Islam Abhi,
You may want to review the data below:
What is differential data privacy?
Differential privacy (DP) is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
Why is differential privacy so important?
It prevents attackers from having access to perfect data. This deniability aspect of differential privacy is important in cases like linkage attacks, where attackers leverage multiple sources to identify the personal information of a target.
What is privacy budget in differential privacy?
ε is also known as the privacy parameter or the privacy budget. When ε is small, (ε,0)-differential privacy asserts that for all pairs of adjacent databases x, y and all outputs M, an adversary cannot distinguish which is the true database on the basis of observing the output.
What is differential privacy in machine learning?
Differential privacy is a notion that allows quantifying the degree of privacy protection provided by an algorithm on the underlying (sensitive) data set it operates on. Through the lens of differential privacy, we can design machine learning algorithms that responsibly train models on private data.
How much is enough choosing Epsilon for differential privacy?
... The recommended values for ε vary in a big interval, from as small as 0.01 and 0.1 to as big as 7
Who uses differential privacy?
Apple launched differential privacy for the first time in macOS Sierra and iOS 10, and has since expanded it to other use cases such as Safari and Health data types.
Differential Privacy: General Survey and Analysis of Practicability in the Context of Machine Learning
Franziska Boenisch
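On the original question of metrics: one common way to compare a scheme with DP against the same scheme without DP is to quantify the utility loss, i.e. the error the DP noise introduces into the released statistic for a chosen ε. A minimal R sketch of the Laplace mechanism on a simple count query (all data and parameter values are simulated and purely illustrative) might look like:
set.seed(7)
ages <- sample(18:90, 1000, replace = TRUE)   # simulated sensitive data
true_count <- sum(ages >= 65)                 # query: how many people are 65+
# Laplace mechanism: noise scale = sensitivity / epsilon (sensitivity of a count query is 1)
rlaplace <- function(n, scale) {
  u <- runif(n) - 0.5
  -scale * sign(u) * log(1 - 2 * abs(u))
}
epsilon <- 0.5
dp_counts <- true_count + rlaplace(10000, scale = 1 / epsilon)
# Utility metrics: mean absolute error and RMSE of the DP release vs the exact answer
c(MAE  = mean(abs(dp_counts - true_count)),
  RMSE = sqrt(mean((dp_counts - true_count)^2)))
Repeating this for several values of ε gives a privacy-utility curve that can be reported alongside the accuracy metrics of the overall scheme.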
  • asked a question related to Data
Question
3 answers
Hello everyone,
Could you recommend an alternative to IDC please to get records from the global datasphere for free?
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
  • asked a question related to Data
Question
3 answers
I am a research assistant to a doctoral student, and for her thesis she has been asked to include on a CD a document management system with all the documentation she used, so that the reviewers of the work can quickly search through the documents, filter, search by keywords or within the texts, etc.
I have searched and found that there are several such systems:
- OpenDocMan
- Seeddms
- I, Librarian
- OpenKm
- LogicalDOC
- Kimios
- others...
Several of them are web-based and would be ideal, as they offer the functionality we are looking for, but they are only free if you set up the server yourself. Others work as Windows software but cannot be packaged on their own to store on a CD. On the other hand, I have not found free hosting options, even low-capacity ones, and it does not make sense to pay indefinitely for such a system for a thesis project. *Excel is not an option for her unfortunately*.
I would like to know of a system I could set up to search through the documents and so on, so that I could save the whole system along with the documents on a CD; it could also be a web solution, provided free hosting is available.
Thank you.
Relevant answer
Answer
Usually a PhD can be submitted in Word or as a PDF. Everyone can access and search these on a normal computer. No management system required.
  • asked a question related to Data
Question
8 answers
Dear colleagues,
I am studying the most relevant disinformation topics on a given subject and over a period of time. I intend to analyze the content of fact-checking articles.
Relevant answer
Answer
Thank you very much for this question about which content analysis tools you could use to fact-check the most current misinformation or disinformation fake news.
I am aware that there are always running cases of misinformation worldwide during this globalisation era, propelled by telecommunications systems, the internet, and above all the role of social media.
In some cases, misinformation is a cultural value, a religious principle, a national policy strategy, or a gender-specific issue. There are already tools and websites that help to list sources or qualities of misinformation (disinformation).
I suggest that you focus on the current misinformation (disinformation) activities that are trending in the war between Russia and Ukraine. This will be a very good and current area.
As you are better informed, content analysis is guided by the research topic, problem, and above all the specific objective or research questions. You need to select one or more categories of media that you intend to study (analyse) and the period involved.
The content analysis in your current research should be done in a systematic way. For example, to conduct a content analysis:
1. Select the content you will analyze. Based on your research question, choose the texts that you will analyze.
2. Define the units and categories of analysis.
3. Develop a set of rules for coding.
4. Code the text according to the rules
5. Analyze the results and draw conclusions.
You will draft your first report and keep revising it. Take note that misinformation can be attributed in some cases to official policies at all levels, honest mistakes, poor editing or proofreading, ignorance or functional illiteracy, prejudices or discrimination, as well as good intentions for temporary solutions.
Best regards
Wilson
  • asked a question related to Data
Question
5 answers
I am developing a model to optimise a sustainable energy system for community buildings. The system uses renewable energy, battery storage and intelligent building management to optimise the energy used by the building. I cannot find any data on electricity use patterns for community buildings (village/church halls) across the year. There seems to be lots for domestic property and some for normal commercial property (offices/shops/factories). I have limited data which show a marked summer/winter pattern, but I would be grateful if anyone could share any larger data sets. At the moment the buildings are all in the north of England, but ideally we would like to develop a model that works anywhere.
Relevant answer
Answer
Referring to data from similar buildings would not be realistic because of the number of uncertainties, such as zonal area location, altitude and elevation, type of activity, number of occupants and their behaviour, type of utility applicable, building life, etc.
My advice is to collect the particular building's electricity, water and fuel consumption bills and establish the energy scenario. Then estimate the monthly electrical energy consumption of the building with the help of an energy equipment inventory, using the rated capacities and average operational times. You can then compare the inventory estimate with the billed consumption to decide on the energy consumption pattern of the particular premises. This will help you towards an accurate and proper system design.
  • asked a question related to Data
Question
3 answers
In R, how do you generate random data whose distribution satisfies a specified skewness and kurtosis?
Relevant answer
Answer
Skew and kurtosis are just two measurements of a distribution. Say more about what you want the distributions to be like. Several quite different distributions can share the same population skew and kurtosis values, so you need to provide more information to know whether any particular one is appropriate. If you want sample values that match specific targets, you could transform other distributions into these.
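If matching both moments exactly is the goal, one package-based option is the Pearson distribution system; here is a minimal sketch, assuming the PearsonDS package is available (its moments argument expects ordinary kurtosis, not excess kurtosis):
# install.packages("PearsonDS")  # if not already installed
library(PearsonDS)
set.seed(123)
# Target: mean 0, variance 1, skewness 1, kurtosis 5 (ordinary kurtosis, i.e. excess + 3)
x <- rpearson(10000, moments = c(mean = 0, variance = 1,
                                 skewness = 1, kurtosis = 5))
# Check the sample moments (they will match the targets only approximately)
mean(x); var(x)
mean((x - mean(x))^3) / sd(x)^3   # sample skewness
mean((x - mean(x))^4) / sd(x)^4   # sample kurtosis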
  • asked a question related to Data
Question
3 answers
I have recorded behavioural data, such as incidences of aggression and grooming partners, in a troop of lemurs over three conditions.
What tests should I be using to compare the rates of aggression in the three conditions?
For the grooming partner data, I want to compare grooming between sex dyads. For instance, the frequency of male-male grooming compared to male-female grooming within each condition and then compare the average proportion of grooming between sex dyads in the three conditions. How would I do this?
Thank you in advance for your help. Apologies if this question is poorly worded, I am very new to data analysis.
Relevant answer
Answer
Your explanation of the dataset sounds like it would be suitable for a two-way ANOVA, other things being fine.
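For the rate comparison, a minimal R sketch of such a two-way ANOVA (the data frame, column names and values below are assumptions made for illustration; with count data a Poisson GLM would be a common alternative) could look like:
set.seed(1)
# Simulated example data: one grooming rate per observation session
lemurs <- expand.grid(condition = c("A", "B", "C"),
                      dyad = c("male-male", "male-female", "female-female"),
                      session = 1:6)
lemurs$grooming_rate <- rpois(nrow(lemurs), lambda = 4) / 2   # illustrative rates
fit <- aov(grooming_rate ~ condition * dyad, data = lemurs)
summary(fit)                 # main effects of condition and dyad, plus their interaction
TukeyHSD(fit, "condition")   # pairwise comparisons between the three conditions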
  • asked a question related to Data
Question
3 answers
I'm doing a systematic review and I need an access to EMBASE and Scopus databases to do research (my institution doesn't offer such an access)
Could someone help?
Regards,
Relevant answer
Answer
I ran into the same issue a while back. Your best bet is to reach out to someone who studies or works in an institution that has access (preferably a friend or a colleague, not a stranger) and ask for their institutional credentials, so you can access their account on your computer. This might not always work, since the IP difference is usually detected and you are not always granted access from outside the institution. I would suggest someone at Qatar University; you can use their credentials at your working place and it has basically all the databases. Someone in the US or Occupied Palestine can also do the trick, but watch out for the IP issue. Finally, if none of the above works, you can try emailing your search strategy (I would advise tailoring it according to the rules of the database) to someone with access and have them run the search for you and send the results. Good luck!
  • asked a question related to Data
Question
11 answers
Hi,
My research involves looking at whether an education initiative (workplace training) increase employee knowledge and engagement in corporate sustainability missions.
Research includes:
1) A pre Likert Scale questionnaire (18 questions divided roughly into two sections, knowledge and engagement)
2) An education intervention (training)
3) A post Likert Scale questionnaire (18 questions divided roughly into two sections, knowledge and engagement)
These have already taken place and I have questionnaire responses for 20 participants.
How do I go about interpreting this/ analyzing this? I have read lots of different answers online and can't seem to find a common answer
Any help will be appreciated- Thank you
Eimear
Relevant answer
Answer
Hi Lewis,
Thank you. Before the research, I had looked up before/after and decided on Pearson Correlation coefficient but when doing more research, realised this might be wrong?
I'll take a look on YouTube. For my literature review, I have identified several papers that have similar studies, but again, they all use a different method so that's why I'm unsure on which way to approach it
Thank you for your help - I'll also check out that book!
Eimear
  • asked a question related to Data
Question
8 answers
Hey guys, I'm working on a new project where I need to transfer Facebook Ads campaign data to visualize in Tableau or Microsoft Power BI, and this job should be done automatically daily, weekly or monthly. I'm planning to use Python to build a data pipeline for this. Do you have any suggestions, resources I can read, or similar projects I can get inspired by? Thank you.
Relevant answer
Answer
To create an ETL pipeline using batch processing, you must first:
1. Construct reference data: create a dataset that outlines the range of possible values for your data.
2. Extract data from various sources: Correct data extraction is the foundation for the success of future ETL processes.
  • asked a question related to Data
Question
22 answers
Data science is a growing field of technology in the present context. There have been notable applications of data science in electronic engineering, nanotechnology, mechanical engineering and artificial intelligence. What future scope is available for data science in civil engineering, in the fields of structural analysis, structural design, geotechnical engineering, hydrological engineering, environmental engineering and sustainable engineering?
Relevant answer
  • asked a question related to Data
Question
3 answers
We ran different experiments in our lab where we exposed corals to different factors, e.g. Experiment 1 looking at ocean acidification and phosphate enrichment, and Experiment 2 looking at ocean acidification and nitrate. In both experiments, we have a control group, each factor (acidification and eutrophication) alone, and then a group exposed to both stressors at the same time. As our sample size is rather small, we thought of pooling the data from different experiments when corals experienced the same treatment, e.g. the pure acidification groups from Experiments 1 and 2. And here is the question: Which test(s) should we run to decide whether we can pool our data or not? We assume that we can only pool the data if there is no significant difference between the response (like respiration rate) in corals exposed to pure acidification in Experiments 1 and 2, correct?
We thought that we could compare the means of the groups (using ANOVA or Kruskal-Wallis), compare the ranks (using PERMANOVA), or look at similarity (using PCO and ANOSIM). Unfortunately, depending on the test, the outcomes of them are different (surprise!) and we don’t know which test is the “correct” one to make the decision to pool or not to pool.
Or maybe we don’t have to test them at all and can just pool them? What is the correct way/test to make this decision?
Relevant answer
Answer
Hello Selma,
I see three options:
1. Treat the two experiments as nested within occasions (use a mixed model for analysis). This would allow you to segregate effects that were due to time/occasion/experiment differences, and would be defensible if there was any likelihood that time/occasion/experiment differences could have influenced results.
2. If you and your team are convinced that the conditions from which you propose to combine data were identical in every other way save for time/occasion/experiment, then pool them and treat as one set.
3. Analyze each set independently, then use Fisher's method to compute a combined p-value for the set of two experiments. [Compute: -2*sum of natural log of individual outcome p-values; refer to a chi-square distribution based on 2*k, where k is the number of outcome p-values being combined.] This approach is sometimes used in meta-analytic studies.
Obviously, do explain your chosen method and the rationale in writing up results.
Good luck with your work.
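For option 3, a minimal R sketch of Fisher's method as described above (the two p-values are placeholders for the outcomes of your experiments):
# p-values from the two independent experiments (illustrative values)
p_values <- c(0.08, 0.03)
k <- length(p_values)
# Fisher's method: -2 * sum of natural logs, referred to a chi-square with 2k df
fisher_stat <- -2 * sum(log(p_values))
combined_p  <- pchisq(fisher_stat, df = 2 * k, lower.tail = FALSE)
c(statistic = fisher_stat, combined_p = combined_p)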
  • asked a question related to Data
Question
3 answers
Use case: to provide data security by building next-generation firewalls, or is there a better firewall type for handling normal systems? Please suggest any answers.
Relevant answer
Answer
Dear Roshan Reddy,
To enable security of electronic communications there are three groups of problems - endpoint security, cloud / network security, and identity and access management security. Firewalls are important, but it's just a part of the puzzle.
  • asked a question related to Data
Question
4 answers
I'm trying to analyse some data from my last experiment, where I grew two varieties of potato in a range of pot sizes with well-watered and water-restricted conditions, to see if the size of the pot would affect the relationships between water restriction and measures of plant morphophysiology over time.
Unfortunately, I have absolutely no idea how to analyse these data, which looks like this (5 pot sizes, 2 genotypes, 2 treatments, and about 11 dates)... Each combination of factors was replicated in triplicate. To be honest, I'm not even sure what I'm trying to look for, my brain's not great with numbers so I'm just sitting staring at Minitab. Any help at all would be amazing. Thanks.
Relevant answer
Answer
Cindy – I presume someone is supervising your experiment. It's time to investigate the statistical support resources available to you.
You also need to write your results section now – I'm not kidding about this. Look at papers that have done similar experiments and see how they presented the data, and how they wrote it up. Use this as a template. Write the results section drawing up blank tables to show your results and leaving gaps in the text awaiting the appropriate result.
I say this because with repeated measurements you need to define your study endpoint precisely. Are you interested in how big the plants are at a given time point, or how long it takes them to reach a given size? And indeed, is size what counts? Measured how?
You can also be getting a better feel for your data by graphing the results. Draw line graphs for each plant's growth, coloured to show which group they belong to. Do this for each variable separately – water level, genotype, pot size etc. Look out for peculiar data points! They can have devastating effects on mean values.
So read similar studies and dummy up your analysis plan, meanwhile making lots of graphs. Then make an appointment with your local friendly statistics service to discuss how you implement the analysis in whatever software you are using.
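To illustrate the graphing suggestion, here is a minimal R sketch using ggplot2 (the data frame, column names and values are assumptions; adapt them to however the measurements were recorded):
library(ggplot2)
set.seed(2)
# Simulated long-format data: one row per plant per measurement date
potato <- expand.grid(plant = 1:12, date = 1:11)
potato$treatment <- ifelse(potato$plant <= 6, "well-watered", "water-restricted")
potato$height <- 5 + potato$date * ifelse(potato$treatment == "well-watered", 2, 1.4) +
                 rnorm(nrow(potato), sd = 1)
# One line per plant, coloured by treatment, to spot trends and odd data points
ggplot(potato, aes(x = date, y = height, group = plant, colour = treatment)) +
  geom_line() +
  labs(x = "Measurement occasion", y = "Plant height (illustrative units)")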
  • asked a question related to Data
Question
7 answers
Dear all,
I wanted to evaluate the accuracy of a model against observation data. My problem is that the correlation of the model with the observed data is really good (greater than 0.7), but the RMSE is very high too (greater than 100 mm per month for monthly rainfall data). The model also has a low bias.
How can I explain this case?
Thank you all
Relevant answer
Answer
The choice of a model should be based on underlying physical explanations first.
See the manual of my software for examples:
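As a purely illustrative note on why the two metrics can disagree: correlation measures only how well the modelled and observed values co-vary, while RMSE also penalises amplitude and scale errors. A minimal R sketch with simulated rainfall shows a case like the one described (high correlation, low bias, large RMSE):
set.seed(3)
observed <- rgamma(120, shape = 2, scale = 100)   # simulated monthly rainfall (mm)
m <- mean(observed)
# Right timing and almost no bias, but the variability is exaggerated
modelled <- m + 1.8 * (observed - m) + rnorm(120, sd = 20)
cor(observed, modelled)                 # high correlation (close to 1)
mean(modelled - observed)               # bias close to zero
sqrt(mean((modelled - observed)^2))     # RMSE still large, driven by the amplitude error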
  • asked a question related to Data
Question
3 answers
Dear colleagues,
I am trying to train a neural network. I normalized the data with the minimum and maximum:
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(mydata, normalize))
and the results:
results <- data.frame(actual = testset$prov, prediction = nn.results$net.result).
So I can see the actual and predicted values only in normalized form.
Could you please tell me how to scale the real and predicted data back into the "unscaled" range?
P.s. I also tried:
minvec <- sapply(mydata, min)
maxvec <- sapply(mydata, max)
denormalize <- function(x, minval, maxval) {
  x * (maxval - minval) + minval
}
but this doesn't work correctly in my case.
Thanks a lot for your answers
Relevant answer
Answer
It actually works (but you have to consider rounding errors):
normalize <- function(x, min, max) (x-min)/(max-min)
denormalize <- function(x, min, max) x*(max-min)+min
x <- rnorm(1000)
r <- range(x)
nx <- normalize(x, r[1], r[2])
dnx <- denormalize(nx, r[1], r[2])
all(x == dnx)
# FALSE ---> rounding errors
all(abs(x - dnx) < 1E-8)
# TRUE ---> identical up to tiny rounding errors
  • asked a question related to Data
Question
16 answers
Some literature states that quarterly data from the World Bank were used. Are such data available, or do the authors transform the annual data? How is this transformation done?
Relevant answer
Answer
International Debt Statistics • These include the high frequency, quarterly data for high-income economies and select developing countries reporting to the joint World Bank–IMF Quarterly External Debt Statistics (QEDS) and the Quarterly Public Sector Debt (PSDS) database. [on-line] https://data.worldbank.org/products
• Quarterly Public Sector Debt (QPSD) [on-line] https://www.worldbank.org/en/programs/debt-statistics/qpsd
  • asked a question related to Data
Question
2 answers
Hi,
I am looking for the latest data of population from authentic sources/government authorities for anytime between 2012-2021. It will be for my study area, South 24 Parganas district, West Bengal. Any leads/contacts for acquiring the same will be of great help for my research work.
Thank you.
  • asked a question related to Data
Question
7 answers
I have used the merge function in SPSS loads of times previously with no problems. However this time I am running into an unusual issue and can't find any information online to overcome it.
I am trying to merge 2 SPSS data files: set 1 contains demographic data on 8504 cases and set 2 contains blood data on 6725 of those cases.
Using point-and-click options in SPSS I tried MERGE FILES>ADD VARIABLE>one-to-one merge based on key values (key variable = ID). However this results in a file with duplicate cases i.e. row 1 and row 2 are both subject ID 1, row 1 shows the values for the demographic data for that subject, while the blood data cells are blank, and in row 2 the demographic data cells are blank and the blood data are there. Screenshot attached.
I tried following this up with the restructuring command to try and merge the duplicate case rows but it did not alter the data set.
I've double checked that my ID variable in set 1 and set 2 matches in type/width/decimals etc.
I've tried the following syntax
MATCH FILES /FILE=*
/FILE='DataSet2'
/BY ID.
EXECUTE.
But none of the above has worked. Any advice would be HUGELY appreciated!
Relevant answer
Answer
Good morning Caoileann Murphy. I don't know if this will work with non-printing ascii codes, but I would try this:
* Strip non-printable ascii control codes from ID.
RENAME VARIABLES (ID=oldID).
* Make a new ID variable--adjust length as needed.
STRING ID (A5).
COMPUTE ID = oldID.
LOOP #i = 0 to 31.
- COMPUTE ID = REPLACE(ID,STRING(#i,PIB),"").
END LOOP.
* References.
Good luck!
  • asked a question related to Data
Question
16 answers
I have been struggling to get the export and import data of Taiwan. On the World Bank website, Taiwan is not listed as a country, so nothing can be found. Are there any reliable sources for country-specific (Taiwan) data?
Relevant answer
Answer
Also check please the following very good link: https://www.ceicdata.com/en/indicator/taiwan/total-exports
  • asked a question related to Data
Question
5 answers
Hello everyone,
I am looking for links of scientific journals with dataset repositories.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
Relevant answer
Answer
Dear Cecilia-Irene Loeza-Mejía
I think you should have a look at the site «re3data: Registry of Research Data Repositories» (https://www.re3data.org).
There you will find the following search/browsing options: browse by content type, browse by subject, and browse by country.
When you choose "Browse by content type", you will get "Raw data" or "Scientific and statistical data formats" (among others): https://www.re3data.org/browse/by-content-type/.
With best regards Anne-Katharina
  • asked a question related to Data
Question
8 answers
Not all rows are visible due to the large number of rows; is there any method to view all rows? I am using pandas to read the file.
Relevant answer
Answer
If you don't want to set pandas global options you could also do
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    display(HTML(df.to_html()))
and this will apply to the "with" context only.
PS: you may need to import HTML (and display) from IPython:
from IPython.display import display, HTML
  • asked a question related to Data
Question
15 answers
I am a PhD student and am currently working on metabolite profiles of some marine invertebrates.
While analysing some raw data generated from LC-MS, HRMS, NMR and FTIR, I was told by some researchers that these raw data, once submitted to a journal as supporting files, cannot be used further for any other analysis. For each analysis I need to generate the raw data again, otherwise it will be treated as a case of self-plagiarism.
I can see that my raw data has a potential of producing three distinct publications. I can analyse different parts of my raw data differently to present distinct conclusions.
But generating all the raw data again from these analyses, and that too for each publication, does not look sustainable to me. And clubbing all three publications in one also does not seem to be a good option here.
So I would like to know your views on this matter as a researcher and also as an Editor/Reviewer. Also, please share your similar experiences and solutions to it.
Relevant answer
Answer
It depends. Much data, for example fisheries records, are published for a given purpose - for example to manage a fishery. Many years after that data was published, it may provide other researchers with other information - for example on understanding "shifting baselines". There are many very good pieces of research using historic datasets. My herbarium vouchers are a form of "data". They are lodged in public Herbaria across Australia. The Herbarium staff make the vouchers, and the associated environmental data, available to researchers around the world. They do not limit the number of times any particular voucher may be used to provide a datapoint in someone's research. Those of us who contribute these data rarely hear about their reuse unless we subscribe to platforms like Bionomia. Over a career collecting environmental data I have much that I may never explore fully. I use platforms like the Atlas of Living Australia's BioCollect platform to host my old datasets. It is possible that one day someone may use the hypersaline lake diatom and physico-chemical data from Australia to develop a "diatom metric" index of water quality for these understudied habitats... or for gnammas, or coastal lagoons...
As you know what analyses you plan to subject the dataset to, maybe you would be better served by clubbing a group of papers together that look at the different things you have extracted from the dataset, and then providing the entire group to one journal to publish as a "set".
But I would not like for you to have the opinion that data may only be used once then needs be discarded. What a waste of effort. Well conserved datasets, with excellent metadata relating to the methodologies and data collection, can be valuable into the future, in ways we have no current understandings about.
  • asked a question related to Data
Question
5 answers
When should household panel data be weighted to increase the representativeness of the data? Does this depend on the number of reporting households?
Relevant answer
Answer
The weighted mean is used in meta-analysis. It gives higher weight to more accurate studies and lower weight to less accurate studies.
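As a small illustration of that idea, an inverse-variance weighted mean (with made-up estimates and standard errors) can be computed in R as:
# Illustrative study estimates and their standard errors
estimate <- c(2.1, 1.4, 3.0)
se       <- c(0.2, 0.8, 0.5)
w <- 1 / se^2                               # more precise studies get more weight
weighted_mean <- sum(w * estimate) / sum(w)
weighted_mean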
  • asked a question related to Data
Question
4 answers
Hi,
Should the sociodemographic data of qualitative research be equally distributed? I will be glad if you send me your opinions and sources about this issue. Thanks in advance.
Relevant answer
Answer
In qualitative research you are aiming to cover the ground, so to speak. Often, this involves purposive sampling to make sure that all relevant points of view, experiences etc are included. However, the demographics may not be (probably are not) the best way of selecting participants. What you want is some way of making sure that e