Data - Science topic
Explore the latest questions and answers in Data, and find Data experts.
Questions related to Data
In your opinion, could a future generation of generative artificial intelligence produce a highly sophisticated language model capable of simulating the consciousness of a specific person, answering questions on the basis of knowledge derived from that person's publications, documented statements and previously given interviews?
For example, if in a few years it becomes possible to create a new generation of artificial intelligence equipped with artificial thought processes and artificial consciousness, and to integrate a language model with a database of data and knowledge derived from a given person's publications, documented statements and previously given interviews, then perhaps it will become possible to converse with a kind of artificial consciousness simulating the consciousness of a specific person who died long ago, for example Albert Einstein, and which would answer questions as that person. In this way, language models based on generative artificial intelligence, equipped with artificial thought processes and artificial consciousness and holding the knowledge of a specific person, could be made available on the Internet for users to converse with. If such highly intelligent tools were created and offered as a service for talking with a specific, well-known person who lived many years ago, the service would probably become very popular. However, the questions of ethics and of possible copyright in the works, publications and books written by that person many years ago, whose knowledge, data and information would be used by the generative artificial intelligence simulating that person's consciousness while answering questions and participating in discussions with Internet users, remain to be considered. Beyond this, there is a specific category of disinformation risk within this kind of prospective online service. This risk would materialise whenever the artificial intelligence's answers to human questions contained content, information, data, wording, phrases or suggestions that the simulated person would never have uttered. The level of this risk would be inversely proportional to the sophistication of the construction of such a new generation of artificial intelligence, including the integration of the language model with the database of knowledge derived from the person's publications, documented statements and interviews, and to the quality of the learning system through which the generative artificial intelligence improves its answers and its active participation in discussions.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
In your opinion, can a new generation of generative artificial intelligence be created in the future that produces a highly advanced language model capable of simulating the consciousness of a specific person, answering questions on the basis of knowledge derived from that person's publications, documented statements and previously given interviews?
Could an artificial intelligence be created in the future that is capable of simulating the consciousness of a specific person?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Counting on your personal opinions and on an honest approach to discussing scientific issues, rather than ready-made answers generated by ChatGPT, I have deliberately used the phrase "in your opinion" in the question.
The above text is entirely my own work, written on the basis of my research.
I have not used other sources or automatic text generation systems such as ChatGPT in writing this text.
Copyright by Dariusz Prokopowicz
Best wishes,
Dariusz Prokopowicz

I am trying to run a spatio-temporal autoregressive model (STAR). Therefore I need to create a spatial weight matrix W with N × T rows and N × T columns to weight country interdependencies based on yearly trade data. Could someone please tell me how to create such a matrix in R or Stata?
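A minimal sketch in R of one common construction, assuming the weights come from yearly N × N bilateral trade matrices and that W is block-diagonal over time; the package choice, object names and toy data below are illustrative assumptions, not a prescribed method:
library(Matrix)
N <- 5   # hypothetical number of countries
T <- 3   # hypothetical number of years
# toy yearly N x N trade matrices; replace with real bilateral trade flows
set.seed(1)
trade_list <- replicate(T, matrix(runif(N * N), N, N), simplify = FALSE)
row_standardize <- function(M) {
  diag(M) <- 0                 # no self-influence
  rs <- rowSums(M)
  rs[rs == 0] <- 1             # avoid division by zero for isolated units
  M / rs                       # divide each row by its sum
}
# one row-standardised weight block per year, stacked block-diagonally
W_blocks <- lapply(trade_list, row_standardize)
W <- bdiag(W_blocks)           # sparse (N*T) x (N*T) matrix
dim(W)                         # 15 x 15 in this toy example
If the observations are ordered country-first rather than year-first, the same blocks would need to be permuted accordingly; in Stata, as far as I know, an analogous block-diagonal matrix can be assembled in Mata.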
Is analytics based on Big Data and artificial intelligence already capable of predicting what we will think about tomorrow, that we need something, that we should perhaps buy something we think we need?
Can an AI-equipped Internet robot, using the results of research carried out by advanced Big Data socio-economic analytics systems and employed in the call-centre department of a company or institution, already forecast in real time the consumption and purchase needs of a specific Internet user on the basis of a conversation with that potential customer, and on that basis offer the user products or services that he or she would probably soon have concluded were needed anyway?
On the basis of analytics of a bank customer's purchases of products and services, of online payments and settlements, and of bank card payments, will banks refine their models of customers' purchase preferences for specific banking products and financial services? For example, will the purchase of a certain type of product or service result in a specific insurance policy or bank loan being offered to a specific customer of the bank?
Will this be an important part of the automation of the processes carried out within the computerised systems concerning customer relations etc. in the context of the development of banking in the years to come?
For years, in databases, data warehouses and Big Data platforms, Internet technology companies have been collecting information on citizens, Internet users, customers using their online information services.
Continuous technological progress increases the possibilities of obtaining, collecting and processing data on citizens in their role as potential customers: consumers of Internet and other media offers, of online information services, of various product and service offers, and of advertising campaigns that also influence the general social awareness of citizens and the choices people make in various aspects of their lives. The new Industry 4.0 technologies currently being developed, including Big Data Analytics, cloud computing, the Internet of Things, Blockchain, cyber security, digital twins, augmented reality and virtual reality, as well as machine learning, deep learning, neural networks and artificial intelligence, will determine rapid technological progress and the development of applications of these technologies in online marketing in the years to come. The robots being developed, which collect specific content from various websites, are able to pinpoint information written by Internet users on their social media profiles. In this way a large amount of information describing a specific Internet user can be obtained, a highly accurate characterisation of that user can be built, and multi-faceted characteristics of customer segments for specific product and service offers can be created. Digital avatars of individual Internet users are thus built in the Big Data databases of Internet technology companies, large e-commerce platforms and social media portals. The descriptive characteristics of such avatars are so detailed, and contain so much information about Internet users, that most of the people concerned do not even know how much information specific Internet technology companies, e-commerce platforms and social media portals hold about them.
Geolocation, added to 5G high-speed broadband, information technology and Industry 4.0, has on the one hand made it possible to develop analytics that identify Internet users' shopping preferences and topics of interest depending on where, geographically, they happen to be with the smartphone on which they use particular online information services. On the other hand, combining these technologies in the applications installed on smartphones has both increased the scale of data collection on Internet users and increased the efficiency with which these data are processed and used in the marketing activities of companies and institutions, with the operations increasingly carried out in real time in cloud computing and the results of the data processing presented on Internet of Things devices.
It is becoming increasingly common for us to experience situations in which, while walking with a smartphone past some physical shop, bank, company or institution offering certain services, we receive an SMS, banner or message on the Internet portal we have just used on our smartphone informing us of a new promotional offer of products or services of that particular shop, company, institution we have passed by.
In view of the above, I would like to address the following question to the esteemed community of scientists and researchers:
Is analytics based on Big Data and artificial intelligence, conducted in the field of market research, market analysis, the creation of characteristics of target customer segments, already able to forecast what we will think about tomorrow, that we need something, that we might need to buy something that we consider necessary?
Is analytics based on Big Data and artificial intelligence already capable of predicting what we will think about tomorrow?
The text above is my own, written by me on the basis of my research.
In writing this text, I did not use other sources or automatic text generation systems such as ChatGPT.
Copyright by Dariusz Prokopowicz
What do you think about this topic?
What is your opinion on this subject?
Please answer,
I invite you all to discuss,
Thank you very much,
Best regards,
Dariusz Prokopowicz

My team and I are trying to open a dialogue about designing a Continuum of Realism for synthetic data. We want to develop a meaningful way to talk about data in terms of the degree of realism that is necessary for a particular task. We feel the way to do this is by defining a continuum that shows that as data becomes more realistic, the analytic value increases, but so does the cost and risk of disclosure. Everyone seems to be interested in generating the most realistic data, but let's be honest, sometimes that's not the level of realism that we actually need. It is expensive and carries a high reidentification risk when working with PII. Sometimes we just need data to test our code, and we can't justify using this level of realism when the risk is so high. Have you also encountered this issue? Are you interested in helping us fulfill our mission? Ultimately we are trying to save money and protect consumer privacy. We would love to hear your thoughts!
Hi everyone,
I need to convert standard error (SE) into standard deviation (SD). The formula for that is
SE times the square root of the sample size
By 'sample size', does it mean the total sample size or the sample sizes of the individual groups? For example, the intervention group has 40 participants while the control group has 39 (so the total sample size is 79). So, when calculating the SD for the intervention group, do I use 40 as the sample size, or 79?
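A minimal sketch of the conversion SD = SE × sqrt(n) in R, using each group's own sample size (the usual convention when the SE describes that group's mean); the SE values below are hypothetical placeholders:
se_intervention <- 0.8            # assumed SE of the intervention group mean
se_control      <- 0.9            # assumed SE of the control group mean
sd_intervention <- se_intervention * sqrt(40)   # n of the intervention group
sd_control      <- se_control      * sqrt(39)   # n of the control group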
Thank you!
If ChatGPT is merged into search engines developed by internet technology companies, will search results be shaped by algorithms to a greater extent than before, and what risks might be involved?
Leading Internet technology companies that also have and develop search engines in their range of online information services are working on technological solutions for implementing ChatGPT-type artificial intelligence into these search engines. There are currently discussions about the social and ethical implications of such a combination of technologies being offered in open access on the Internet. The considerations concern the possible level of risk of manipulation of the information message in the new media, potential disinformation resulting from a specific algorithm model, disinformation affecting the overall social consciousness of globalised societies, the possibility of a planned shaping of public opinion, and so on. This raises a further issue: the legitimacy of creating a control institution that would continuously monitor the objectivity, independence and ethics of the algorithms used in implementing ChatGPT-type artificial intelligence in Internet search engines, including those search engines that top the rankings of tools Internet users rely on for increasingly precise and efficient searches for information. If such a system of institutional state control is not established, or if a control system involving the companies developing these solutions does not function effectively or does not keep up with technological progress, there may be serious negative consequences in the form of an increase in the scale of disinformation in the new Internet media. How important this may become is evident from what is currently happening with the social media portal TikTok. On the one hand, it has been the fastest-growing new social medium in recent months, with more than 1 billion users worldwide. On the other hand, an increasing number of countries are imposing restrictions or bans on the use of TikTok on computers, laptops and smartphones used for professional purposes by employees of public institutions and commercial entities. It cannot be ruled out that new types of social media will emerge in the future in which the above-mentioned solutions, implementing ChatGPT-type artificial intelligence in search engines, will find application. Such search engines might be operated by Internet users on the basis of intuitive feedback, automated profiling of the search engine to a specific user, or multi-option, multi-criteria search controlled by the user for precisely specified information and data. New opportunities may also arise when the artificial intelligence implemented in a search engine is applied to multi-criteria searches for specific content, publications, persons, companies and institutions on social media sites, on publication-indexing sites and in online knowledge bases.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
If ChatGPT is merged into search engines developed by online technology companies, will search results be shaped by algorithms to a greater extent than before, and what risks might be associated with this?
What is your opinion on the subject?
What do you think about this topic?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz

How can a Big Data Analytics system based on artificial intelligence be built that is more advanced than ChatGPT and learns only from real information and data?
How can one build a Big Data Analytics system that analyses information taken from the Internet, conducts real-time analytics based on artificial intelligence and is integrated with an Internet search engine, yet is more advanced than ChatGPT in that it improves data verification through discussion with Internet users and learns only from real information and data?
ChatGPT is not perfect at self-learning new content and perfecting the answers it gives: it sometimes confirms information or data that is not factually correct in the question formulated by the Internet user. In this way, in the course of its 'discussions', ChatGPT can also learn false information and fictitious data. Currently, various technology companies are planning to create, develop and implement computerised analytical systems based on artificial intelligence technology similar to ChatGPT, which will find application in various fields of big data analytics, business and research work, in business entities and institutions operating in different sectors and industries of the economy. One direction of development of this kind of technology is the plan to build a system for analysing large data sets: a Big Data Analytics system that analyses information taken from the Internet, conducts real-time analytics, is integrated with an Internet search engine, and is more advanced than ChatGPT in that it improves data verification through discussion with Internet users and learns only from real information and data. Some technology companies are already working on such solutions. Presumably many technology start-ups that plan to create, develop and implement business-specific innovations based on this generation of artificial intelligence are also considering research in this area, and perhaps developing a start-up whose business concept has Industry 4.0 technological innovation, including the aforementioned artificial intelligence technologies, as a key determinant.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can one build a Big Data Analytics system that analyses information taken from the Internet, conducts real-time analytics based on artificial intelligence and is integrated with an Internet search engine, yet is more advanced than ChatGPT in that it improves data verification through discussion with Internet users and learns only from real information and data?
What do you think about this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Best wishes,
Dariusz Prokopowicz

How can artificial intelligence such as ChatGPT, together with Big Data Analytics, be used to analyse the level of innovativeness of the new economic projects that start-ups plan to develop when implementing innovative business solutions, technological innovations, environmental innovations, energy innovations and other types of innovations?
The economic development of a country is determined by a number of factors, which include the level of innovativeness of economic processes, the creation of new technological solutions in research and development centres, research institutes and the laboratories of universities and business entities, and their implementation into the economic processes of companies and enterprises. In the modern economy, the level of innovativeness is also shaped by the effectiveness of innovation policy, which influences the formation of innovative start-ups and their effective development. The economic activity of innovative start-ups involves high investment risk, and for the institutions financing their development it generates high credit risk. As a result, many banks do not finance business ventures led by innovative start-ups. Within systemic programmes financing start-up development from national public funds or international innovation support funds, financial grants are organised that can be provided as non-refundable assistance if the start-up successfully develops the business ventures set out in its application for external funding. Non-refundable grant programmes can thus activate innovative business ventures in specific areas, sectors and industries of the economy, including, for example, innovative green ventures that pursue sustainable development goals and form part of the green economy transformation. Institutions distributing non-refundable grants should constantly improve their systems for analysing the level of innovativeness of the business ventures that start-ups describe as innovative in their funding applications. In improving systems for verifying the level of innovativeness of business ventures and the fulfilment of specific goals, e.g. sustainable development goals or green transformation goals, new Industry 4.0 technologies implemented in Business Intelligence analytical platforms can be used: machine learning, deep learning, artificial intelligence (including, for example, ChatGPT), Business Intelligence platforms with Big Data Analytics, cloud computing, multi-criteria simulation models and so on. Given appropriate IT equipment, including computers with new-generation, high-performance processors, it is therefore possible to use artificial intelligence such as ChatGPT, Big Data Analytics and other Industry 4.0 technologies to analyse the level of innovativeness of the new economic projects that start-ups plan to develop when implementing innovative business, technological, ecological, energy and other types of innovations.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
How can artificial intelligence such as ChatGPT, together with Big Data Analytics, be used to analyse the level of innovativeness of the new economic projects that start-ups plan to develop when implementing innovative business solutions, technological innovations, ecological innovations, energy innovations and other types of innovations?
What do you think?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz

Dear researchers,
I am working on a project related to the solar wind. I want to download 1-minute resolution data from the BepiColombo spacecraft. However, I am struggling with that. Do you know of any websites from which the data can be downloaded? Or, if you could provide BepiColombo data for just a few days, that would be very helpful. I look forward to valuable comments from this wonderful community.
Many thanks.
Does analytics based on sentiment analysis of changes in Internet user opinion using Big Data Analytics help detect fake news spread as part of the deliberate spread of disinformation on social media?
The spread of disinformation on social media, carried out by setting up fake profiles and spreading fake news through them, is becoming increasingly dangerous for the security not only of specific companies and institutions but also of the state. The various social media, including those dominating this segment of new online media, differ considerably in this respect. The problem is more acute on those social media that are among the most popular and are used mainly by young people, whose world view can be more easily influenced by fake news and other disinformation techniques used on the Internet. Currently, the social media most popular among children and young people include TikTok, Instagram and YouTube. Consequently, in recent months the growth of some platforms such as TikTok is already being restricted by the governments of some countries, which ban the use or installation of the application on smartphones, laptops and other devices used for official purposes by employees of public institutions. These governments justify such actions by the need to maintain a certain level of cyber security and to reduce the risk of surveillance and of theft of data and of sensitive, strategic, security-relevant information belonging to individual institutions, companies and the state. In addition, there have already been more than a few cases of data leaks at other social media portals, telecoms, public institutions, local authorities and other bodies, based on hacking into their databases. In Poland, however, the opposite is true: the governing PIS party not only does not restrict the use of TikTok by employees of public institutions, but also encourages its politicians to use the portal to publish videos as part of the ongoing electoral campaign, in order to increase its chances of winning the parliamentary elections for the third time in autumn 2023. According to analysts researching the problem of growing disinformation on the Internet, in highly developed countries it is enough to create 100,000 avatars, i.e. fictitious persons who seemingly function on the Internet through fake profiles created on social media portals, to seriously influence the world view and general social awareness of Internet users, i.e. usually the majority of citizens in the country. In third-world countries and in countries with undemocratic systems of power, about 1,000 such avatars suffice, with back-stories modelled, for example, on famous people, such as, in Poland, a well-known singer claiming that there is no pandemic and that vaccines are an instrument for increasing state control over citizens. The analysis of changes in the world view of Internet users, of trends in social opinion on specific issues, of evaluations of specific product and service offers, and of the brand recognition of companies and institutions can be conducted on the basis of sentiment analysis of changes in Internet users' opinions using Big Data Analytics. Consequently, this type of analytics can be applied, and can be of great help, in detecting fake news disseminated as part of the deliberate spread of disinformation on social media.
In view of the above, I address the following question to the esteemed community of scientists and researchers:
Does analytics based on sentiment analysis of changes in the opinions of Internet users using Big Data Analytics help in detecting fake news spread as part of the deliberate spread of disinformation on social media?
What is your opinion on this topic?
What is your opinion on this subject?
Please respond,
I invite you all to discuss,
Thank you very much,
Warm regards,
Dariusz Prokopowicz

I am grappling with the problem below; it would be a pleasure to have your ideas.
I've written the same program in two languages, Python and R, but each produced a completely different result. Before jumping to conclusions, I note that:
- Every line of the code in both languages has been checked multiple times, is correct, and represents the same thing.
- The packages used in the two languages are the same version.
So, what do you think?
The code is about applying deep neural networks for time series data.
Where can I get the original paper by Ackoff from 1989, titled "From Data to Wisdom", published in the Journal of Applied Systems Analysis?
The file that is publicly available under this title is a 2-page article from 1999.
Where can I get global data (freely accessible) for small and micro-enterprises across countries and time?
I am currently using Fisher's exact test as some of my cell counts are <5. I have done this for lots of data in the same dataset, which have generally been 2x3 or larger (so I reported Cramér's V as well); however, now that I am running a 2x2, the Fisher's output is blank and I can't figure out why! I have attached an example of the output - any help would be gratefully received!

Since statistics is the golden key to interpreting data in almost all scientific and social branches!
I would like to fit some data sets. I read that LEVMW is the best program. Does anyone have this program or a link to it?
Or are there other programs better than LEVMW?
Greetings respectable community of ResearchGate. I encountered some issues while gathering data from the World Bank Database, hence I would like to know if there are alternatives or other websites like the World Bank Database in which we can gather raw data.
The website can contain any form of indicators (development, governance, competitiveness, economics, financial sector, etc.). Thank you in advance for your assistance.
It is a .dta file, so it cannot be read with read.csv; I used haven to read it. However, the second row contains the real column names, such as Checking Year and Checking Month. How can I extract them?
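A minimal sketch in R, assuming the descriptive names ("Checking Year", "Checking Month") are stored as Stata variable labels rather than as a literal data row; the file name below is a hypothetical placeholder:
library(haven)
dat <- read_dta("survey.dta")          # hypothetical file name
# pull each column's Stata variable label, if present
labels <- vapply(dat, function(col) {
  lab <- attr(col, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))
labels                                 # e.g. "Checking Year", "Checking Month", ...
# optionally replace the short names with the labels where a label exists
names(dat) <- ifelse(is.na(labels), names(dat), labels)
If instead the real names literally occupy the first data row, that row can be copied into names(dat) and then dropped with dat <- dat[-1, ].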

I had the same problem with data collected from a Thermo Scientific XPS. Here is the solution I found; it may help other researchers.
Step 1: Open Avantage (on the XPS measurement computer), then go to "file"->"save as", a new window pops out.
Step 2: Name your .vgp file and click save. A new folder will be created, like Data Grid 2. DATA
Step 3: Open the software "DataSpace_BatchDump.exe". It can be found on the same computer with Avantage installed, in "C:\Program Files (x86)\Thermo\Avantage\Bin".
Step 4: Open the folder with .vgp/.vgd files in DataSpace_BatchDump, and click OK. Then, a new window pops out, find a location to export the files, like “C:\export”, then click ok twice. New .avg files will be saved in that location.
Step 5: Open CasaXPS, click “Convert” and find “C:\export”. Type “.dth” and click OK. The .vms files will then be created.



Can artificial intelligence help optimize remote communication and information flow in a corporation, in a large company characterized by a multi-level, complex organizational structure?
Are there any examples of artificial intelligence applications in this area of large company operations?
In large corporations characterised by a complex, multi-level organisational structure, the flow of information can be difficult. New ICT and Industry 4.0 information technologies are proving helpful in this regard, improving the efficiency of the information flowing between departments and divisions of the corporation. One of the Industry 4.0 technologies that has recently found various new applications is artificial intelligence. The implementation of artificial intelligence, machine learning and other Industry 4.0 technologies into various business areas of companies, enterprises and financial institutions is associated with the increasing digitisation and automation of the processes carried out in those entities. For several decades, in order to refine and improve the flow of information in corporations with complex organisational structures, integrated information systems have been implemented that connect the applications and programs operating within specific departments, divisions and plants of a large enterprise. Nowadays, artificial intelligence is a technology that can help optimise remote communication and the flow of information and data within a corporation's intranet.
Besides, Industry 4.0 technologies, including artificial intelligence, can help improve the cyber security of data transfer, including data transferred in e-mail communications.
In view of the above, I address the following question to the esteemed community of researchers and scientists:
Can artificial intelligence help optimize remote communication and information flow in a corporation, in a large company characterized by a multi-level, complex organizational structure?
What is your opinion on this issue?
Please answer,
I invite everyone to join the discussion,
Thank you very much,
Best regards,
Dariusz Prokopowicz

Hello, I'm looking for a reliable insect pest database that shares information about the occurrence, geographic distribution and hosts of all insect pests over the world. I did a little research on my own, but in my opinion the results from the databases I found (GBIF, CABI, etc.) aren't quite reliable. I also believe that many technical reports on the occurrence of those insects are published by governmental research centres in every country, but they aren't accessible online. Is there a way to get access to those reports?
Hi Researchers,
I am looking for journals that publish scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. Please let me know if you have any recommendations.
FYI, the datasets we are curating are bioinformatics image data.
Greetings everyone! Could anyone kindly tell me where I can get data concerning the stock and bond markets?
I'm new to fsQCA and would like to conduct a fsQCA analysis in one of my dissertation studies. The majority of fsQCA methods begin with data collection, followed by the preparation of the data matrix and the truth table. Conditions and outcomes are represented in the data matrix. Conditions, as we know, are the responses to surveys or interviews. I'm curious where the "outcomes" come from. Do we ask participants to rate the outcomes on a scale, as they did in the conditions?
For example, in the data matrix attached, there are five conditions (LPI, TAB, WPP, PAP, and NR) and outcomes are indicated by PubINF.
I acknowledge this is a very basic question, but I look forward to receiving your response.

I tried using Gigasheet, but it does not have many of the features available in Excel. Could you suggest some freely available tools where I can load my ~1.7 million rows and do calculations such as sorting on multiple columns and removing duplicates?
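For ~1.7 million rows, a minimal sketch using the free data.table package in R; the file and column names below are hypothetical placeholders:
library(data.table)
dt <- fread("big_file.csv")                           # fast CSV reader
setorder(dt, columnA, -columnB)                       # sort by several columns (columnB descending)
dt_unique <- unique(dt, by = c("columnA", "columnB")) # drop duplicates on chosen key columns
fwrite(dt_unique, "big_file_clean.csv")               # write the cleaned file back out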
TIA
Hello,
Does anyone know how to change the time length to 2 or 3s in Ansys Static Structural -> Model -> Static Structural -> Pressure -> Magnitude -> Function -> Tabular data? The current setup only allows me to go up to 1s.
Thanks!
Hi all,
I'm having trouble converting one particular variable in my dataset from string to numeric. I've tried manually transforming/recoding into a different variable and automatic recoding. I've also tried writing syntax (see below). The same syntax has worked for every other variable I needed to convert but this one. For all methods (manual recode, automatic recode, and writing a syntax), I end up with missing data.
recode variablename ('Occurred 0 times' = 0) ('Occurred 1 time' = 1) ('Occurred 2 times' = 2) ('Occurred 3+ times' = 3) into Nvariablename.
execute.
VALUE LABELS
Nvariablename
0 'Occurred 0 times'
1 'Occurred 1 time'
2 'Occurred 2 times'
3 'Occurred 3+ times'.
EXECUTE.
Thank you in advance for your help!
Dear researchers, we tried to download AOD data from AERONET for Nepal stations; however, most of the data are missing. Are there any other appropriate websites from which to download AOD data? We need your suggestions, thanks. :)
My lab wants to try to do as much of our pre-processing, processing, and analysis in R as possible, for ease of workflow and replicability. We use a lot of psychophysiological measures and have historically used MATLAB for our workflow with this type of data. We want to know if anyone has been successful in using R for these types of tasks.
Any good R packages?
I wanted to know the purity of MgSO4 and went for XRF analysis. I received the data in the following format:
[Quantitative Result]
---------------------------------------------------------------------------------
Analyte Result Proc-Calc Line Net Int. BG Int.
---------------------------------------------------------------------------------
S 61.3709 % Quant.-FP S Ka 1165.666 3.578
Mg 37.9584 % Quant.-FP MgKa 158.225 0.918
Ca 0.5466 % Quant.-FP CaKa 4.244 1.142
Si 0.1241 % Quant.-FP SiKa 0.862 0.148
Can anyone please help me by explaining how to find the purity of MgSO4 from this?
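A minimal sketch in R of one way to sanity-check the stoichiometry, under the assumption that the reported results are element weight percentages normalised over the measured elements (oxygen excluded); this is only an illustrative cross-check, not a full purity determination:
wt <- c(S = 61.3709, Mg = 37.9584, Ca = 0.5466, Si = 0.1241)   # reported weight %
M  <- c(S = 32.06,   Mg = 24.305,  Ca = 40.078, Si = 28.086)   # atomic masses, g/mol
moles <- wt / M          # relative mole amounts of the measured elements
moles / moles["S"]       # mole ratios relative to S (Mg:S should be 1:1 in pure MgSO4)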
Based on the literature review we get an idea of the moderators. But what if we want to introduce a new moderator into the literature?
1) What are the criteria for a new moderator?
2) How can a moderating variable be supported theoretically?
3) Is it necessary to adopt the new moderating variable from the same theory?
Is there any special procedure to follow to get data?
For RWE studies it is important to find the correct RWD sources; therefore I am looking for sources of Japanese drug codes, diagnosis codes, etc. other than JMDC. It would be helpful if any of you could help me.
Hello;
We have two twenty-year data sets, one for a historical time span and one for a future prediction. For both, statistical distributions were fitted for five-year intervals, and for the historical and predicted data the same statistical distributions (Johnson SB, Gen. Pareto, and Wakeby) have been selected as the most appropriate.
Similar statistical distributions have been obtained for all five-year intervals and for the entire twenty-year time series. We want to know what this similarity means for the data analysis.
Best
Saeideh
Hello,
I have a huge amount of data and need to calculate categorical statistical indices (e.g., POD, FAR, CSI, ETS) using Python or R. I will be thankful for any kind of help.
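A minimal sketch in R of these scores computed from a 2 x 2 contingency table of hits, misses, false alarms and correct negatives; the counts below are hypothetical:
hits         <- 42
misses       <- 8
false_alarms <- 15
correct_neg  <- 935
n <- hits + misses + false_alarms + correct_neg
POD <- hits / (hits + misses)                   # probability of detection
FAR <- false_alarms / (hits + false_alarms)     # false alarm ratio
CSI <- hits / (hits + misses + false_alarms)    # critical success index
hits_random <- (hits + misses) * (hits + false_alarms) / n
ETS <- (hits - hits_random) / (hits + misses + false_alarms - hits_random)  # equitable threat score
c(POD = POD, FAR = FAR, CSI = CSI, ETS = ETS)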
Regards,
Dear colleagues
I have a CSV file (5,000 lines) with information about the year, country, distance, product and quantity.
I can also open the file in Notepad++.
Could you please tell me how I can construct a graph in Excel or RStudio of the quantity, i.e. how to account for every quantity that corresponds to the respective country?
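A minimal sketch in R using only base functions, assuming the CSV has columns named country and quantity as described; the file name is a placeholder:
df <- read.csv("trade.csv")     # hypothetical file name
# total quantity per country
qty_by_country <- aggregate(quantity ~ country, data = df, FUN = sum)
# simple bar chart of the totals
barplot(qty_by_country$quantity,
        names.arg = qty_by_country$country,
        las = 2,
        main = "Total quantity by country")
In Excel, the equivalent would be a PivotTable summing quantity by country, plotted as a bar chart.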
Thank you very much
If I want the annual average of the country's oil production for 2019 and I have 25 stations:
1- Should I take the sum (of 12 months) for each station individually, so that I get the annual sum for each station, and then divide by 25 to calculate the country's annual figure?
2- Or should I take the sum over the 25 stations for January, then February, etc., and then divide by 12 (the number of months) to get the annual average for the country?
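A minimal sketch in R showing what each option computes on a toy 25-station x 12-month matrix (hypothetical values); it makes clear that the two options answer different questions, since option 1 is the grand total divided by 25 stations and option 2 is the grand total divided by 12 months:
set.seed(42)
prod <- matrix(runif(25 * 12, 100, 200), nrow = 25, ncol = 12)  # hypothetical monthly production
# Option 1: annual sum per station, then average across the 25 stations
opt1 <- mean(rowSums(prod))     # = grand total / 25
# Option 2: country-wide total per month, then average across the 12 months
opt2 <- mean(colSums(prod))     # = grand total / 12
c(option_1 = opt1, option_2 = opt2)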
I have a model that needs calibration, but I am afraid that if I calibrate using too many model parameters, I will overfit to my data, or the calibration will not be well-done.
Can anyone suggest a method to determine the maximum number of parameters I should use?
I was exploring differential privacy (DP), which is an excellent technique for preserving the privacy of data. However, I am wondering what performance metrics could be used to demonstrate the difference between schemes with DP and schemes without DP.
Are there any performance metrics by which a comparison can be made between a scheme with DP and a scheme without DP?
Thanks in advance.
Hello everyone,
Could you recommend an alternative to IDC please to get records from the global datasphere for free?
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
I am a research assistant to a doctoral student. For her thesis she has been asked to include, on a CD, a document management system with all the documentation she used, so that the reviewers of the work can quickly search through the documents, filter them, and search by keywords or within texts, etc.
I have searched and find that there are several such systems:
- OpenDocMan
- Seeddms
- I, Librarian
- OpenKm
- LogicalDOC
- Kimios
- others...
Several of them are web-based and would be ideal, as they offer the functionality we are looking for, but they are free only as long as you set up the server yourself. Others work as Windows software but cannot be packaged on their own to store on a CD. On the other hand, I have not found options for free hosting, even low-capacity ones, and it does not make sense to pay indefinitely for such a system for a thesis. *Excel is unfortunately not an option for her*.
I would like to know what system you know of that I could set up to search through the documents in this way, so that I could save the whole system along with the documents on a CD; alternatively it could be a web solution, provided I could get free hosting.
Thank you.
Dear colleagues,
I am studying the most relevant disinformation topics on a given subject and over a period of time. I intend to analyze the content of fact-checking articles.
I am developing a model to optimise a sustainable energy system for community buildings. The system uses renewable energy, battery storage and intelligent building management to optimise the energy used by the building. I cannot find any data on electricity use patterns for community buildings (village/church halls) across the year. There seems to be lots for domestic property and some for normal commercial property (offices/shops/factories). I have limited data which show a marked summer/winter pattern, but I would be grateful if anyone could share any larger data sets. At the moment the buildings are all in the north of England, but ideally we would like to develop a model that works anywhere.
In R, how do you generate random data whose distribution satisfies a specified skewness and kurtosis?
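A minimal sketch using the PearsonDS package, under the assumption that a Pearson-family distribution matched to the first four moments is acceptable (note that kurtosis here is ordinary, not excess, kurtosis and must exceed skewness^2 + 1):
library(PearsonDS)
x <- rpearson(1000, moments = c(mean = 0, variance = 1, skewness = 1.5, kurtosis = 6))
# check the realised moments (the 'moments' package provides skewness() and kurtosis())
library(moments)
c(skewness = skewness(x), kurtosis = kurtosis(x))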
I have recorded behavioural data, such as incidences of aggression and grooming partners, in a troop of lemurs over three conditions.
What tests should I be using to compare the rates of aggression in the three conditions?
For the grooming partner data, I want to compare grooming between sex dyads. For instance, the frequency of male-male grooming compared to male-female grooming within each condition and then compare the average proportion of grooming between sex dyads in the three conditions. How would I do this?
Thank you in advance for your help. Apologies if this question is poorly worded, I am very new to data analysis.
I'm doing a systematic review and I need access to the EMBASE and Scopus databases to do the research (my institution doesn't offer such access).
Could someone help?
Regards,
Hi,
My research involves looking at whether an education initiative (workplace training) increases employee knowledge of, and engagement in, corporate sustainability missions.
Research includes:
1) A pre Likert Scale questionnaire (18 questions divided roughly into two sections, knowledge and engagement)
2) An education intervention (training)
3) A post Likert Scale questionnaire (18 questions divided roughly into two sections, knowledge and engagement)
These have already taken place and I have questionnaire responses for 20 participants.
How do I go about interpreting/analysing this? I have read lots of different answers online and can't seem to find a consensus.
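One common starting point, offered here only as a hedged sketch in R: treat the summed (or averaged) score of each sub-scale as a paired pre/post measurement for the 20 participants and compare them with a paired test; the scores below are simulated placeholders, not your data:
set.seed(1)
pre_knowledge  <- sample(9:45, 20, replace = TRUE)                # hypothetical summed Likert scores
post_knowledge <- pmin(pre_knowledge + sample(0:8, 20, replace = TRUE), 45)
t.test(post_knowledge, pre_knowledge, paired = TRUE)              # paired t-test on the difference scores
wilcox.test(post_knowledge, pre_knowledge, paired = TRUE)         # non-parametric alternative
The same comparison would be repeated for the engagement sub-scale; which of the two tests is more appropriate depends on the distribution of the difference scores.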
Any help will be appreciated- Thank you
Eimear
Hey guys, I'm working on a new project where I need to transfer Facebook Ads campaign data for visualization in Tableau or Microsoft Power BI, and this job should run automatically daily, weekly or monthly. I'm planning to use Python to build a data pipeline for this. Do you have any suggestions, resources I can read, or similar projects I can take inspiration from? Thank you.
Data science is a growing field of technology in the present context. There have been notable applications of data science in electronic engineering, nanotechnology, mechanical engineering and artificial intelligence. What future scope is there for data science in civil engineering, in the fields of structural analysis, structural design, geotechnical engineering, hydrological engineering, environmental engineering and sustainable engineering?
We ran different experiments in our lab where we exposed corals to different factors, e.g. Experiment 1 looking at ocean acidification and phosphate enrichment, and Experiment 2 looking at ocean acidification and nitrate. In both experiments, we have a control group, each factor (acidification and eutrophication) alone, and then a group exposed to both stressors at the same time.
As our sample size is rather small, we thought of pooling the data from different experiments when corals experienced the same treatment, e.g. the pure acidification groups from Experiments 1 and 2.
And here is the question: Which test(s) should we run to decide whether we can pool our data or not? We assume that we can only pool the data if there is no significant difference between the response (like respiration rate) in corals exposed to pure acidification in Experiments 1 and 2, correct?
We thought that we could compare the means of the groups (using ANOVA or Kruskal-Wallis), compare the ranks (using PERMANOVA), or look at similarity (using PCO and ANOSIM). Unfortunately, depending on the test, the outcomes of them are different (surprise!) and we don’t know which test is the “correct” one to make the decision to pool or not to pool.
Or maybe we don’t have to test them at all and can just pool them? What is the correct way/test to make this decision?
Use case: to provide data security by building next-generation firewalls. Or is there a better firewall type for handling normal systems? Please suggest any answers!
I'm trying to analyse some data from my last experiment, where I grew two varieties of potato in a range of pot sizes with well-watered and water-restricted conditions, to see if the size of the pot would affect the relationships between water restriction and measures of plant morphophysiology over time.
Unfortunately, I have absolutely no idea how to analyse these data, which looks like this (5 pot sizes, 2 genotypes, 2 treatments, and about 11 dates)... Each combination of factors was replicated in triplicate. To be honest, I'm not even sure what I'm trying to look for, my brain's not great with numbers so I'm just sitting staring at Minitab. Any help at all would be amazing. Thanks.
Dear all,
I wanted to evaluate the accuracy of a model using observation data. My problem is that the correlation of the model with the observed data is really good (greater than 0.7), but the RMSE is also very high (greater than 100 mm per month for monthly rainfall data). How can I explain this? The model also has a low bias.
How can this case be explained?
Thank you all
Dear collegues,
I am trying to fit a neural network. I normalized the data with the minimum and maximum:
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(mydata, normalize))
and the results:
results <- data.frame(actual = testset$prov, prediction = nn.results$net.result).
So I can see the actual and predicted values only in normalized form.
Could you please tell me how to scale the actual and predicted data back into the "unscaled" range?
P.S. I tried:
minvec <- sapply(mydata, min)
maxvec <- sapply(mydata, max)
denormalize <- function(x, minval, maxval) {
  x * (maxval - minval) + minval
}
but it doesn't work correctly in my case.
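A minimal sketch of how the back-transformation could be applied, assuming (as in the code above) that prov is the target column of mydata and that the results data frame holds the normalised actual and predicted values; the key point is to use the min and max of that specific column:
results_unscaled <- data.frame(
  actual     = denormalize(results$actual,     minvec["prov"], maxvec["prov"]),
  prediction = denormalize(results$prediction, minvec["prov"], maxvec["prov"])
)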
Thanks a lot for your answers
Some of the literature states that quarterly data from the World Bank were used. Are such data available, or did the authors transform the annual data? How is this transformation done?
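If the annual series has to be transformed, one commonly used approach is temporal disaggregation; a minimal sketch in R with the tempdisagg package, where the series, the method and the absence of a quarterly indicator are all assumptions for illustration rather than the procedure used in any particular paper:
library(tempdisagg)
gdp_annual <- ts(c(100, 104, 109, 115, 122), start = 2015, frequency = 1)  # hypothetical annual series
# Denton-Cholette disaggregation without an indicator; the quarterly values are
# constrained to be consistent with the annual totals
mod <- td(gdp_annual ~ 1, to = "quarterly", method = "denton-cholette", conversion = "sum")
gdp_quarterly <- predict(mod)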
Hi,
I am looking for the latest population data from authentic sources/government authorities for any time between 2012 and 2021. It will be for my study area, South 24 Parganas district, West Bengal. Any leads/contacts for acquiring these data would be of great help for my research work.
Thank you.
I have used the merge function in SPSS loads of times previously with no problems. However this time I am running into an unusual issue and can't find any information online to overcome it.
I am trying to merge 2 SPSS data files: set 1 contains demographic data on 8504 cases and set 2 contains blood data on 6725 of those cases.
Using point-and-click options in SPSS I tried MERGE FILES>ADD VARIABLE>one-to-one merge based on key values (key variable = ID). However this results in a file with duplicate cases i.e. row 1 and row 2 are both subject ID 1, row 1 shows the values for the demographic data for that subject, while the blood data cells are blank, and in row 2 the demographic data cells are blank and the blood data are there. Screenshot attached.
I tried following this up with the restructuring command to try and merge the duplicate case rows but it did not alter the data set.
I've double checked that my ID variable in set 1 and set 2 matches in type/width/decimals etc.
I've tried the following syntax
MATCH FILES /FILE=*
/FILE='DataSet2'
/BY ID.
EXECUTE.
But none of the above has worked. Any advice would be HUGELY appreciated!

I have been struggling to get export-import data for Taiwan. On the World Bank website, Taiwan is not listed as a country, so nothing can be found. Are there any reliable sources for country-specific (Taiwan) data?
Hello everyone,
I am looking for links of scientific journals with dataset repositories.
Thank you for your attention and valuable support.
Regards,
Cecilia-Irene Loeza-Mejía
Not all rows are visible due to the large number of rows; is there any method to view all the rows? I am using pandas to read the file.
I am a PhD student and am currently working on metabolite profiles of some marine invertebrates.
While analysing some raw data generated from LC-MS, HRMS, NMR and FTIR, I was told by some researchers that these raw data, once submitted to a journal as supporting files, cannot be used further for any other analysis. For each analysis I need to generate the raw data again, otherwise it will be treated as a case of self-plagiarism.
I can see that my raw data has a potential of producing three distinct publications. I can analyse different parts of my raw data differently to present distinct conclusions.
But generating all the raw data again from these analyses, and that too for each publication, does not look sustainable to me. And clubbing all three publications in one also does not seem to be a good option here.
So I would like to know your views on this matter as a researcher and also as an Editor/Reviewer. Also, please share your similar experiences and solutions to it.
When should household panel data be weighted to increase the representativeness of the data? Does this depend on the number of reporting households?
Hi,
Should the sociodemographic data of qualitative research be equally distributed? I will be glad if you send me your opinions and sources about this issue. Thanks in advance.