Science topic

Data Management - Science topic

Explore the latest questions and answers in Data Management, and find Data Management experts.
Questions related to Data Management
  • asked a question related to Data Management
Question
4 answers
What benefits? In which research center and university?
In the areas of blockchain, data management, and data security for AI or DL, which centers are most qualified or have investigations in these areas?
Relevant answer
Answer
Dear António Brandão,
As far as I know, postdoctoral work is self-guided research activity, usually done to extend one's PhD work.
  • asked a question related to Data Management
Question
3 answers
I have six compounds that I tested for antioxidant activity using the DPPH assay and for anticancer activity on five cell lines, so I have two groups of data:
1. Antioxidant activity data
2. Anticancer activity data (5 cancer cell lines)
Each dataset consists of 3 replicates. Which correlation test is the most appropriate to determine whether there is a relationship between the two activities?
Relevant answer
Answer
Logistic regression is what I had in mind. The DV might be anticancer activity (yes/no), and the same for antioxidant activity. Best wishes, David Booth
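If dichotomising the activities is undesirable, a rank-based correlation on the mean activities (per cell line) is another option. A minimal Python sketch; the values below are invented placeholders, not the asker's data, and the percentage scales are assumptions:

```python
from scipy.stats import spearmanr

# Hypothetical mean activities for the six compounds (replicates averaged)
antioxidant = [82.1, 45.3, 67.8, 30.2, 91.5, 55.0]  # e.g. % DPPH inhibition
anticancer  = [74.0, 40.1, 35.8, 60.5, 88.2, 50.3]  # e.g. % growth inhibition, one cell line

# Spearman's rho is robust for small samples and non-normal data
rho, p = spearmanr(antioxidant, anticancer)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```

With only six compounds, any p-value should be interpreted cautiously; repeating the test per cell line with a multiplicity correction would be prudent.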
  • asked a question related to Data Management
Question
8 answers
For example, for a large lab with a number of research projects: a document management system that would work in a collaborative environment, link figures to the actual data (numbers and images), and provide standard tools for creating manuscript templates.
Thanks
Relevant answer
Answer
A data management tool combines and manages data from numerous data sources. It extracts, cleans, transforms, and integrates data without compromising on integrity so that you can access it in an easy-to-use format.
  • asked a question related to Data Management
Question
4 answers
Through my research I've developed a data management and analysis software, and am looking for beta testers to help me improve it.
If you are interested you can visit the website at
or send me a message.
Relevant answer
Answer
Arif Ahmed , are you interested in trying out the software? If so please send me a DM so we can discuss further.
  • asked a question related to Data Management
Question
4 answers
This is a bit of a meta-question. I've developed a piece of software to help researchers with data management and analysis. The software is at a point now where I am looking for more beta testers.
Is there a good thread on Research Gate or other forums where it is appropriate to announce the software?
Relevant answer
Answer
Of course, you can use a thread or the Q&A feature on ResearchGate to announce the software and call for beta testers. That said, I don't believe ResearchGate is the best venue for this; other academic and software forums are available, including GitHub discussion groups, LinkedIn, and Stack Exchange, as well as this thread.
  • asked a question related to Data Management
Question
4 answers
Can anyone help me with the Trial Data Management process?
Relevant answer
Answer
All SAS-related posters and presentations can be found at https://www.lexjansen.com/. Use their search engine for anything related to SAS.
  • asked a question related to Data Management
Question
10 answers
I have one independent variable with two groups: physically and chemically activated rice husk.
I also have three parameters (dependent variables) for testing its effectiveness in water purification. What type of data management, processing, and analysis should I use for my study?
The title of my study is: "POTENCY OF ACTIVATED CARBON FROM RICE HUSK AS A COMPONENT FOR WATER PURIFICATION"
Relevant answer
Answer
I would caution against MANOVA because it's not what you think it is.
A lot of people assume that a MANOVA works like an ANOVA but with several outcome variables at once. That is not true. MANOVA combines your outcome variables into composite outcome variables, and it does so in a way you cannot control: instead of giving each outcome variable equal weight, or letting you decide on the relative importance of each variable, it bases the weighting on the relationship of each variable to the predictor variables. As Kendal Smith puts it, ANOVA and MANOVA do not analyze the same variables and thereby address different research questions.
See
Huang FL. MANOVA: A procedure whose time has passed?. Gifted Child Quarterly. 2020 Jan;64(1):56-60.
Smith KN, Lamb KN, Henson RK. Making meaning out of MANOVA: the need for multivariate post hoc testing in gifted education research. Gifted Child Quarterly. 2020 Jan;64(1):41-55.
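The alternative implied above, separate univariate tests per outcome with a multiplicity correction, can be sketched briefly. The triplicate values below are invented placeholders for one of the three water-purification parameters:

```python
from scipy.stats import f_oneway

# Hypothetical triplicates for one dependent variable (e.g. turbidity reduction, %);
# repeat the test for each of the three parameters.
physical = [61.2, 63.5, 60.8]   # physically activated rice husk
chemical = [72.4, 70.9, 73.1]   # chemically activated rice husk

f, p = f_oneway(physical, chemical)
print(f"F = {f:.2f}, p = {p:.4f}")
# With three outcomes, compare each p against a corrected threshold, e.g. 0.05 / 3 (Bonferroni).
```

With two groups this one-way ANOVA is equivalent to an independent-samples t-test; the separate-tests approach keeps each research question about one outcome variable at a time.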
  • asked a question related to Data Management
Question
5 answers
Hello,
I am starting a project and a white paper in this regard. I'm curious how new developments in Big Data, and the ever-growing platforms built around this concept of data management, are affecting current decision-making processes and techniques inside businesses, corporations, and non-profit organizations. Are we only feeling the first, novel effects, or are we already in the middle of the storm?
I'd like to hear from anyone who can shed some light on this topic.
Cordialement,
oa
Relevant answer
Answer
Decision making in companies and organizational leadership should depend on the key variables that determine the effect of the decision to be made. The volume of data (big data) may help, but it is not decisive for effective decision making. Top management should identify the point that requires a decision (the purpose), which key factors relate to that purpose, and what resources are required to decide on that particular issue. Relevant data on each of these variables is more decisive than the sheer bulk of data. Nonetheless, big data is important for reviewing the overall performance of the company and the effectiveness of its leadership. For specific decisions, specific but relevant data is required. So the extent to which big data affects the decision-making process depends on the level (scope) of the decision to be made.
  • asked a question related to Data Management
Question
4 answers
I am collecting questionnaire data in an epidemiological study from large numbers of participants (>1000). What is the best data management system/software for: data entry, data validation and data checking?
Relevant answer
Answer
epi info is free and has backup help
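Whatever package is chosen, scripted validation of the entered data catches errors early. A hedged Python/pandas sketch; the field names, codes, and valid ranges below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical questionnaire extract; column names and valid ranges are assumptions.
df = pd.DataFrame({
    "participant_id": [1, 2, 2, 4],
    "age":            [34, 29, 29, 142],          # 142 is a deliberate entry error
    "smoker":         ["yes", "no", "no", "maybe"],
})

# Count rule violations per check
problems = {
    "duplicate_id":     int(df["participant_id"].duplicated().sum()),
    "age_out_of_range": int(((df["age"] < 0) | (df["age"] > 120)).sum()),
    "bad_smoker_code":  int((~df["smoker"].isin(["yes", "no"])).sum()),
}
print(problems)
```

Running checks like these after every data-entry batch, rather than once at the end, is what makes >1000-participant studies manageable.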
  • asked a question related to Data Management
Question
5 answers
Who also uses eLabFTW for documentation of experiments and analysis? In what research area and what experience do you have?
Relevant answer
Answer
Hello Frank Krüger, thank you for your feedback. We recently started testing eLabFTW in the workflow at our electron microscope. It would be great if we could compare and exchange methodologies; a user meeting on this topic would be worth considering.
  • asked a question related to Data Management
Question
3 answers
In the form of an essay provide a critical analysis of data management within an organisation of your choice, showing the importance of effective and efficient data management systems and processes.
· Evaluate the decision making processes used in the business environment and how this is linked to their data management.
Finally link the evaluation of two communications theories into organisational decision making and data management procedures used showing how these rely on effective management and leadership practice.
Relevant answer
Answer
Oops, sorry, but I confused different Dr. Sharmas. The one I know has published heavily in decision theory and management science, but this one has a great deal of management experience. Mea culpa and apologies to all for my error. David Booth
  • asked a question related to Data Management
Question
2 answers
For example: should data be collected and distributed by a single centre, or should there be a data management portal for each province run by local government, or a separate centre for transportation?
What are your thoughts on this matter?
Relevant answer
Answer
I agree with Cenk. Directly sending emails to authorities may be a good choice based on my own experiences.
  • asked a question related to Data Management
Question
9 answers
Big Data is another trend in the world of ICT with implications for information management. In this regard, can libraries manage Big Data in terms of its volume, velocity, and variety? The amount of data in the world today is exploding. How could libraries help in managing Big Data?
Relevant answer
Answer
The use of Big Data Analytics databases and analytical technologies can significantly improve the information offer of digitized online libraries. Analytics applied to digitized library resources can considerably expand the possibilities of research in the field of book science, and the results of such studies, posted on the library's website, can further enrich its information offer.
Regards,
Dariusz Prokopowicz
  • asked a question related to Data Management
Question
8 answers
Animal breeding is all about big data management and handling, which is not possible without the application of computers. Since data analysis and interpretation play a pivotal role in animal breeding, having insight into the software that animal breeders use for data handling and analysis is very important for a breeder.
Relevant answer
Answer
Thanks for your contribution and support @ Dr. Mohammad Ghaderzadeh
  • asked a question related to Data Management
Question
12 answers
Dear all, suggest the optimal software for measuring causality.
Relevant answer
Answer
It may be arriving late. Sophia, what type of causality are you looking for? If you are looking for time-series causality, there are good libraries in R and Python. Search for "CausalImpact", an algorithm developed by Google; you can find econometric papers and software libraries that implement it.
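For intuition, the Granger-style idea behind many time-series causality tests can be sketched without any specialised library: compare how well y is predicted from its own lags alone versus its own lags plus lagged x. A toy illustration with synthetic data in which x genuinely drives y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                       # y depends on its own past and lagged x
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.1)

# Restricted model: y_t ~ y_{t-1}; full model adds x_{t-1}
Y = y[1:]
restricted = np.column_stack([np.ones(n - 1), y[:-1]])
full       = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])

rss_r = np.sum((Y - restricted @ np.linalg.lstsq(restricted, Y, rcond=None)[0]) ** 2)
rss_f = np.sum((Y - full @ np.linalg.lstsq(full, Y, rcond=None)[0]) ** 2)
print(f"RSS restricted = {rss_r:.2f}, RSS full = {rss_f:.2f}")
# A large drop in RSS when lagged x is added suggests x "Granger-causes" y.
```

Dedicated implementations (e.g. Granger tests in statsmodels, or CausalImpact itself) add the formal F-test and handle lag selection; this sketch only shows the underlying comparison.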
  • asked a question related to Data Management
Question
3 answers
Hi,
I am looking for an open-access tool for master data management. Using Talend Open Source for MDM, I am unable to create master data. Is there any other open-access tool that could help me with this?
  • asked a question related to Data Management
Question
4 answers
Dear respected colleagues,
I would like to gauge how worthwhile it would be to spend time implementing and developing a well-tested tool for simulating Blockchain-assisted, Fog-enhanced systems. Initially, I suggest the tool would allow users to choose the layer of the fog computing architecture at which to place the blockchain. Further, the tool would allow choosing among several available consensus algorithms to simulate different cases for each scenario. The services provided by the blockchain in this tool include, but are not limited to, computational services (smart contracts), data management, identity management, and payment services between system entities.
If such a simulation tool were available, how likely is it that you would use it in your research?
Relevant answer
Answer
I like that; it's great.
  • asked a question related to Data Management
Question
1 answer
  • What are the risks of an Industry 4.0 solution?
  • How is data managed at this moment in time?
  • Can the end user manage the velocity, variety, volume and veracity of its current information flow?
  • What would happen if this was tripled or quadrupled?
  • How does it support integration across the value chain?
Relevant answer
Answer
One risk is connected with security issues.
But organizing data storage using the I4.0 principles gives more flexibility, and the security aspect is the price to pay.
We've been developing an approach to designing a data management system for measuring forecasting accuracy across many time series and across multiple horizons and forecasting origins. By following the I4.0 principles, the following framework has been developed:
The framework implements the so-called forecast-value-added (FVA) analysis (see slide 6) while following the principles of interoperability, decentralization, real-time processing, service-orientation, and modularity.
Our general problem definition is given on slides 4-5. The I4.0 principles helped us obtain a cross-platform solution for forecasting that is simple to implement and learn, and fast in operation (applicable in production settings).
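In its simplest form, the forecast-value-added analysis mentioned above compares a forecasting process against a naïve baseline: FVA is the baseline's error minus the process's error. A minimal sketch with purely illustrative numbers:

```python
import numpy as np

actual   = np.array([100, 104, 98, 107, 111, 105])
forecast = np.array([101, 103, 99, 106, 109, 106])   # output of the forecasting process
naive    = np.roll(actual, 1)[1:]                    # naïve forecast: previous actual value

mae_forecast = np.mean(np.abs(actual[1:] - forecast[1:]))
mae_naive    = np.mean(np.abs(actual[1:] - naive))
fva = mae_naive - mae_forecast                       # positive -> the process adds value
print(f"MAE naive = {mae_naive:.2f}, MAE forecast = {mae_forecast:.2f}, FVA = {fva:.2f}")
```

In a production system this comparison would run per series, per horizon, and per forecasting origin, which is exactly where the I4.0-style modular, real-time architecture pays off.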
  • asked a question related to Data Management
Question
8 answers
>500 in the sample
Relevant answer
Answer
Why don't you try Matlab?
  • asked a question related to Data Management
Question
9 answers
I am writing a paper assessing the unidimensionality of multiple-choice mathematics test items. The test was scored right/wrong, which implies the data are dichotomous. Some earlier studies I have consulted used exploratory factor analysis, but with my limited experience in data management, I suspect ordinary factor analysis may not work here. Unidimensionality is one of the assumptions for dichotomously scored items in IRT. I would appreciate professional guidance, and if possible pointers to suitable software and its manual.
  • asked a question related to Data Management
Question
6 answers
I am currently using NVivo for my constructivist grounded theory analysis. I have large volumes of data and I am using a variety of different sources. I think it works really well as a data management tool, however, it is crashing at least once every two hours and the recovery isn't always the latest version. I would be interested to know if other researchers are experiencing the same issues? If not, I would be keen to know the spec of the PC/laptop they are using. I am currently using a Dell Inspiron 15 5000.
Relevant answer
Answer
Most of the links that Mary provided are to rather general discussions of NVivo, rather than to technical problems like you are experiencing. My strong impression is that NVivo is quite solid, so I would try to connect with the support team at QSR to get a resolution to your problem.
  • asked a question related to Data Management
Question
5 answers
In the Mental Health Department of the Navarre Public Health Service a Group Therapy Unit has been recently created (August 2018). We are gathering information about management, cost-effectiveness and clinical efficacy, but we would like to know if there are other similar experiences around the world.
We do know that group therapy is a common service provided by mental health departments, but always within wider inpatient or outpatient units that also provide other treatments.
Our inquiry is if there are other specific Group Therapy Units that specialize in providing only group format interventions.
We are interested in sharing data, management indicators identified, clinical and process variables assessed etc.
Relevant answer
Answer
It will be hard to conduct group therapy virtually!
  • asked a question related to Data Management
Question
3 answers
In our organization, researchers work with simulations. These simulations have code, models, and output data, each with different versions producing different outputs. All of this is currently saved on network drives in a very disorganized fashion.
Are there any known tools or software for handling such a scenario, where these artifacts can be easily linked to each other? The simulation-data-code link should be clear across different versions.
Relevant answer
Answer
No. You could try to standardize the simulation software, but each package has advantages and disadvantages, and your researchers are currently trying to pick the best software for their problems. If you want to maximize good research, you let the researchers decide. Other than that, what is your goal?
As a retired professor I can assure you that allowing only one statistical package on campus has never worked, because no single package does everything. The same is true for simulation. The reason I use R now is that my university mandated SPSS, which was of no help in my research; I moved to R, and I never would have discovered its power without the SPSS mandate. Other departments required everyone to purchase their department's favorite software. R is not only the most powerful, it is free and open source. With R I have neither problem, since its power and zero cost let me ignore the mandate.
The question you are really asking is: which set of problems do you want to live with? I recommend not standardizing (mandating) a single package for everyone. If you do, your simulation situation will end up like our statistics one. May the force be with you. David Booth
  • asked a question related to Data Management
Question
3 answers
Will non-SQL databases improve data management and optimize queries in Big Data analysis at financial institutions? Any help will be appreciated.
Relevant answer
Answer
Non-SQL databases like MongoDB provide object-based programming/scripting facilities for data storage and retrieval. It is the developer's task to write scripts that optimize storage and retrieval; some built-in functions exist to help with optimization tasks, I think.
  • asked a question related to Data Management
Question
5 answers
I would love to see examples of your stats sheet/screen shot of your data collector.
I am also interested in what the data gets used for.
Thanks,
Rachel
Relevant answer
Answer
on paper
  • asked a question related to Data Management
Question
9 answers
Do data mining and Big Data fall under artificial intelligence, or are these terms also discussed in the context of data literacy and data management within library and information science?
  1. Are librarians' data literacy skills the same as data scientists' skills? If data scientists' skills are higher, will data scientists replace librarians in the future job market?
  2. What should librarians do to enhance their data literacy skills?
Any study (dissertation, model, conference paper, or poster) discussing data literacy in the context of AI (Big Data and data mining) applications in libraries would be welcome.
_Yousuf
Relevant answer
Answer
Hi Mohammad,
Big data/data mining, business intelligence/analytics, data science, distant reading, knowledge discovery, etc. are all terms used in different disciplines to denote essentially the same thing: statistical analysis, the discovery of novel patterns in data, and their presentation in a form conducive to human consumption. Meliha
  • asked a question related to Data Management
Question
3 answers
In my opinion, the chemical industry is one of the most important industries in the world. Not only do 90% of our everyday products contain chemicals, but the industry also employs approximately 10 million people. Naturally, it was one of the first to embrace digital technologies such as process control systems and sensors, which have a long tradition in production.
A continuous digital transformation plays a crucial role in several key aspects of the industry. Accenture has identified the six most influenced areas.
  • Higher Levels of Efficiency and Productivity
Increase competitive advantages and further decreases the costs through operational optimizations.
  • Innovation through Digitalization
Helps boost the productivity in R&D and thus decrease the time till market entry.
  • Data Management and Analytics
An improved understanding of customer needs and so the optimization of offerings are integral contributors to any company’s success.
  • Impact on the Workforce
Tasks and opportunities, as well as the job requirements, will experience a change with digital transformation. “Finally, technology will take an even greater role in upskilling and training employees, and in knowledge management.” (World Economic Forum & Accenture, 2017)
  • Digitally enhanced offerings
An increasingly important aspect of product performance especially for close-to-end customer markets.
  • Digital Ecosystems
Being separated into Innovation, Supply and Delivery, and Offering Ecosystem.
Industry 4.0, however, does not only include aspects of digitalization but also artificial intelligence, robotics, the Internet of Things (IoT), and advanced materials. The table below illustrates their impact on chemical products.
Please elaborate on your thoughts about this as well.
Relevant answer
Answer
Thanks, clear!
  • asked a question related to Data Management
Question
4 answers
Out of your experience: which software for clinical trial data management (including fMRI) can you recommend/advice against?
Relevant answer
Answer
Hi Anna, RedCap ( https://www.project-redcap.org/ ) should have all you need for simple clinical trials.
  • asked a question related to Data Management
Question
5 answers
I have a dataset in CSV format with over 22k rows that I need to transpose. An Excel sheet is limited to 16,384 columns, so it won't let me transpose; I spent a lot of time on this before realising Excel's size limit made it impossible.
I have now transposed the data in Matlab, but I can't export it since an Excel sheet can't hold it. Excel is much more convenient for me for data management and storage for future reference, although the rest of my work is in Matlab.
Can anyone suggest an alternative way to store the transposed data as a CSV file in my cloud storage?
Thanks.
Relevant answer
Answer
Thank you everyone. All the answers were really helpful in getting me past this hurdle.
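For future readers: a transpose that never passes through Excel avoids the 16,384-column limit entirely, since a CSV file itself has no column limit. A minimal sketch with Python's built-in csv module; the small in-memory sample stands in for the real 22k-row file, which would be read from and written to disk the same way:

```python
import csv
import io

# Stand-in for the real CSV (replace io.StringIO with open("file.csv", newline=""))
wide = io.StringIO("id,a,b\n1,10,20\n2,30,40\n")
rows = list(csv.reader(wide))

# zip(*rows) turns rows into columns; works for any shape that fits in memory
transposed = list(zip(*rows))

out = io.StringIO()
csv.writer(out).writerows(transposed)
print(out.getvalue())
```

The resulting CSV can be uploaded to cloud storage directly and still opened in Matlab; only viewing it in Excel remains constrained by Excel's own limits.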
  • asked a question related to Data Management
Question
3 answers
I'm working on a real-time data management system for social media data. The goal of this work is to improve decision-making in e-government policy. This requires knowing the best database for this kind of system: MongoDB, or something else? Please advise.
Relevant answer
Answer
Cloudera
  • asked a question related to Data Management
Question
4 answers
I coordinate field operations for a farm with more than 5000 acres of blueberries and cranberries, divided into fields ranging from 10 to 125 acres each. I am hoping to bring in a new data management system for a better grip on money in and money out of each field.
By data, I mean chemical inputs, equipment usage, labour charges, and yields, to get a better handle on every part of the farm. This will help us plan variable applications and management.
As of now, I am looking for a GIS solution/software for data management, visualisation, analysis (with support for third-party statistical software), and decision-making.
Relevant answer
Answer
QGIS
  • asked a question related to Data Management
Question
4 answers
Dear colleagues, please help out with this challenge.
I downloaded Landsat 8 imagery for my study location from USGS EarthExplorer, extracted the zip folder, and added it to ArcGIS using
Data Management > Raster > Raster Processor > Composite Bands.
I input the bands in the order 7-5-4, added an output location, and ran the tool.
Alas, it doesn't produce a usable result: a white background appears in place of the composite image. I repeated this with bands 4-3-1 but got the same result.
I need suggestions on how to get a composite of the bands for land-use analysis.
Many thanks
Relevant answer
Answer
Dear Frank,
try to open the Landsat images in ArcMap via "Add Data" --> browse to the unzipped folder --> choose the "..._MTL.txt" that has a raster symbol.
ArcMap will then recognize the different bands and stack them. You can now export the data and save it as a composite.
If you downloaded uncalibrated data, you can additionally use ArcMap to perform the Top of Atmosphere (TOA) correction. To do so, go to "Windows" --> "Image Analysis" --> select the data, e.g. "Multispectral_LC8..." --> "Processing: Add Function" --> right-click "Stretch Function" --> "Insert Function" --> "Apparent Reflectance Function" --> check "Albedo" --> in the "General" tab choose "32 Bit Float". Now save and export the new raster, named for example "Func_Multispectral_LC8..."
Some links that might help:
  • asked a question related to Data Management
Question
3 answers
Is there any new data on the management of mild cognitive impairment?
Relevant answer
Answer
" Technology-based cognitive training and rehabilitation interventions show promise, but the findings were inconsistent due to the variations in study design. " [Ge & Zhu & Wu 2018]
  • asked a question related to Data Management
Question
15 answers
We have 40 probes out in rivers and streams collecting conductance, water level, time, and temperature. These will be taking measurements every 5 minutes for 2 years. LOTS OF DATA! Any recommendations for data management programs would be welcome. Our data analyst wants to use Excel... I don't think that will cut it.
Thanks
Relevant answer
Answer
Meanwhile, water quality monitoring has been evolving to the latest wireless sensor network (WSN) based solutions in recent decades. This paper presents a multi-parameter water quality monitoring system of Bristol Floating Harbour which has successfully demonstrated the feasibility of collecting real-time high-frequency water quality data and displayed the real-time data online. The smart city infrastructure – Bristol Is Open was utilised to provide a plug & play platform for the monitoring system. This new system demonstrates how a future smart city can build the environment monitoring system benefited by the wireless network covering the urban area. The system can be further integrated in the urban water management system to achieve improved efficiency.
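To put the "LOTS OF DATA" in numbers, and to show the kind of down-sampling a proper time-series tool makes trivial, here is a short Python/pandas sketch. The probe count and interval come from the question; the series itself is a synthetic placeholder:

```python
import pandas as pd

# Rough sizing: 40 probes, one reading every 5 minutes, for 2 years
readings = 40 * (2 * 365 * 24 * 60 // 5)
print(f"~{readings:,} rows")   # well beyond Excel's 1,048,576-row sheet limit

# Hypothetical single-probe day of readings, down-sampled to hourly means
idx = pd.date_range("2024-01-01", periods=288, freq="5min")   # 288 readings = 1 day
series = pd.Series(range(288), index=idx)
hourly = series.resample("1h").mean()
print(hourly.head(3))
```

At roughly 8.4 million rows, a database (or at minimum per-probe files queried with pandas/R) is a safer home for the raw data than a spreadsheet, with Excel reserved for viewing aggregated extracts.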
  • asked a question related to Data Management
Question
3 answers
I have 8 tiles of ASTER DEM with an elevation range of 212 m to 1766 m across all tiles. After mosaicking the tiles (the Mosaic tool in the Data Management toolbox of ArcGIS 10.3), the elevation range is compressed to 228 m to 1108 m. Why is this happening? Is there another, more reliable method?
Relevant answer
Answer
If the datasets are big, this can happen. What type of RS images are you using: MODIS, Landsat, or Sentinel?
  • asked a question related to Data Management
Question
2 answers
My objective is to establish the relevance of using big data in Ethiopia's science and technology sector and to determine how to implement it. As the science and technology sector reaches many industries, I want my research to provide a basic guideline for utilizing big data technology in this area.
Relevant answer
Answer
Thank you, Professor Ette Etuk. As Ethiopia is relatively new to this technology, I wanted to generalize a management framework for big data, not just the analytics. I understand that analytics is an important part of big data, but I wanted to start from the beginning. Is there such a method that I can recommend as a policy guideline?
  • asked a question related to Data Management
Question
5 answers
Hello amazing colleagues,
I'm an MBA student with a huge passion for innovation, especially machine learning and AI, and I have experience in the construction field in the Middle East in system support and data management. So I picked this subject for my dissertation this October. I hope to produce distinguished work that could help me secure a scholarship for a PhD in the same subject.
I'm seeking your help and advice on where I can link these two topics, what the new research areas between ML/AI and construction are, and what the new trends are, so I can work on them.
I know your time is really valuable, so I really appreciate your help and advice.
Relevant answer
Answer
Basically, anything you want to predict or estimate can be applied in the construction industry. Some particular applications include: fault detection (before/during/after construction); recommendations on architecture, colour, etc.; price, construction-time, and resource prediction; determining the best time to start construction; 3D reconstruction from 2D architectural drawings; and many more. Many of these applications are related to computer vision.
It seems that recommendation systems are more closely related to your background.
Disclaimer: I am not an expert in this topic but maintain a general interest in AI's applications.
  • asked a question related to Data Management
Question
5 answers
Recently, I wanted to add a variable to a data table using the mutate function in R. Unfortunately, I ended up completely losing the columns of the original data set in the process. Is it possible to undo manipulations in R? How?
Relevant answer
Answer
Usually, unless you're performing time-consuming operations like multiple imputation of missing data, loading the original data and doing all data transformations "on the fly" (to repeat all manipulations) is very quick, especially with packages like dplyr, tidyr or data.table.
If you really want to save interim steps, simply save the next step in another object (data.frame), and finally save your complete workspace. There's a short, helpful SO thread on this topic: https://stackoverflow.com/questions/34539011/saving-in-r-studio
I really recommend 1) using projects; 2) importing the raw data; 3) doing all data wrangling (manipulations) in a clean, commented script. Optionally, 4) save your workspace. Unlike other statistical software packages, R has the advantage that you're less often in the situation of needing to save all your changes, as starting from scratch by re-running your script is very fast.
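The same discipline, never overwriting the raw data object, carries over to other tools. A Python/pandas illustration, offered only as an analogy to dplyr's mutate:

```python
import pandas as pd

raw = pd.DataFrame({"x": [1, 2, 3]})        # stand-in for freshly imported raw data

# assign() returns a NEW frame and leaves `raw` untouched, so a mistake in the
# derived columns never costs you the original data.
derived = raw.assign(x_sq=raw["x"] ** 2)

print(list(raw.columns), list(derived.columns))
```

Re-assigning the result to a new name (here `derived`) is what makes the step reversible: rerunning the script from the raw import always recovers the original columns.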
  • asked a question related to Data Management
Question
5 answers
I have data on pressure injuries (PI). Each row tells me how many pressure injuries each person has. I also have variables such as Foot, Arm, and Hand, which record the stage of the pressure injury at that location. I want to break these variables apart, but I am missing something. The current data look like this:

ID  No of PI  Foot1    Arm      Hand
1   1         Stage 1
2   3         Stage 3  Stage 3  Stage 1
3   2         Stage 4  Stage 1

I want the data to look like this:

ID  No of PI  PI 1 Location  PI 1 Stage  PI 2 Location  PI 2 Stage
1   1         Foot 1         Stage 1     Arm            Stage 1
2   3         Foot 1         Stage 4     Hand           Stage 2
3   2         Foot 1

I have tried using the Recode and Compute functions, and they work for the first pressure injury, but not for the second and onwards.
Can you please offer suggestions? Thanks in advance.
Relevant answer
Answer
Please refer to:
1. Landau, S. and Everitt, B. S.(2004). A Handbook of Statistical analyses using SPSS.
2.Howitt, D. and Cramer, D.(2008). Introduction to SPSS.
Regards,
Zuhair
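As an illustration of the reshape logic (SPSS's Restructure wizard, i.e. VARSTOCASES, does the equivalent), here is the location-columns-to-one-row-per-injury step sketched in Python/pandas, using a hypothetical recreation of the layout from the question:

```python
import pandas as pd

# Hypothetical recreation of the wide layout described in the question
df = pd.DataFrame({
    "ID":    [1, 2, 3],
    "Foot1": ["Stage 1", "Stage 3", "Stage 4"],
    "Arm":   [None, "Stage 3", "Stage 1"],
    "Hand":  [None, "Stage 1", None],
})

# One row per pressure injury: the location columns become a "Location" variable
long = df.melt(id_vars="ID", var_name="Location", value_name="Stage").dropna()
long["PI_number"] = long.groupby("ID").cumcount() + 1
print(long.sort_values(["ID", "PI_number"]))
```

From this long format, numbering the injuries per person is a one-liner (the `PI_number` column), and pivoting back to a "PI 1 / PI 2 / ..." wide layout is mechanical; often the long format itself is easier to analyse.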
  • asked a question related to Data Management
Question
4 answers
I graduated from business school with an MBA degree years ago. It has been more than 5 years since I worked as an IT marketing specialist, but I think I can learn more skills in data management, data mining and analysis so as to work as a data management specialist and do more research on health-care data. I have basic knowledge of statistics, databases and research methods, and am currently working toward the Certified Health Informatician Australasia (CHIA) credential.
Please give me any clue that may help me out of this disorientation.
Relevant answer
Answer
Hi Allen,
With your credentials and background, you have plenty of opportunities. Having been in both industry and academia myself, I think your background is commensurate with both paths. If you consider industry, then you might look at corporate research and development (R&D) or some similar occupation where research and statistics are necessary, such as market analysis. Outside academia, you might also try to align yourself with a research-based think tank and pursue grant funding.
Good luck!
--Adrian
  • asked a question related to Data Management
Question
4 answers
Currently, I am encountering problems with managing the data to be analyzed. I have limited resources; however, I have been advised to use data triangulation for data normalization and management.
Relevant answer
Answer
  • asked a question related to Data Management
Question
5 answers
I think there are problems with data management within hospitals and with data sharing across hospitals, since some hospitals run different systems. Which factors determine whether systems can share data?
Relevant answer
Answer
Do you want to study how hospitals in Tanzania are using data management systems within the hospital and how they share data, or do you want to study how to introduce such systems in Tanzanian hospitals?
  • asked a question related to Data Management
Question
4 answers
Cloud technology offers large computing and storage capacity and high reliability. The question is how we refrigeration and air-conditioning (RAC) professionals can use cloud computing and cloud storage effectively and efficiently for parts sizing, design calculations, test data processing and storage, and drawing data management.
Relevant answer
Answer
Thanks, Himadri! It's good information from China.
  • asked a question related to Data Management
Question
5 answers
I have 4 populations of different races that I am following over a period of time. I am looking at the incidence of a particular condition (let's call it condition A) after an event X (let's say joining a particular business). I find that one particular population (let's call it population K) has a significantly lower incidence of condition A than the other populations. Upon further analysis I realize that event X occurred in most members of population K a lot later than in the other three populations (i.e., most members of population K joined the business a lot later than the other 3 groups).
I want to know whether the lower incidence of condition A is due to the late occurrence of event X, or whether they actually have a lower incidence.
How do I approach this problem? I am thinking of getting Kaplan-Meier curves of condition A in all 4 subgroups. What do I do thereafter?
Relevant answer
Answer
nice information thank you
  • asked a question related to Data Management
Question
17 answers
I am facing a problem with GSEM in Stata: when I add my variables and run it, it takes a long time and still does not converge (all my variables are categorical). I have therefore used SEM with these variables (16 latent and 4 observed), which are mostly ordinal with a few binomial. The total number of observations is around 400. I am using Stata 13.1. Can I use SEM to analyse these variables? I am modelling them in the SEM Builder instead of coding (model attached for your reference). Can you also suggest the best method to create the reduced model in the SEM output? I have multiplied the path coefficients of the variables and done it manually.
Relevant answer
Answer
nice information thank you
  • asked a question related to Data Management
Question
3 answers
Can you please recommend any good open-source software for geotechnical borehole logs, field and lab testing, and geotechnical modeling?
Relevant answer
Answer
You can check the link below:
  • asked a question related to Data Management
Question
5 answers
I need this book; can anyone please help me? "Data Provenance and Data Management in eScience", edited by Qing Liu, Quan Bai, Stephen Giugni, Darrell Williamson and John Taylor.
Relevant answer
Answer
Ahmed - this question has been answered. Not sure why you make your statement. The book can either be loaned or purchased.
  • asked a question related to Data Management
Question
7 answers
I have 38 semi-structured interviews and this will be my first time using a software program to conduct my coding. Usually I use a whiteboard and post-its but with the richness of these data, it is getting unwieldy. I would like a more efficient way to visualize things across participants. Thanks!
Relevant answer
Answer
Use Qiqqa, it has autotags as well as multiple tags for you to do your own coding. It will automatically suggest themes! It also does Brainstorms.
  • asked a question related to Data Management
Question
14 answers
We had estimated 600 samples for one study with national representation. In some districts, for the purposes of a pilot intervention, we collected an additional 196 samples with the same tools and methods. Would it be appropriate to include all the samples in the national estimation?
Relevant answer
Answer
The concern is that the pilot samples are (in some way) sampling a different population than the full study. Maybe the pilot survey was done just prior to a very successful national ad campaign. The campaign changed the proportion of people that use that type of product. Say the survey was on those people who drink orange juice. The pilot survey was sampling a population where 80% of the people had at least one glass of orange juice per day. After the campaign, half the people that used to drink orange juice now drink cranberry juice. So the pilot and full surveys have very different populations.
One solution is to break the analysis into parts.
1) Analyze the full survey as one part.
2) Reanalyze the data with the pilot survey results included. Add a blocking variable coded as 0 if there is no pilot data for that area, 1 for pilot data, 2 for survey data where pilot data is also present. I would delete block=1, and ask if there was a significant block effect (is block=0 any different from block=2)? I would delete block=0 and ask if there was a significant block effect. If the answer in both cases is no, I would combine the data.
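For a binary outcome, the block-effect check described above can be run as a simple two-proportion z-test; a minimal stdlib-Python sketch, with all counts hypothetical:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions
    (pooled-variance form)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# block 2 (survey data where a pilot also ran) vs block 0 (no pilot)
z, p = two_prop_z(240, 400, 130, 200)
```

If neither block comparison is significant, combining the pilot and full-survey samples is easier to defend.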
  • asked a question related to Data Management
Question
4 answers
I've been working on composting standards and the composting process. Liquid waste (LW) is nowadays often used as a feedstock. Although there is interest in using it, LW also has several characteristics that increase the risks to the composting process and its impacts (on air, water and soil). This is where empirical data and management practices need to be assessed to balance the risks of using LW in the composting industry. Any current research or recent papers related to processing LW in compost are welcome. Thank you.
Relevant answer
Answer
Hi Ifeanyichukwu, this article appears really interesting and I made a request to get the full text. Thank you for sharing. Have a nice day. Regards
  • asked a question related to Data Management
Question
7 answers
I'm supplementing an already published genetic database with genes extracted from full mitochondrial genomes from GenBank. I have accession numbers for all the complete genomes, and I have the previous gene sequences too.
I'm using R to do all the downloading and data management, but I'm willing to use other open-access tools to do the job. So far, my plan is to download each full genome, replicate it in separate FASTA files for each mitochondrial gene, align all the sequences for each specific gene, and then trim the sequences manually. I could also go genome by genome, extracting the genes manually on GenBank, but I figure there is a smarter and quicker way to do it.
I have never done anything like this, so I'm playing it by ear here. Any tips are welcome.
Relevant answer
Answer
Found it: "AnnotationBustR: An R package to extract subsequences from GenBank annotations" Here: https://peerj.com/preprints/2920.pdf  
  • asked a question related to Data Management
Question
8 answers
Does anyone have some useful data management tips for qualitative case studies? I have 20 NGOs with staff interviews, observation, and document analysis occurring at each NGO. There is going to be a huge amount of data and I need to be able to cross reference and corroborate different data sets with one another. I will be using nVivo to code the documents and interviews and observations will be linked to organisations and individuals. I also need to keep track of NGO and interviewee demographic details, interview and observation dates/hours/locations etc, and types of documents (formal/informal, audience, author, purpose etc). So much data! I was thinking I might use nVivo for managing most of it as I can link observations and demographics to transcripts and documents. I've just been using excel to keep track of the other data (eg number of times an interviewee has been interviewed/duration/location etc) however I don't know that I've set that spreadsheet up in the best way possible. It would be great if anyone has good ideas! Thanks! 
Relevant answer
Answer
Dear Leanne,
I was in your situation 20 years ago with the same number of companies (20) and mixed data (interviews, observations, financial reports and survey results).
There are several major elements to handle the information
1) First, create a standard SPSS file presenting each interviewee as a separate case (line), adding some specific variables obtained from interviews and common variables, for interviewees from the same company, from observations and documents. In this way you will be able to run statistical checks on the concordance/disagreement of interviewees' opinions on the same topic, and also to include all data about a particular interview (date, time, etc.).
2) To make such a database useful, you must create specific variables that present your impressions from observations and general results from interviews as numerical variables. I made such variables as "SOC" for my impressions from interviews with CEOs on their preoccupation with employees' needs, "GOV" for my impressions from interviews with CEOs on their relations with local authorities, etc.
3) Equally important is to design specific metrics for particular actions that reflect your general impression of an NGO, derived from the whole set of your data. For this you may look at the enclosed file.
Success!
Igor Gurkov
  • asked a question related to Data Management
Question
3 answers
I have baseline and follow-up groups of the same people (paired), but each measurement has three categories, so to test the changes I cannot apply the McNemar test, since it works only for 2x2 tables.
Relevant answer
Answer
I presume you mean 3 x 3!
When we extend the McNemar test, there are two hypotheses we can test. The first is asymmetry: that the off-diagonal values are asymmetric. The second is marginal homogeneity: a test that the frequencies along the table margins are similar.
In genetics, the test is called the transmission/disequilibrium test (TDT) and is used to test the association between transmitted and non-transmitted parental marker alleles to an affected child. I don't know if SPSS does the test. For sure, Stata and R will do it.
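The symmetry test is the McNemar-Bowker test: chi-square = sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji), with k(k-1)/2 degrees of freedom for a k x k table. A stdlib-Python sketch (the example counts are made up):

```python
def bowker(table):
    """McNemar-Bowker test of symmetry for a square paired table.

    Returns (chi-square statistic, degrees of freedom); compare the
    statistic to a chi-square distribution with that many df."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            n_ij, n_ji = table[i][j], table[j][i]
            if n_ij + n_ji > 0:  # skip empty off-diagonal pairs
                stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
                df += 1
    return stat, df

# 3 x 3 table: baseline category (rows) vs follow-up category (columns)
stat, df = bowker([[10, 6, 2],
                   [2, 8, 4],
                   [2, 4, 9]])
```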
  • asked a question related to Data Management
Question
5 answers
Looking for an effective research method - my current method is time consuming.
Relevant answer
Answer
Yes, indeed, dear Ljubomir.
Good initial planning of work and tasks with clear definition of objectives and general aim of work, are the keywords to successful results of our project.
The strategy of work should include a careful schedule of tasks.
A good way to evaluate the quality of work is to check if the conclusions match the main objectives drawn.
  • asked a question related to Data Management
Question
2 answers
A field experiment was conducted at Mehoni ARC, Ethiopia, with five varieties, five levels of harvesting age and three numbers of nodes. All the required data were collected properly. However, during analysis the main-effect mean comparisons were displayed with their respective letters, but the two-way and three-way interactions were not. If there is any other option, please let me know.
Relevant answer
Answer
With 0/1 (reference) coding, as in GENMOD, when an interaction is included the main effect only shows the difference at the reference level of the other factor. With 1/-1 (effects) coding, the main-effect value is not the difference from the reference group; you find the parameter for the omitted level by summing the other levels' parameters and multiplying by -1. You can use the CONTRAST and ESTIMATE statements to get an overall comparison, as in Firas's example, or a conditional comparison. For a categorical factor, the main effect measures a difference; a two-way interaction measures a difference of differences (i.e., whether the difference for one factor is the same across the levels of the other factor); and a three-way interaction measures a difference of differences of differences.
  • asked a question related to Data Management
Question
7 answers
Hi all.
I have a variable in Stata with the following structure:
[t, Variable]
1   0
2   0
3   0
4   1
5   0
6   0
7   0
8   2
9   0
where 1 and 2 mark the starting and ending periods of a specific event, respectively.
I would like to know how to compute the length of the event using Stata, which would be 5 periods in this particular example.
I know this could be easily achieved in R with a very simple code like the one I present below:
A = rep(0,20)
A[c(4,15)] = 1
A[c(6,19)] = 2
Begin        = which(A==1)
End           = which(A==2)
Total.Time.of.Event = rep(NA,20)
for(i in 1:length(Begin)){  
    Total.Time.of.Event[Begin[i]:End[i]]  =  End[i] - Begin[i] + 1
}
cbind(A,Total.Time.of.Event)
Thus, the output for the cbind() instruction would be:
          [A ,Total.Time.of.Event]
[1,]    0    NA
[2,]    0    NA
[3,]    0    NA
[4,]    1    3
[5,]    0    3
[6,]    2    3
[7,]    0    NA
[8,]    0    NA
[9,]    0    NA
[10,]  0   NA
[11,]  0   NA
[12,]  0   NA
[13,]  0   NA
[14,]  0   NA
[15,]  1   5
[16,]  0   5
[17,]  0   5
[18,]  0   5
[19,]  2   5
[20,]  0   NA
Any help with this would be greatly appreciated,
Best regards
Juan
Relevant answer
Answer
You can use the egen command in Stata.
Could you describe the output you want in plain words? That may help me provide more assistance. I mean something like:
[Sn,] A Duration
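I cannot vouch for an exact egen one-liner, but the logic of the R code above translates directly; as a language-neutral sketch, here is the same computation in Python:

```python
# Event variable: 1 marks the start period, 2 marks the end period
A = [0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0]

starts = [i for i, v in enumerate(A) if v == 1]
ends = [i for i, v in enumerate(A) if v == 2]

# Fill each start..end span with its inclusive length, None elsewhere
duration = [None] * len(A)
for s, e in zip(starts, ends):
    for i in range(s, e + 1):
        duration[i] = e - s + 1
```

This reproduces the cbind() output above: periods 4-6 (1-based) get 3, and periods 15-19 get 5.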
  • asked a question related to Data Management
Question
11 answers
Dear colleagues, 
The Likert scale is widely used in the medical and social sciences. It usually gives participants five responses: 1 Strongly Disagree, 2 Disagree, 3 Neutral, 4 Agree, 5 Strongly Agree. How should neutral responses be understood, and what should be entered into the SPSS data sheet? In simpler words, to determine the percentage of agreement or disagreement with a certain item, should all Neutral responses be omitted?
Thank you
Relevant answer
Answer
  • asked a question related to Data Management
Question
3 answers
I want to know the exact steps for doing recurrent event data analysis in Excel. I have time-to-event data. I want to find the distribution and its parameters after checking the data for IID.
Any experts, please help me with that.
Relevant answer
Answer
I do have a question on the nonparametric method. If we use the MCF method, how do we estimate future failures, and what will the distribution be? Basically, I want to do system modeling using Monte Carlo, so my Excel sheet is based on reliability equations, which differ for each distribution.
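For context, the nonparametric MCF itself is straightforward when every unit is observed over the whole window (no censoring): at each event time, divide the cumulative event count by the number of units at risk. A stdlib-Python sketch with invented failure times:

```python
from collections import Counter

def mcf(event_times, n_units):
    """Mean cumulative function, assuming all n_units are observed over
    the entire window (no censoring, so every unit is always at risk)."""
    counts = Counter(event_times)
    out, cum = {}, 0
    for t in sorted(counts):
        cum += counts[t]
        out[t] = cum / n_units
    return out

# Pooled failure times (hours) from 2 systems, hypothetical
curve = mcf([100, 250, 250, 600], n_units=2)
```

Because the MCF is distribution-free, it does not directly give a distribution for Monte Carlo simulation; one common compromise is to fit a parametric recurrent-event model (e.g. a power-law NHPP) and simulate from that.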
  • asked a question related to Data Management
Question
16 answers
There is a saying in research that data become normal when the sample size is large. What sample size justifies assuming the data to be normal? Andy Field refers to sample sizes above 30 as large in his book (if I am right), which seems more applicable to medical science data. But in social science research, what would be a large enough sample size for assuming normality? Is there any literature on this?
Relevant answer
Answer
Dear Rajkumar,
I agree with Rudolf. It depends on the distribution and the statistic you are trying to obtain.
As far as I know, for climatic variables a period of 30 years is considered the standard for assessing changes. For the Mann-Kendall non-parametric test, which is used to detect trends, the test statistic is assumed to be normal when the number of years is greater than 10.
Furthermore, when running normality tests, I have observed from the literature that if n > 50 you should use Kolmogorov-Smirnov, and if n < 50 you should go for Shapiro-Wilk. Again, as described by Rudolf, the rate of convergence varies remarkably across distributions.
You can refer the book "The SAGE Encyclopedia of Social Science Research Methods" for various statistical queries.
I hope this helps.
All the best !
  • asked a question related to Data Management
Question
3 answers
Hello! My outcome is the ceod index, an average or median at the population level, but based on individual count data. The ceod index is, per individual, the sum of teeth with decay, teeth with an extraction indication, and filled teeth. One then calculates the average ceod for the study population. The distribution has excess zeros (about 30% of the observations) and over-dispersion.
Relevant answer
Answer
Dear Maria Jose Monsalves 
please check the resources
  • asked a question related to Data Management
Question
9 answers
I am trying to transform a vector of data using the BoxCox command in R. The data contain a few 0 values, and the result shows "Transformation requires positive data". After further reading I found that the two-parameter Box-Cox transformation is suitable for my case. I have tried the following code:
X4.tr <- boxcoxfit(X4$x, lambda2 = T)
where X4 is the dataframe which contains the vector of dataset "x".
This yields the optimized values of "lambda" and "lambda2"; however, I am not sure what code to use to transform the whole vector of data in x, although I tried the following:
X4.trans <- BoxCox(X4$x, 0.71027)
where 0.71027 is the lambda estimated by the boxcoxfit function. This results in negative values for the 0 values of the original vector. After replacing lambda in the above code with the lambda2 value identified by boxcoxfit, it still results in negative values for 0.
I would really appreciate suggestions for code to perform the two-parameter Box-Cox transformation, in case I am on the wrong path.
Thanks 
Relevant answer
Answer
###  Check over the following for any errors before use
if(!require(geoR)){install.packages("geoR")}
Turbidity = c(1.0, 1.2, 1.1, 1.1, 2.4, 2.2, 2.6, 4.1, 5.0, 10.0, 4.0, 4.1, 4.2, 4.1, 5.1, 4.5, 5.0, 15.2, 10.0, 20.0, 1.1, 1.1, 1.2, 1.6, 2.2, 3.0, 4.0, 10.5)
hist(Turbidity, col="gray")
library(geoR)
T.box = boxcoxfit(Turbidity, lambda2=TRUE)
lambda = T.box$lambda[1]
lambda2 = T.box$lambda[2]
if(lambda==0){T.norm = log(Turbidity + lambda2)}
if(lambda!=0){T.norm = ((Turbidity + lambda2) ^ lambda - 1) / lambda}
hist(T.norm, col="gray")
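Applying the fitted two-parameter transformation to the whole vector is just the formula in the code above. Note also that negative transformed values for the original zeros are expected and harmless: the transform is monotone and its output is not constrained to be positive. A Python sketch (the lambda values are placeholders, not estimates from your data):

```python
import math

def boxcox2(x, lam, lam2):
    """Two-parameter Box-Cox: ((x + lam2)**lam - 1)/lam,
    or log(x + lam2) when lam == 0."""
    if lam == 0:
        return math.log(x + lam2)
    return ((x + lam2) ** lam - 1.0) / lam

data = [0.0, 0.0, 1.2, 3.5, 10.0]           # zeros are fine if lam2 > 0
transformed = [boxcox2(v, 0.71027, 0.05) for v in data]
```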
  • asked a question related to Data Management
Question
4 answers
If the study has been powered at 80% and the sample size is estimated at 79 subjects per group, but eventually each group ends up with 140 subjects, what would be the impact on the study results?
Relevant answer
Answer
There are several ways to interpret this question. I will assume that you are generally familiar with the effect of sample size on power, accuracy, specificity, type I and type II errors, and so forth. You would have had to go through this when planning the original sample size.
There are many good things that happen with increased sample size. There is one bad aspect to increased sample size. With the larger sample size you are able to detect smaller treatment effects. So you can find a statistically significant difference that is too small to be biologically meaningful. However, there is seldom enough information about the system being studied to be able to balance the treatment effect against all other sources of biological variability. Typically biological experiments in the lab are too controlled, and this can result in unexpected field results following promising laboratory experiments.
Maybe you want to quantify the benefit realized from increasing the sample size. In this activity I would resample the data and run the analysis with several thousand random selections of 79. You now have a distribution of what might have been the outcome if you had stopped at 79 replicates. On this graph plot the results using all the data to show the effect of having increased the sample size. A less powerful approach would be to compare the results from the first 80 to the results using all the data. Had things worked differently, one would presume that you would have simply stopped with the first 80 in the original design.
In terms of formulating the null hypotheses, increasing sample size will have no effect. However, sometimes increasing the sample size can enable you to look at factors that you would have had to ignore with the smaller sample size. Are there differences between females and males, ethnic differences, age, weight, and so forth.
Did you mean something else?
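The resampling idea above takes only a few lines of stdlib Python: repeatedly draw subsamples of the originally planned size and look at the distribution of the estimate (here just a mean; the data and sizes are invented):

```python
import random

def subsample_means(data, k, reps=2000, seed=42):
    """Distribution of sample means over `reps` random subsamples of
    size k, drawn without replacement from the observed data."""
    rng = random.Random(seed)
    return [sum(rng.sample(data, k)) / k for _ in range(reps)]

# e.g. 140 observed values where the planned n was 79
data = [float(i % 35) for i in range(140)]
means_79 = subsample_means(data, k=79)
full_mean = sum(data) / len(data)
```

Comparing the spread of means_79 against full_mean shows concretely what the extra 61 subjects per group bought you.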
  • asked a question related to Data Management
Question
2 answers
I have been working on ASI unit-level data for the period 1983-84 to 1997-98. I am unable to detect the duplicates in these datasets. Has anyone else worked on ASI data who can help me with this issue?
Relevant answer
Answer
It's done now. I need to know if anyone has used ASI data for the year 1994-95.
  • asked a question related to Data Management
Question
6 answers
I want to evaluate cloud security and data management
Relevant answer
Answer
Dear Mohammad
Go for CloudSim. You can simulate most of the required aspects of the cloud in it, though CloudSim has some limitations compared with real cloud environments.
Best Regards
  • asked a question related to Data Management
Question
3 answers
I have two columns of data: the first column is H (solar radiation), the second is T (temperature). I know the relationship between them: H = a*[1 - exp{-b*(T)^c}], where a, b and c are empirical coefficients, and these coefficients are constants. I want to find the best-fitting empirical coefficients for this data set. I have 3081 {H, T} pairs in Excel. Thanks.
Relevant answer
Answer
Quick and dirty approach:
1. Estimate the value of a: it is equal to H at very high T.
2. Plot log(-log(1-H/a)) vs. log(T).
You should see a straight line. If not, adjust (increase) a slightly until you do. The slope is then equal to parameter c, as the straight-line equation reads: log(-log(1-H/a)) = log(b) + c*log(T). The value of b is then found easily, too. But don't ask me what the uncertainties of the a, b and c so obtained are. Anyway, it should be a pretty good starting point (initial guess) for more advanced computations.
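The steps above translate into a short computation (and can equally be done in Excel with LN() helper columns plus SLOPE() and INTERCEPT()). A stdlib-Python sketch; the synthetic data are only for demonstration:

```python
import math

def fit_b_c(T, H, a):
    """Given a, fit b and c in H = a*(1 - exp(-b*T**c)) by ordinary
    least squares on  log(-log(1 - H/a)) = log(b) + c*log(T)."""
    x = [math.log(t) for t in T]
    y = [math.log(-math.log(1.0 - h / a)) for h in H]  # needs 0 < H < a
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    c = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = math.exp(my - c * mx)  # intercept of the line is log(b)
    return b, c

# Synthetic check: generate data from known a, b, c and recover them
a_true, b_true, c_true = 25.0, 0.01, 2.0
T = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
H = [a_true * (1 - math.exp(-b_true * t ** c_true)) for t in T]
b_hat, c_hat = fit_b_c(T, H, a_true)
```

Since a is itself only estimated (H at very high T), nudge it as described above until the log-log plot straightens; the resulting b and c then make a good starting guess for a full nonlinear least-squares fit, e.g. with Excel Solver.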
  • asked a question related to Data Management
Question
4 answers
The cohorts answered questions related to learning styles. The internet data-collection group gave totals for each cohort for the questions, which were on a scale from strongly agree to strongly disagree.
Relevant answer
Answer
Thanks, Stephen. I will let you know what I discover.
  • asked a question related to Data Management
Question
12 answers
Hello,
I'm working with string variables in SPSS and have encountered a problem in managing the data. My variables are codes assigned to each observation, where observations are turns of speaking in a discussion. Sometimes an observation has multiple codes assigned to THE SAME variable (see the picture), i.e., I have more than one value in one cell (in POSFEED: PR-COG, PI). I need to spread out my codes so that each observation contains only one value in each cell, i.e., instead of POSFEED I want to have PR-COG and PI as two new variables with 0 and 1 in the cells. The problem is this: when I use the RECODE syntax, SPSS does not recode the cells which contain more than one value. I understand why, because with the statement
RECODE POSFEED ('PI' =1)  INTO PI.
EXECUTE.
PI, PI in the same cell does not equal to only one PI.
However, I have a lot of data and want to avoid recoding things manually. I tried using different logical functions and statements, but none of them seems to solve my problem. Can anybody suggest a solution?
Thank you!
P. S. My thinking is that an operator for "contains" instead of "equals" ("is", =) could solve the problem, but I can't find it anywhere.
Relevant answer
Answer
Hi,
You can use INDEX to test for substrings.
COMPUTE PI = INDEX(POSFEED, "PI") > 0.
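One caveat: INDEX matches substrings, so a short code could also be flagged inside a longer code that happens to contain the same letters. If the codes in a cell are comma-separated, matching whole tokens is safer. A Python sketch of the same recode (the cell values are from the example above):

```python
def has_code(cell, code):
    """1 if `code` appears as a whole comma-separated token in `cell`."""
    return int(code in (t.strip() for t in cell.split(",")))

cells = ["PR-COG, PI", "PI", "PR-COG", ""]
pi_flags = [has_code(c, "PI") for c in cells]
```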
  • asked a question related to Data Management
Question
1 answer
BDMS = Big Data Management System
BVT = Best Version of Truth
BDM = Big Data Management
MDM = Master Data Management
Relevant answer
Answer
At some point in time, yes. Please refer to the CAP theorem (https://en.wikipedia.org/wiki/CAP_theorem) and the various variations of consistency; you will be especially interested in "eventual consistency" (https://en.wikipedia.org/wiki/Eventual_consistency).
  • asked a question related to Data Management
Question
2 answers
Dear Researchers
I have unit value index data for country X's exports from 1972 to 2015, with different base years. My question is: how can I change the base year, for instance to the year 2000?
Relevant answer
Answer
Ok thanks a lot Dr Bento
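For the record, rebasing is just dividing the whole series by its value in the new base year and multiplying by 100; if segments of the series were published on different base years, splice them first using the ratio at an overlap year. A Python sketch (the numbers are invented):

```python
def rebase(series, base_year):
    """Rebase an index series so that base_year = 100."""
    ref = series[base_year]
    return {yr: v / ref * 100.0 for yr, v in series.items()}

old = {1998: 90.0, 1999: 96.0, 2000: 120.0, 2001: 132.0}
rebased = rebase(old, 2000)  # 2000 becomes 100, 2001 becomes 110
```

Rebasing changes only the scale, not growth rates, so any analysis based on percentage changes is unaffected.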
  • asked a question related to Data Management
Question
6 answers
Hi everyone. I have analysed the items of a competency test and found that 2 items have negative biserial correlations (-0.01 and -0.03). Several expert panellists and I did not find any problem with the keys or distractors. Because both items are difficult, they were considered very good for discriminating among top performers. Do I have to remove these items from the final test composition? Thank you.
Relevant answer
Answer
How the experts think the items will perform matters less than how they actually did perform. I would recommend removing them.
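For reference, the item-total point-biserial under discussion is r_pb = (M1 - M0)/s * sqrt(p*q), where M1 and M0 are the mean total scores of examinees who answered the item correctly and incorrectly, p is the item difficulty (q = 1 - p) and s is the standard deviation of total scores. A stdlib-Python sketch with invented scores:

```python
import math

def point_biserial(totals, item):
    """Point-biserial correlation between total scores and a 0/1 item."""
    n = len(totals)
    n1 = sum(item)
    p = n1 / n
    m1 = sum(t for t, i in zip(totals, item) if i == 1) / n1
    m0 = sum(t for t, i in zip(totals, item) if i == 0) / (n - n1)
    mean = sum(totals) / n
    s = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)  # population SD
    return (m1 - m0) / s * math.sqrt(p * (1 - p))

r = point_biserial([1, 2, 3, 4], [0, 0, 1, 1])
```

A negative value means high scorers tended to get the item wrong, which is why such items are usually dropped even when expert review finds nothing wrong with the key.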
  • asked a question related to Data Management
Question
10 answers
Nowadays 'Big Data' is considered a vital field of research (rather, I should say it is a buzzword in today's data science research community). With this question I want to discuss some real challenges in handling Big Data for various purposes. Many researchers view this field merely as an extension of distributed processing: simply dividing a task into small pieces and then agglomerating the results to handle scalability (i.e., a simple design of map and reduce procedures).
I request respondents to provide references to vital resources that pose challenges (theoretical as well as practical) to understanding the Big Data environment in the true sense, and to understanding the difference between a distributed environment and a Big Data environment.
  • asked a question related to Data Management
Question
5 answers
We have commissioned a survey company to undertake a telephone survey for us (n=500). The sample has been recruited from people who have taken part in a large national survey (Health Survey England) which is conducted every year with a different randomly selected population sample. We are speaking specifically to people with more than one long-term health condition.
The questionnaire data has been collected and the survey company are planning to provide us with the dataset next week. They have asked us if we would like them to weight the data (for an additional fee). I am not an expert on weighting large datasets, and am unsure whether this is necessary. Was wondering if anyone had any thoughts on the matter, guidance on what I should be considering in this decision, or indeed could point me in the direction of any resources which might help me make this decision.
We would like the data to be broadly generalisable to this population. The sample has been recruited from 3 different years' worth of HSE participants so we could get the numbers. I am assuming that participants from the earlier years were more difficult to recruit, as people will have changed phone numbers, become more ill, etc. I will need to check this assumption with the survey company.
Would really appreciate any advice. We are a charity and don't really have resources for additional expenditure, but I don't want to jeopardise the quality of the data.
Thank you
Relevant answer
Answer
Hi Karen
If your main interest is prevalence estimates that are generalisable to the target population, then you should really undertake a weighted analysis - even if it costs more. If your main interest is associations between variables (typically using some form of regression analysis), then weighting will make little difference to the results.
Adrian
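A weighted prevalence estimate itself is a one-liner once the survey company supplies the weights (typically calibrated to population age/sex/region totals); the Kish effective sample size then shows the precision cost of weighting. A Python sketch with made-up values:

```python
def weighted_prevalence(outcome, weight):
    """Weighted prevalence: total weight of cases / total weight."""
    cases = sum(w for y, w in zip(outcome, weight) if y == 1)
    return cases / sum(weight)

def effective_n(weight):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weight) ** 2 / sum(w * w for w in weight)

prev = weighted_prevalence([1, 0, 1, 0], [2.0, 1.0, 1.0, 1.0])
n_eff = effective_n([2.0, 1.0, 1.0, 1.0])
```

Comparing n_eff with the raw n is a cheap way to judge whether the weighting fee is worth paying for your purposes.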
  • asked a question related to Data Management
Question
2 answers
DEVONthink is software that is available only for Apple platforms.
Relevant answer
Answer
I have not. Thanks for the suggestion. I will look into it.
  • asked a question related to Data Management
Question
14 answers
What is your opinion on applying a knowledge representation tool to manage large documents and display them to users on request? Is there any tool with this functionality and a built-in reasoning algorithm/function?
I am looking for a tool to manage large documents with built-in reasoning functionality. It should be able to read office documents (e.g. Word, PowerPoint) from a database and display them to the user.
Relevant answer
Answer
Here is an interesting open-source data portal platform:
  • asked a question related to Data Management
Question
11 answers
I want to normalize my data using log10 in SAS. Please write the related program for me.
Thank you
Relevant answer
Answer
Thank you for the clarification.  So you are asking how to transform one variable that currently has a non-normal distribution using a logarithmic transformation in the hopes that it will give an approximately normal distribution.  The necessary SAS code is as follows:
DATA dataset2;
SET dataset1;
LOGVAR=log10(VAR);
RUN;
This would create a new variable LOGVAR that is the log10 transformation of the variable VAR. 
  • asked a question related to Data Management
Question
4 answers
I have calculated the transmit time of a data packet as packet length / data rate; now I want to calculate the receive time of the data packet, because finally I have to calculate the delay. Thanks in advance.
Relevant answer
Answer
How did you calculate the transmit time? Then I can answer exactly. The receive time would involve the same considerations (per-hop processing, queuing, readiness to transmit, the receiving device's acceptance time, etc.). I need to know exactly what you want; if you tell me how you calculated the transmit time, I can give you the exact answer.
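In the usual textbook decomposition, one-way delay = transmission + propagation + processing + queuing, where transmission = packet length / data rate (what you already computed) and propagation = distance / signal speed. A Python sketch with illustrative numbers:

```python
def one_way_delay(packet_bits, rate_bps, distance_m,
                  prop_speed=2.0e8, processing_s=0.0, queuing_s=0.0):
    """One-way delay = transmission + propagation + processing + queuing.
    prop_speed defaults to ~2/3 the speed of light (typical of copper
    and fiber media)."""
    transmission = packet_bits / rate_bps
    propagation = distance_m / prop_speed
    return transmission + propagation + processing_s + queuing_s

# 1000-byte packet on a 1 Mb/s link spanning 200 km
d = one_way_delay(8000, 1.0e6, 2.0e5)
```

The receive time is then the send timestamp plus this delay; per-hop processing and queuing components are usually measured or simulated rather than derived.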
  • asked a question related to Data Management
Question
10 answers
The dependent variable has a long tail and the splitting process ignores the large-value observations
Relevant answer
Answer
As far as I remember, regression trees are very robust to outliers and skewed distributions in EXPLANATORY variables. For the response/dependent variable, that's a different story... Transformations can make sense.
  • asked a question related to Data Management
Question
3 answers
I am working on large-scale data management systems; please suggest how I might use the CAP theorem.
Relevant answer