Science topic
Data Management - Science topic
Explore the latest questions and answers in Data Management, and find Data Management experts.
Questions related to Data Management
Looking for an effective research method - my current method is time consuming.
For implementing a Master Data Management (MDM) System for an Insurance Regulator, what should be the Key Performance Indicators (KPIs) during implementation and during operations?
Through my research I've developed a data management and analysis software, and am looking for beta testers to help me improve it.
If you are interested you can visit the website at
or send me a message.
climate action, cultural heritage, digital data management survey
What benefits? In which research center and university?
In the areas of blockchain, data management, and data security for AI or deep learning, which centers are best qualified or are conducting research in these areas?
I have six kinds of compounds, which I tested for antioxidant activity using the DPPH assay and for anticancer activity on five types of cell lines, so I have two groups of data:
1. Antioxidant activity data
2. Anticancer activity data (5 types of cancer cell line)
Each measurement was made in 3 replicates. Which correlation test is the most appropriate to determine whether there is a relationship between the two activities?
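With only six compounds, a rank-based (Spearman) correlation is often a safe choice, computed on the per-compound means (averaging the three replicates first). A minimal stdlib Python sketch; all activity values below are hypothetical placeholders, not data from the question:

```python
# Sketch: correlating mean antioxidant activity with mean anticancer
# activity across compounds. All numbers are hypothetical placeholders.

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(xs):
    """Rank values (1 = smallest); ties receive average ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-compound means (average the 3 replicates first):
antioxidant = [78.2, 65.1, 54.3, 81.0, 47.9, 60.5]   # e.g. % DPPH inhibition
anticancer  = [42.0, 55.3, 61.2, 39.8, 70.1, 58.7]   # e.g. activity score

print(spearman(antioxidant, anticancer))
```

Pearson on the raw means is the alternative when both activity measures are roughly linear and normally distributed; with n = 6, Spearman makes fewer assumptions.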
This is a bit of a meta-question. I've developed a piece of software to help researchers with data management and analysis. The software is at a point now where I am looking for more beta testers.
Is there a good thread on Research Gate or other forums where it is appropriate to announce the software?
Can anyone help me with the Trial Data Management process?
I have one independent variable with two groups: physically and chemically activated rice husk.
I also have three parameters (dependent variables) to test its effectiveness in water purification. What type of data management, processing, and analysis should I use for my study?
The title of my study is: "POTENCY OF ACTIVATED CARBON FROM RICE HUSK AS A COMPONENT FOR WATER PURIFICATION"
Hello,
I am starting a project and a white paper in this regard. I'm curious how new developments in Big Data, and the new platforms appearing every day around this concept of data management, are affecting current decision-making processes and techniques inside businesses, corporations, and non-profit organizations. Are we seeing only the first, novel effects, or are we already in the middle of the storm?
I'd like to hear from anyone who can shed some light on this topic.
Regards,
oa
I am collecting questionnaire data in an epidemiological study from large numbers of participants (>1000). What is the best data management system/software for: data entry, data validation and data checking?
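Whatever system is chosen, the data-checking step amounts to testing each entered record against explicit field rules. A minimal sketch of rule-based validation; the field names and allowed ranges are hypothetical, not from the question:

```python
# Minimal sketch of rule-based validation for questionnaire records.
# Field names and ranges are hypothetical; a real study would add
# double entry, audit trails, skip-pattern checks, etc.

RULES = {
    "age":    lambda v: v.isdigit() and 0 <= int(v) <= 120,
    "sex":    lambda v: v in {"M", "F"},
    "smoker": lambda v: v in {"yes", "no", "unknown"},
}

def validate(record):
    """Return a list of (field, value) pairs that fail their rule."""
    errors = []
    for field, rule in RULES.items():
        value = record.get(field, "")
        if not rule(value):
            errors.append((field, value))
    return errors

print(validate({"age": "34", "sex": "F", "smoker": "yes"}))   # clean record
print(validate({"age": "34O", "sex": "F", "smoker": "yes"}))  # typo: letter O
```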
Who also uses eLabFTW for documentation of experiments and analysis? In what research area and what experience do you have?
In the form of an essay, provide a critical analysis of data management within an organisation of your choice, showing the importance of effective and efficient data management systems and processes.
· Evaluate the decision-making processes used in the business environment and how they are linked to the organisation's data management.
· Finally, link the evaluation of two communication theories to the organisational decision-making and data management procedures used, showing how these rely on effective management and leadership practice.
For example: data could be collected and distributed from a single center, there could be a data management portal for each province run by local governments, or there could be a separate center for transportation.
What are your thoughts on this matter?
Big Data is another trend in the world of ICT with implications for information management. In this regard, can libraries manage Big Data in terms of its volume, velocity, and variety? The amount of data in the world today has been exploding. How could libraries help in managing Big Data?
Animal breeding is all about big data management and handling, which is not possible without the application of computers. Since data analysis and interpretation play a pivotal role in animal breeding, it is important for breeders to gain insight into the software animal breeders use for data handling and analysis.
Hi,
I am looking for an open-source tool for Master Data Management. Using Talend Open Source for MDM, I am unable to create master data. Is there any other open-source tool that could help me with this?
Dear respected colleagues,
I would like to gauge how worthwhile it would be to spend some time implementing and developing a well-tested tool for simulating Blockchain-assisted, Fog-enhanced systems. Initially, I suggest the tool would allow users to choose the layer of the Fog Computing architecture at which they would like to place the Blockchain. Further, the tool would allow choosing one of several available consensus algorithms to simulate different cases for each scenario. The services that could be provided by the Blockchain in this tool include, but are not limited to, computational services (smart contracts), data management, identity management, and payment services between system entities.
If such a simulation tool is available, how likely it is that you would use it in your research?
- What are the risks of an Industry 4.0 solution?
- How is data managed at this moment in time?
- Can the end user manage the velocity, variety, volume and veracity of its current information flow?
- What would happen if this was tripled or quadrupled?
- How does it support integration across the value chain?
I am writing a paper assessing the unidimensionality of multiple-choice mathematics test items. The test was scored right or wrong, which means the data are dichotomous (nominal). Some earlier research studies I have consulted used exploratory factor analysis, but with my limited experience in data management, I suspect ordinary factor analysis may not work. Unidimensionality is one of the assumptions for dichotomously scored items in IRT. I would appreciate professional guidance, and if possible the software and the manual.
I am currently using NVivo for my constructivist grounded theory analysis. I have large volumes of data and I am using a variety of different sources. I think it works really well as a data management tool, however, it is crashing at least once every two hours and the recovery isn't always the latest version. I would be interested to know if other researchers are experiencing the same issues? If not, I would be keen to know the spec of the PC/laptop they are using. I am currently using a Dell Inspiron 15 5000.
In the Mental Health Department of the Navarre Public Health Service a Group Therapy Unit has been recently created (August 2018). We are gathering information about management, cost-effectiveness and clinical efficacy, but we would like to know if there are other similar experiences around the world.
We do know that group therapy is a common service provided by mental health departments, but always within wider inpatient or outpatient units that also provide other treatments.
Our inquiry is if there are other specific Group Therapy Units that specialize in providing only group format interventions.
We are interested in sharing data, management indicators identified, clinical and process variables assessed etc.
In our organization researchers work with simulations. These simulations have code, models, and output data, each with different versions producing different outputs. All of these data are currently saved on network drives in a very disorganized fashion.
Are there any known tools or software for handling such a scenario, where the data can easily be linked together? The simulation-data-code link should be clear across different versions.
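One lightweight stopgap, pending a dedicated provenance tool, is a per-run manifest that records content hashes of the exact code, model, and data files a run used. A sketch in Python; the file names are placeholders, and a real setup might combine this with git for the code side:

```python
# Sketch: a manifest that ties a simulation run to the exact code,
# model and data files it used, via content hashes. Paths are
# hypothetical; combine with version control for source code.

import hashlib
import json

def file_hash(path):
    """SHA-256 of a file's contents, identifying its exact version."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def make_manifest(run_id, files):
    """Record which file versions belong to one simulation run."""
    return {"run_id": run_id,
            "files": {p: file_hash(p) for p in files}}

# Example: write two dummy files, then record them in a manifest.
with open("model_v1.txt", "w") as f:
    f.write("model parameters v1")
with open("output_v1.txt", "w") as f:
    f.write("simulation output v1")

manifest = make_manifest("run-001", ["model_v1.txt", "output_v1.txt"])
print(json.dumps(manifest, indent=2))
```

Because any change to a file changes its hash, comparing two manifests immediately shows which of code, model, or data differed between two runs.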
Will a NoSQL database improve data management and optimize queries in Big Data analysis at financial institutions? Any help would be appreciated.
I would love to see examples of your stats sheet/screen shot of your data collector.
I am also interested in what the data gets used for.
Thanks,
Rachel
Do Data Mining and Big Data fall under the subject of Artificial Intelligence, or are these terms also discussed in the context of data literacy or data management within library and information science?
- Are librarians' data literacy skills the same as data scientists' skills? If data scientists' skills are higher, will librarians be replaced in the future job market?
- What should librarians do to enhance their data literacy skills?
Is there any study (dissertation, model, conference paper, or poster) that discusses data literacy in the context of AI (Big Data and data mining) applications in libraries?
_Yousuf
In my opinion, the chemical industry is one of the most important industries in the world. Not only do 90% of our everyday products contain chemicals, but the industry also employs approximately 10 million people. Naturally, it was one of the first to embrace digital technologies such as process control systems and sensors, which have a long tradition in production.
A continuous digital transformation plays a crucial role in several key aspects of the industry. Accenture has identified the six most influenced areas.
- Higher Levels of Efficiency and Productivity
Increases competitive advantage and further decreases costs through operational optimization.
- Innovation through Digitalization
Helps boost productivity in R&D and thus shortens the time to market entry.
- Data Management and Analytics
An improved understanding of customer needs, and thus the optimization of offerings, are integral contributors to any company’s success.
- Impact on the Workforce
Tasks and opportunities, as well as job requirements, will change with digital transformation. “Finally, technology will take an even greater role in upskilling and training employees, and in knowledge management.” (World Economic Forum & Accenture, 2017)
- Digitally Enhanced Offerings
An increasingly important aspect of product performance, especially for close-to-end-customer markets.
- Digital Ecosystems
Separated into the Innovation, Supply and Delivery, and Offering Ecosystems.
Industry 4.0, however, does not only cover aspects of digitalization but, above all, artificial intelligence, robotics, the Internet of Things (IoT), and advanced materials. The table below illustrates their impact on chemical products.
Please elaborate on your thoughts about this as well.
From your experience: which software for clinical trial data management (including fMRI) can you recommend or advise against?
I have a dataset in CSV format with over 22k rows and I need to transpose it. An Excel sheet is limited to 16,384 columns, so it won't let me transpose. I spent a lot of time on this before realising that Excel's size limit makes it impossible.
I have now transposed the data in Matlab, but I can't export it, since an Excel sheet can't hold it. Working in Excel is much more convenient for me for data management and storage for future reference, although the rest of my work is in Matlab.
Can anyone suggest an alternative way to store the transposed data as a CSV file in my cloud storage?
Thanks.
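One workaround is to skip Excel for this step entirely: the 16,384-column limit is Excel's, not the CSV format's, so the transposed file can be written directly (Matlab can likewise write the transposed matrix straight to a CSV file). A stdlib Python sketch; the file names are placeholders:

```python
# Sketch: transpose a CSV entirely outside Excel, so the 16,384-column
# sheet limit never applies. "input.csv"/"transposed.csv" are
# placeholder names. Note: zip(*rows) truncates ragged rows to the
# shortest row, so rows should all have the same length.

import csv

def transpose_csv(src, dst):
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(zip(*rows))

# Tiny demonstration with a 3-row, 2-column file:
with open("input.csv", "w", newline="") as f:
    csv.writer(f).writerows([["a", "b"], ["1", "2"], ["3", "4"]])

transpose_csv("input.csv", "transposed.csv")
with open("transposed.csv", newline="") as f:
    print(list(csv.reader(f)))   # [['a', '1', '3'], ['b', '2', '4']]
```

The resulting CSV can be uploaded to cloud storage as-is; only opening it in Excel is constrained by the column limit.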
I'm working on a real-time data management system for social media data. The goal of this work is to improve the decision-making process in e-government policy. This requires knowing the best database for this kind of system: MongoDB or something else? Please advise.
I coordinate field operations for a farm with more than 5000 acres of blueberries and cranberries, divided into a number of fields ranging from 10 to 125 acres each. I am hoping to bring in a new system of data management for a better grip on money in and money out for each field.
By data, I mean chemical inputs, equipment usage, labour charges, and yields, so as to have a better grip on every part of the farm. This will help us plan variable applications and management.
As of now, I am looking for a GIS solution/software for data management, visualisation, analysis (to support third party statistical software), and decision-making.
Dear colleagues, please help out with this challenge.
I downloaded Landsat 8 imagery for my study location from USGS Earth Explorer. I extracted the zip folder and added it to ArcGIS using
Data Management > Raster > Raster Processor > Composite Bands
I input the bands in the order 7-5-4, added an output location, and ran the tool.
Alas, it does not produce the expected result: a white background appears in place of the composite image. I repeated this with bands 4-3-1 but got the same result.
I need suggestions on what to do to get a composite of the bands for land use analysis.
Many thanks
Is there any new data on the management of mild cognitive impairment?
We have 40 probes out in rivers and streams collecting conductance, water level, time, and temperature. These will be taking measurements every 5 minutes for 2 years. LOTS OF DATA! Any recommendations for data management programs would be welcome. Our data analyst wants to use Excel... I don't think that will cut it.
Thanks
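For this volume (40 probes logging every 5 minutes for 2 years is roughly 8 million rows, well past Excel's 1,048,576-row sheet limit) a small database is a more robust home than a spreadsheet. A sketch using Python's built-in SQLite; the column names are assumptions based on the description above:

```python
# Sketch: store logger readings in SQLite instead of Excel, so two
# years of 5-minute data from 40 probes stays queryable. Column names
# are assumptions; use a file path instead of ":memory:" in practice.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE readings (
                 probe_id INTEGER, ts TEXT,
                 conductance REAL, water_level REAL, temp_c REAL)""")

rows = [(1, "2019-06-01T00:00", 512.0, 1.32, 14.6),
        (1, "2019-06-01T00:05", 509.5, 1.31, 14.5),
        (2, "2019-06-01T00:00", 488.1, 0.97, 15.2)]
con.executemany("INSERT INTO readings VALUES (?,?,?,?,?)", rows)

# Example query: mean temperature per probe.
for probe, mean_t in con.execute(
        "SELECT probe_id, AVG(temp_c) FROM readings GROUP BY probe_id"):
    print(probe, round(mean_t, 2))
```

Summaries for any time window or probe then become one SQL query, and the analyst can still export query results to Excel for plotting.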
I have 8 tiles of ASTER DEM, with an overall elevation range of 212 m to 1766 m across the tiles. After mosaicking all the tiles (Mosaic, a Data Management tool in ArcGIS 10.3), the elevation range gets compressed to 228 m to 1108 m. Why is this happening? Is there another, more suitable method?
My objective is to assess the relevance of using big data in the science and technology sector of Ethiopia and to establish how to implement it. As the science and technology sector reaches many industries, I want my research to provide a basic guideline for utilizing big data technology in this area.
Hello amazing colleagues,
I'm an MBA student with a huge passion for innovation, especially machine learning and AI, and I have experience in the construction field in the Middle East in system support and data management. So I picked this subject for my dissertation this October. I hope to produce distinguished work, which could help me secure a scholarship for a PhD on the same subject.
I'm seeking your help and advice on where and how I can link these two topics, what the new research areas between ML/AI and construction are, and what the new trends are, so I can work on them.
I know your time is really valuable, so I really appreciate your help and advice.
Recently, I wanted to add a variable to a data table using the mutate function in R. Unfortunately, I ended up completely losing the columns of the original data set in the process. Is it possible to undo such manipulations in R? How?
I have data on pressure injuries. Each row tells me how many pressure injuries each person has. I also have location variables such as Foot1, Arm, and Hand, which record the stage of the pressure injury at that location. I want to break these variables apart but I am missing something. The current data are as follows:
ID | No of PI | Foot1   | Arm     | Hand
1  | 1        | Stage 1 |         |
2  | 3        | Stage 3 | Stage 3 | Stage 1
3  | 2        | Stage 4 | Stage 1 |
I want the data to look like:
ID | No of PI | PI 1 Location | PI 1 Stage | PI 2 Location | PI 2 Stage
1  | 1        | Foot 1        | Stage 1    | Arm           | Stage 1
2  | 3        | Foot 1        | Stage 4    | Hand          | Stage 2
3  | 2        | Foot 1        |            |               |
I have tried using the Recode and Compute functions, and this works for the first pressure injury, but it doesn't seem to work for the second and beyond.
Can you please offer suggestions? Thanks in advance.
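The restructuring described above can be scripted once rather than recoded variable by variable. A Python sketch of the wide-to-paired transformation; the column names follow the example, the data values are illustrative, and in SPSS the same logic would loop over the location variables:

```python
# Sketch: turn one row per person, with stages recorded under location
# columns, into ordered (location, stage) pairs. Column names follow
# the example above; the data values are illustrative only.

LOCATIONS = ["Foot1", "Arm", "Hand"]

def widen(record):
    """Build PI-i Location / PI-i Stage fields from location columns."""
    pairs = [(loc, record[loc]) for loc in LOCATIONS if record.get(loc)]
    out = {"ID": record["ID"], "No of PI": len(pairs)}
    for i, (loc, stage) in enumerate(pairs, start=1):
        out[f"PI {i} Location"] = loc
        out[f"PI {i} Stage"] = stage
    return out

row = {"ID": 2, "Foot1": "Stage 3", "Arm": "Stage 3", "Hand": "Stage 1"}
print(widen(row))
```

The key idea is to collect the non-empty location columns in order first, then number them; that is what makes the second and later injuries come out right.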
I graduated from business school with an MBA degree years ago. It has been more than 5 years since I worked as an IT marketing specialist, but I think I may be able to learn more skills in data management, data mining, and analysis, so as to work as a data management specialist doing research on data mining, management, and analysis in health care. I have basic knowledge of statistics, databases, and research methods, and I am currently working toward the Certified Health Informatician Australasia (CHIA) credential.
Please give me any clue that may help me out of this disorientation.
Currently I am encountering problems managing the data to be analyzed, and I have limited resources. However, I have been advised to use data triangulation for data normalization and management.
I think there are problems with data management within hospitals and with sharing data across hospitals, since different hospitals run different systems. Which factors enable such systems to share data?
Cloud technology features large computing and storage capacity and high reliability. The question is how we refrigeration and air-conditioning (RAC) professionals can use cloud computing and cloud storage effectively and efficiently for our parts sizing, design calculations, test data processing and storage, and drawing data management.
I have 4 populations of different races that I am following for a period of time. I am looking at the incidence of a particular condition (let's call it condition A) after an event x (let's say joining a particular business). I find that one particular population (call it population k) has a significantly lower incidence of condition A than the other populations. Upon further analysis I realize that event x occurred in most members of population k a lot later than in the other three populations (i.e. most members joined the business a lot later than the other 3 groups).
I want to know whether the lower incidence of condition A is due to the late occurrence of event x, or whether population k actually has a lower incidence.
How do I approach this problem? I am thinking of getting Kaplan-Meier curves of condition A in all 4 subgroups; what do I do thereafter?
I am facing a problem with GSEM in Stata: when I add my variables and run it, it takes a long time and still does not converge (all my variables are categorical). I have therefore used SEM with these variables (16 latent and 4 observed), which are mostly ordinal with a few binomial. The total number of observations is around 400. I am using Stata 13.1. Can I use SEM to analyse these variables? I am building the model in the SEM Builder instead of coding it (model attached for reference). Can you also suggest the best method to create the reduced model in the SEM output? I have multiplied the path coefficients of the variables and done it manually.

Can you please recommend any good open-source software for geotechnical borehole logs, field and lab testing, and geotechnical modelling?
I need this book; can anyone help me? "Data Provenance and Data Management in eScience", edited by Qing Liu, Quan Bai, Stephen Giugni, Darrell Williamson, and John Taylor.
I have 38 semi-structured interviews and this will be my first time using a software program to conduct my coding. Usually I use a whiteboard and post-its but with the richness of these data, it is getting unwieldy. I would like a more efficient way to visualize things across participants. Thanks!
We had estimated 600 samples for one study with national representation. In some districts due to the purpose of pilot intervention we collected additional 196 samples with same tools and methods. Would it be appropriate to include all the samples for national estimation?
I've been working on composting standards and the composting process. Liquid waste (LW) is nowadays often used as a feedstock. Although there is interest in using it, several of its characteristics would increase risks for the composting process and its impacts (air, water, soil). This is where empirical data and management practices need to be assessed to balance the risks of using LW in the composting industry. Any current research or recent paper related to processing LW in compost is welcome. Thank you.
I'm supplementing an already published genetic database with genes extracted from full mitochondrial genomes from genbank. I have all accession numbers for the complete genomes and I have previous gene sequences too.
I'm using R to do all the downloading and data management, but I'm willing to use other open-access tools for the job. So far, my plan is to download each full genome, replicate it in separate FASTA files for each mitochondrial gene, align all the sequences for each gene, and then trim the sequences manually. I could also go genome by genome, extracting the genes manually on GenBank, but I figure there is a smarter and quicker way to do it.
I have never done anything like this, so I'm playing it by ear here. Any tips are welcome.
Does anyone have some useful data management tips for qualitative case studies? I have 20 NGOs with staff interviews, observation, and document analysis occurring at each NGO. There is going to be a huge amount of data and I need to be able to cross reference and corroborate different data sets with one another. I will be using nVivo to code the documents and interviews and observations will be linked to organisations and individuals. I also need to keep track of NGO and interviewee demographic details, interview and observation dates/hours/locations etc, and types of documents (formal/informal, audience, author, purpose etc). So much data! I was thinking I might use nVivo for managing most of it as I can link observations and demographics to transcripts and documents. I've just been using excel to keep track of the other data (eg number of times an interviewee has been interviewed/duration/location etc) however I don't know that I've set that spreadsheet up in the best way possible. It would be great if anyone has good ideas! Thanks!
I have baseline and follow-up groups of the same people (paired), but each group has three categories, so to test the changes I cannot apply the McNemar test, since it only works for 2x2 tables.
A field experiment was conducted at Mehoni ARC, Ethiopia, with five varieties, five levels of harvesting age, and three numbers of nodes. All the required data were collected properly. However, during analysis the main-effect mean comparisons were displayed with their respective letters, but the two-way and three-way interactions were not. If there is any other option, please advise.
Hi all.
I have a variable in stata which has the following structure:
[t, Variable]
1 0
2 0
3 0
4 1
5 0
6 0
7 0
8 2
9 0
Where 1 and 2 mark the starting and the ending period of a specific event, respectively.
I would like to know how to compute the length of time for the event using stata, which would be 5 periods for this particular example.
I know this could be easily achieved in R with a very simple code like the one I present below:
A = rep(0,20)
A[c(4,15)] = 1
A[c(6,19)] = 2
Begin = which(A==1)
End = which(A==2)
Total.Time.of.Event = rep(NA,20)
for(i in 1:length(Begin)){
Total.Time.of.Event[Begin[i]:End[i]] = End[i] - Begin[i] + 1
}
cbind(A,Total.Time.of.Event)
Thus, the output for the cbind() instruction would be:
[A ,Total.Time.of.Event]
[1,] 0 NA
[2,] 0 NA
[3,] 0 NA
[4,] 1 3
[5,] 0 3
[6,] 2 3
[7,] 0 NA
[8,] 0 NA
[9,] 0 NA
[10,] 0 NA
[11,] 0 NA
[12,] 0 NA
[13,] 0 NA
[14,] 0 NA
[15,] 1 5
[16,] 0 5
[17,] 0 5
[18,] 0 5
[19,] 2 5
[20,] 0 NA
Any help with this would be greatly appreciated,
Best regards
Juan
Dear colleagues,
The Likert scale is widely used in medical and social sciences. It usually gives participants five responses: 1: Strongly Disagree, 2: Disagree, 3: Neutral, 4: Agree, 5: Strongly Agree. How should neutral responses be understood, and what should be entered into the SPSS data sheet? In simpler words, to determine the percentage of agreement or disagreement with a certain item, should all Neutral responses be omitted?
Thank You
I want to know the exact steps for doing recurrent event data analysis in Excel. I have time-to-event data, and I want to find the distribution and parameters after checking the data for IID.
Could any experts please help me with that?
There is a common saying in research that data become normal when the sample size is large. What sample size allows one to assume the data are normal? Andy Field refers to sample sizes above 30 as large in his book (if I remember correctly), which seems more applicable to medical science data. But in social science research, what would be a large enough sample size for assuming normality? Is there any literature on this?
Hello! My outcome is the ceod index, an average (or median) at the population level based on individual count data. The ceod index is the sum of decayed teeth, teeth with extraction indication, and filled teeth per individual; one then calculates the average ceod for the study population. The distribution therefore has excess zeros (about 30% of the observations) and over-dispersion.
I am trying to transform a vector using the BoxCox command in R, but it contains a few 0 values and the result shows "Transformation requires positive data". After further reading I found that the two-parameter Box-Cox transformation is suitable for my case. I have tried the following code:
X4.tr <- boxcoxfit(X4$x, lambda2 = T)
where X4 is the dataframe which contains the vector of dataset "x".
This yields the optimized values of "lambda" and "lambda2"; however, I am not sure what code to use to transform the whole vector of data in x, although I tried the following:
X4.trans <- BoxCox(X4$x, 0.71027)
where 0.71027 is the value of lambda estimated by the boxcoxfit function. This results in negative values for the 0 values of the original vector. After replacing the lambda value in the above code with the lambda2 value identified by boxcoxfit, it still results in negative values for 0.
I would really appreciate suggestions on the correct code to perform the two-parameter Box-Cox transformation, in case I am on the wrong path.
Thanks
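For reference, once both parameters are estimated, the two-parameter transform is simply ((x + lambda2)^lambda - 1)/lambda (or log(x + lambda2) when lambda = 0); a one-parameter BoxCox call ignores lambda2, which is likely why the zeros still misbehave. A Python sketch of the transform itself, with stand-in parameter values playing the role of boxcoxfit's output:

```python
# Sketch of the two-parameter Box-Cox transform, applied once lambda
# and lambda2 have been estimated (e.g. by boxcoxfit in R). With the
# shift lambda2 applied, zero values no longer cause problems.

import math

def boxcox2(x, lam, lam2):
    """((x + lam2)**lam - 1)/lam, or log(x + lam2) when lam == 0."""
    shifted = x + lam2
    if shifted <= 0:
        raise ValueError("x + lambda2 must be positive")
    if lam == 0:
        return math.log(shifted)
    return (shifted ** lam - 1) / lam

# Hypothetical parameter values, standing in for boxcoxfit's output:
lam, lam2 = 0.71027, 0.05
data = [0.0, 1.2, 3.4, 0.0, 7.8]
print([round(boxcox2(v, lam, lam2), 4) for v in data])
```

In R the same one-liner applies directly to the vector: `((X4$x + lambda2)^lambda - 1)/lambda`.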
If the study was powered at 80% and the sample size was estimated at 79 subjects per group, but each group eventually ends up with 140 subjects, what would be the impact on the study results?
I have been working on ASI unit-level data for the period 1983-84 to 1997-98. I am unable to detect the duplicates in these datasets. Has anyone else worked on ASI data who could help me with this issue?
I want to evaluate cloud security and data management
I have two columns of data: the first is H (solar radiation), the second is T (temperature). I know the relationship between them: H = a*[1 - exp{-b*(T)^c}], where a, b, and c are empirical coefficients, and these coefficients are constants. I want to find the best-fitting empirical coefficients for this data set. I have 3081 {H, T} pairs in Excel. Thanks.
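With only three coefficients, even a coarse grid search over the sum of squared errors gets close, and Excel's Solver (or any nonlinear least-squares routine) can then refine the result. A stdlib Python sketch that checks itself against synthetic data generated from known coefficients; all grid ranges and values below are hypothetical:

```python
# Sketch: brute-force least-squares search for a, b, c in
# H = a*(1 - exp(-b * T**c)). The grids are hypothetical and would be
# refined around the best cell; a dedicated solver is more efficient.

import math

def model(T, a, b, c):
    return a * (1 - math.exp(-b * T ** c))

def sse(data, a, b, c):
    """Sum of squared errors of the model over (T, H) pairs."""
    return sum((H - model(T, a, b, c)) ** 2 for T, H in data)

def grid_fit(data, a_grid, b_grid, c_grid):
    """Return (sse, a, b, c) minimising the error over the grids."""
    best = None
    for a in a_grid:
        for b in b_grid:
            for c in c_grid:
                err = sse(data, a, b, c)
                if best is None or err < best[0]:
                    best = (err, a, b, c)
    return best

def frange(lo, hi, n):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

# Synthetic data generated from known coefficients, as a sanity check:
true_a, true_b, true_c = 20.0, 0.05, 1.5
data = [(T, model(T, true_a, true_b, true_c)) for T in range(5, 40, 5)]

err, a, b, c = grid_fit(data, frange(10, 30, 21),
                        frange(0.01, 0.1, 19), frange(1.0, 2.0, 21))
print(round(err, 6), a, b, c)
```

The same objective (sum of squared residuals in a helper cell) is exactly what Solver would minimise over the three coefficient cells in Excel.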
The cohorts answered questions related to learning styles. The Internet collection group gave totals for each cohort for the questions, which were on a scale from strongly agree to strongly disagree.
Hello,
I'm working with string variables in SPSS and have encountered a problem managing the data. My variables are codes assigned to each observation, where observations are turns of speaking in a discussion. Sometimes an observation has multiple codes assigned to THE SAME variable (see the picture), i.e., I have more than one value in one cell (in POSFEED: PR-COG, PI). I need to spread out my codes so that each observation contains only one value per cell, i.e., instead of POSFEED I want PR-COG and PI as two new variables with 0 and 1 in the cells. The problem is this: when I use the RECODE syntax, SPSS does not recode the cells which contain more than one value. I understand why, because when using the statement
RECODE POSFEED ('PI' =1) INTO PI.
EXECUTE.
multiple codes in the same cell do not equal the single value 'PI'.
However, I have a lot of data and want to avoid recoding things manually. I have tried different logical functions and statements, but none of them seem to solve my problem. Can anybody suggest a solution?
Thank you!
P.S. My thinking is that an operator for "contains" instead of "equals" ("is", =) could solve the problem, but I can't find one anywhere.
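In SPSS, CHAR.INDEX(POSFEED, 'PI') can play the role of a "contains" test, though a plain substring match could wrongly fire on a code that merely contains 'PI'; splitting the cell on commas first avoids that. A language-neutral sketch of the desired recode, in Python, using the codes from the example:

```python
# Sketch of the desired recode: split cells that hold several
# comma-separated codes into one 0/1 indicator column per code.
# Variable and code names follow the example above.

observations = ["PR-COG, PI", "PI", "", "PR-COG"]
codes = ["PR-COG", "PI"]

def split_codes(cell):
    """Set of individual codes contained in one multi-code cell."""
    return {c.strip() for c in cell.split(",") if c.strip()}

# Exact membership after splitting, not raw substring matching:
indicator_columns = {
    code: [1 if code in split_codes(cell) else 0 for cell in observations]
    for code in codes
}
print(indicator_columns)
```

Testing exact membership in the split set, rather than `'PI' in cell`, is the part that makes multi-code cells recode correctly.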

BDMS = Big Data Management System
BVT = Best Version of Truth
BDM = Big Data Management
MDM = Master Data Management
Dear Researchers
I have unit value index of exports data for country X from 1972 to 2015, with different base years. My question is how I can change the base year, for instance to the year 2000?
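Once a series is on a single base, rebasing is pure arithmetic: divide every value by the value in the new base year and multiply by 100. A Python sketch with made-up figures:

```python
# Sketch: rebase an index series so that a chosen year equals 100.
# The figures below are made up; only the arithmetic matters.

index = {1998: 92.0, 1999: 96.0, 2000: 104.0, 2001: 110.0}

def rebase(series, base_year):
    """Divide every value by the base year's value and scale to 100."""
    base = series[base_year]
    return {year: 100.0 * v / base for year, v in series.items()}

rebased = rebase(index, 2000)
print(rebased[2000])   # 100.0 by construction
```

If the 1972-2015 data are actually published on several different base years, the sub-series first need to be chain-linked at an overlap year before this single rebasing step applies.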
Hi everyone. I have conducted an analysis of the items of a competency test and found that 2 items have negative biserial correlations (-0.01 and -0.03). A panel of experts and I did not find any problem with the keys and distractors. Both items are difficult items, so they should be good at discriminating among top performers. Do I have to remove the items from the final test composition? Thank you.
These days 'Big Data' is considered a vital field of research (rather, I should say it is a buzzword in today's data science research community). With this question I want to discuss some real challenges in handling Big Data for various purposes. Many researchers view this field merely as an extension of distributed processing: divide the task into small pieces and then agglomerate the results to handle scalability (i.e. simple map and reduce procedures).
I request respondents to provide references to key resources on the challenges (theoretical as well as practical) of understanding the Big Data environment in a true sense, and of understanding the difference between a distributed environment and a Big Data environment.
We have commissioned a survey company to undertake a telephone survey for us (n=500). The sample has been recruited from people who have taken part in a large national survey (Health Survey England) which is conducted every year with a different randomly selected population sample. We are speaking specifically to people with more than one long-term health condition.
The questionnaire data has been collected and the survey company are planning to provide us with the dataset next week. They have asked us if we would like them to weight the data (for an additional fee). I am not an expert on weighting large datasets, and am unsure whether this is necessary. Was wondering if anyone had any thoughts on the matter, guidance on what I should be considering in this decision, or indeed could point me in the direction of any resources which might help me make this decision.
We would like the data to be broadly generalisable to this population. The sample has been recruited from three different years' worth of HSE participants so we could get the numbers. I am assuming that they were more difficult to recruit from the earlier years, as people will have changed phone numbers, become more ill, etc. I will need to check this assumption with the survey company.
Would really appreciate any advice. We are a charity and don't really have resources for additional expenditure, but I don't want to jeopardise the quality of the data.
Thank you
DevonThink is a software that is available only for Apple.
What is your opinion on applying a knowledge representation tool to manage large documents and display them to users on request? Is there any tool with this functionality and a built-in reasoning algorithm?
I am looking for a tool to manage large documents with built-in reasoning functionality. It should be able to read office documents (e.g. Word, PowerPoint) from a database and display them to the user.
I want to normalize my data using log10 in SAS. Could someone please share the relevant program?
Thank you
I have calculated the transmit time of a data packet as packet length / data rate. Now I want to calculate the receive time of the data packet, because finally I have to calculate the delay. Thanks in advance.
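Assuming a simple store-and-forward link, receive time = send time + transmission delay + propagation delay (queuing and processing delays ignored here). A Python sketch with example figures; the propagation speed of 2e8 m/s is a common assumption for cable or fibre, not a given from the question:

```python
# Sketch: receive time = send time + transmission delay + propagation
# delay (queuing/processing delays ignored). All figures are examples.

def transmission_delay(packet_bits, data_rate_bps):
    """Time to push the whole packet onto the link."""
    return packet_bits / data_rate_bps

def propagation_delay(distance_m, speed_mps=2e8):
    """Time for the signal to travel the link (~2/3 speed of light)."""
    return distance_m / speed_mps

packet_bits = 12_000          # a 1500-byte packet
rate = 1_000_000              # 1 Mbps link
distance = 10_000             # 10 km

t_send = 0.0
t_receive = (t_send + transmission_delay(packet_bits, rate)
             + propagation_delay(distance))
print(t_receive)   # 0.012 s transmission + 0.00005 s propagation
```

The one-way delay is then t_receive - t_send; over multiple hops, the per-hop transmission and propagation delays add up.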
The dependent variable has a long tail, and the splitting process ignores the large-value observations.