Data Collection - Science topic
Systematic gathering of data for a particular purpose from various sources, including questionnaires, interviews, observation, existing records, and electronic devices. The process is usually preliminary to statistical analysis of the data.
Questions related to Data Collection
I am planning to collect blood samples from patients with a particular disease after their first visit to my department, with the aim of comparing the blood levels of some biomarkers with those of the general population. But this disease is rare, so I need around a year to accrue a sufficient number of patients and data. Blood will be collected only during the first visit. Since a cross-sectional study by definition involves data collection at a single point in time, what type of observational study is mine?
Brief summary of study idea
- High achieving individuals (participants)
- six month period of data collection
- 6 methods of data collection including diaries, scales, interviews and observations.
- possible sample size?
As I was reading a paper by Zattoni et al. (2013) on corporate governance theories*, I came across a passage that explains the new shift to alternative or complementary theories with the aim to explore real-life governance issues. I quote: "to enhance our understanding of the effectiveness of governance mechanisms, scholars should gain access to process-oriented data, go beyond the almost exclusive use of agency theory, and overcome empirical dogmatisms and narrow conceptualizations of corporate governance"
What does process-oriented data exactly mean? And how is it different from data used in previous corporate governance research?
* Zattoni, A., Douglas, T., & Judge, W. (2013). Developing Corporate Governance Theory through Qualitative Research: Guest Editorial. Corporate Governance: An International Review, 21(2), 119–122. https://doi.org/10.1111/corg.12016
In recent years, quite a few reports have been published of results based on statistical information processing. For example, a study establishes that the use of a certain remedy (some food, drink, nutritional supplement, drug, treatment method, etc.) reduces (or increases) the value of some output parameter by 20 ... 30 ... 40%. The output parameter can be the frequency of onset of the analyzed disease, the frequency of its successful cure, etc. Based on this finding, the conclusion is drawn that the studied factor significantly influences the output parameter. How trustworthy can such a conclusion be?
For further details, please see:
Hello and good day.
If you have an article that uses a grounded-theory research method with data collected from internet websites, please send it to me. Thank you.
If the data were collected by a postgraduate student as an assignment for which they are to be evaluated, can these data be further used for research work by a faculty member? How do we describe this in the research methodology section?
Data collected by a student???
Can someone please clarify this?
What types of data are required for measuring willingness to pay (WTP) and willingness to accept (WTA)? What types of analytical models or tools should be used to assess WTP and WTA? And how can I develop a questionnaire for the required data collection?
I am conducting a mini research project for my master's thesis. The study follows an interpretivist philosophy, and I proposed a qualitative methodological choice. I suggested a narrative inquiry strategy, for which I developed oral questions, but I am unsure which tool would be effective for collecting the data.
Two undergraduate researchers were tasked with data collection for a paper that was recently published. The data collection took months to complete. However, the paper contains no mention or acknowledgement of their names as the ones who collected the data.
There is a "Data Availability" section saying that other researchers can now obtain the data using the same method, and that the data set and analysis can be obtained via the undergraduate researchers' professor (who is also first author). There is also an "Acknowledgements" section stating that the professor/first author was funded by the university for completing the research (yes, the undergraduate researchers were also granted a small amount).
Should the two undergraduate researchers be acknowledged in the paper for the data collection it relied on?
I understand that a non-sampling error is a statistical term referring to an error that occurs during data collection, causing the data to differ from the true values. To what extent is it possible to quantify the contribution of non-sampling error to the overall margin of error, knowing that the overall error is a function of both sampling and non-sampling error?
I am conducting research on happiness at work among the teaching staff of one of the universities in Nigeria, Enugu State University of Science and Technology (ESUT), and I am looking for a scale/questionnaire that can serve as my instrument for data collection.
I am evaluating a new English textbook. I found that the answers of teachers who are teaching this textbook in class differ considerably from the answers of English-teaching scholars to the same questions. How can we address this discrepancy when reporting to the textbook's authors so they can revise it?
Recently I conducted an online survey through Google Forms. I want to know how to describe its methodology.
Please suggest any good articles where the authors used or discussed this type of survey process.
Thanks in advance!
This might be a very common question/challenge in quasi-experimental research. I have been collecting data for my study. In the end, the data are not adequate (participation declined over the course of five weeks; only 10 of the 39 participants received the intervention for the full five weeks).
Some said that I can still run a statistical analysis with some adjustment of the p-value and a specific analysis (but I think this might not be something a journal editor would expect). Others said that I could pivot to a more qualitative study, with the additional work of interviews.
Any suggestions based on your experience? They would be much appreciated. Thanks.
Can anyone please explain how to analyse data collected using the Brief COPE 28-item scale?
As I understand it, I won't get a total score, but rather a score for each subscale; one study also mentions using normative data from a heart-failure study to calculate percentile ranks. Can anyone please help? I don't understand the data-analysis part after collection with this scale.
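A minimal sketch of how per-subscale scoring of this kind of instrument typically works. The item numbers and subscale names below are hypothetical placeholders, not the official Brief COPE key; substitute the published item-to-subscale mapping from the scale's scoring instructions.

```python
# Sketch: summing item pairs into subscale scores.
# The mapping below is a HYPOTHETICAL placeholder, not the official
# Brief COPE key -- use the published item-to-subscale mapping instead.

def score_subscales(responses, mapping):
    """responses: dict of item number -> rating (1-4).
    mapping: dict of subscale name -> pair of item numbers.
    Returns dict of subscale name -> summed score (range 2-8)."""
    return {name: responses[a] + responses[b]
            for name, (a, b) in mapping.items()}

mapping = {"subscale_x": (2, 7), "subscale_y": (3, 8)}   # placeholder
responses = {2: 3, 7: 4, 3: 1, 8: 2}                     # one participant
print(score_subscales(responses, mapping))
```

Percentile ranks against normative data would then be computed per subscale, by locating each summed score within the normative distribution reported in the reference study.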
Dear ResearchGate community,
I'm conducting a study on how different management practices in wetlands (ponds) affect the diversity and abundance of species in the wetlands. The different ponds are next to each other. Some ponds are controls without any management practices. The water in the treatment ponds is regularly drawn down and refilled with water from the river. We recorded the waterbird species number and abundance of each pond regularly (recording the bird data of all ponds at the same time during each survey).
- In the first year, we conducted a baseline study in which no treatment was done for all the ponds (data were collected monthly).
- In the second year, we conducted the treatment (operational study), and data of birds were collected weekly.
We’re now trying to study:
1) first, if there is any difference between the treatment ponds and control ponds during the operation
2) if there is any difference between baseline study and operational study of the same pond.
We wonder what kind of statistics are suitable for statistically analysing our data.
Some problems we are encountering are:
1. The data do not look normally distributed. The data collected are time-series data, and there is natural seasonal variation in the number of waterbirds in our region (many migratory birds in fall and winter). How can we take the timing of each survey into account?
2. The sampling frequency differs between the baseline year (12 surveys) and the operational year (52 surveys). How can we compare the baseline and operational years?
Highly appreciate any help or suggestion!
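On problem 2 above, one simple option (a sketch only, not the only valid approach) is to aggregate the weekly operational-year surveys to monthly summaries so both years are on the same footing. Column names and values below are invented for illustration:

```python
# Sketch: aggregate weekly counts to monthly means so the operational
# year (52 surveys) can be compared with the baseline year (12 surveys).
import pandas as pd

weekly = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-10", "2022-01-17",
                            "2022-02-07", "2022-02-14"]),
    "pond": ["A"] * 5,
    "count": [10, 14, 12, 20, 22],
})

weekly["month"] = weekly["date"].dt.to_period("M")
monthly = weekly.groupby(["pond", "month"], as_index=False)["count"].mean()
print(monthly)
```

Aggregation only equalizes sampling frequency; for the seasonal, non-normal count data themselves, generalized linear mixed models with a Poisson or negative-binomial family and a seasonal covariate (or a GAM with a cyclic seasonal smooth) are commonly used.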
Can anyone recommend a usable resource/tool for species detection from acoustic data, please? A middle ground between a phone app and something like Arbimon would be about the right level. Briefly, a phone app lacks flexibility in data collection, i.e. it can't be left out all night or left running for long periods. However, Arbimon is not that useful to anyone below ecologist level, as it only tells the user what species is present if the user completes their own validation, i.e. the user has to identify all species themselves. I'm looking for something that can analyse data from an AudioMoth, uploaded by citizen-scientist participants, and actually identify species.
How do I scientifically justify a small sample size in quantitative data collection, and validate my findings despite the low response rate?
A pilot study may help the researcher drop or add questions before the main data collection. It is a preliminary feasibility step that makes the research more meaningful: it helps adjust the research question and research design, and prevents unnecessary cost and time expenditure in the main data collection.
I am the coordinator of an international research project investigating the impact of remote and hybrid work on employees' wellbeing, retention, and knowledge transfer. Currently, we want to proceed with data collection in Taiwan, Hong Kong, and Singapore (we have prepared the survey in traditional Chinese). However, it is rather complicated for my team (even partners from HK and Taiwan) to find relevant data collection sources (websites/companies/universities) that could help in reaching the right respondents. I posted a batch on Amazon MTurk but gathered only a few relevant answers. Could you share any ideas, or contacts of companies/websites that help with data collection in these countries?
I plan to collect data for my research through two questionnaires. The first collects data on the two independent variables from managers, and the second collects data on the dependent variable from customers. The question is how to handle this in the statistical analysis, given that more questionnaires will be collected from customers than from managers (about 350 managers and 500 customers). Also, is it OK to have unequal numbers of questionnaires from managers and customers, or must the same number of responses be entered for each variable in the statistical program (such as SPSS) to examine the relationship between them?
Thank you so much in advance.
I am conducting a study to understand the factors that impact the usage of mobile health applications. Your participation in this study is completely voluntary. All information obtained will be kept strictly confidential and anonymous, and will be used for research purposes only. Please select the most suitable response as applicable to you. https://forms.gle/x7kTK9mh6s1FPXgB7
Dear Scholars & Researchers,
How do I scientifically justify a small sample size in quantitative data collection, and validate my findings despite the low response rate?
Looking forward to hearing from you all.
To be honest, I do not know if I can write this here, but I will try.
For some time I have been working to understand and identify the drivers behind land concentration in Chile. Even though I have made progress, I am now looking for a quantitative expert as a coauthor.
The principal task will be to elaborate an econometric model using the data collected.
If someone is interested just let me know.
All the best
For my graduation research, I am trying to create a composite score about household resilience out of data collected through a household survey. However, this data consists of ordinal variables (5-point Likert scale), binary variables (yes-no questions), and ratio variables (in proportions between 0-1).
My plan was to recode the 5-point Likert data into scores from 0-1 and do the same for the yes-no questions, with yes = 1 and no = 0 (since answering yes means a household is more resilient and should thus have a higher score). However, this seems very off.
At this moment I am aware that it wasn't the best idea to create a survey with both types of questions, but I am unable to recollect the data.
Therefore my **question** is as follows: do you have any tips on how to create a composite score composed of ordinal, binary, and ratio variables?
Thank you in advance and have a lovely day.
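One common, simple approach (a sketch only; whether equal weighting across items is defensible is a substantive decision) is exactly the rescaling described above: map every item onto [0, 1] and take the mean. The function names below are illustrative, not from any standard package:

```python
# Sketch: a 0-1 composite from Likert (1-5), binary (0/1), and
# proportion (0-1) items via min-max rescaling and an unweighted mean.

def rescale_likert(x, low=1, high=5):
    """Map a Likert response on [low, high] linearly onto [0, 1]."""
    return (x - low) / (high - low)

def composite(likert_items, binary_items, proportion_items):
    parts = ([rescale_likert(x) for x in likert_items]
             + [float(x) for x in binary_items]       # yes = 1, no = 0
             + list(proportion_items))                # already on [0, 1]
    return sum(parts) / len(parts)

print(round(composite([5, 3], [1, 0], [0.4]), 3))  # -> 0.58
```

A more formal alternative for mixed measurement levels is factor analysis based on polychoric/polyserial correlations, but the rescaled mean is a transparent, defensible starting point.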
I have data from a question asked of people pre- and post-treatment, answered on a 10-point Likert scale. I need to analyse the change in response. Searching for an answer has proven much more difficult than I expected, as there seem to be many different tests used in subtly different circumstances. Can anybody give me some guidance on the best test to use in this situation?
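For paired ordinal data like this, the Wilcoxon signed-rank test is a common choice (a paired t-test if the differences are roughly normal). As a self-contained illustration on invented data, here is the simpler exact sign test using only the standard library; the same pairing logic carries over directly to scipy.stats.wilcoxon:

```python
# Sketch: exact two-sided sign test on paired pre/post ratings.
# Data are invented. Under H0, increases and decreases are equally likely.
from math import comb

pre  = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3]
post = [5, 6, 6, 7, 4, 8, 6, 6, 7, 5]

diffs = [b - a for a, b in zip(pre, post) if b != a]   # drop ties
n = len(diffs)
k = sum(d > 0 for d in diffs)                          # number of increases
# exact two-sided binomial p-value
p = min(1.0, 2 * sum(comb(n, i)
                     for i in range(min(k, n - k) + 1)) / 2 ** n)
print(n, k, round(p, 5))  # -> 10 10 0.00195
```

The sign test ignores the size of each change; the Wilcoxon signed-rank test uses the ranked magnitudes as well and is usually the preferred default here.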
I have a model with the following parameters:
- Groups: factor - 4 levels (base level = control group)
- Time: numerical
- Label = factor - 3 levels (base level = control group)
- price = numerical (5 different values; from a Likert scale)
The problem is that the base level of variable 'Groups' is perfectly collinear with the variable Time because, in the control condition, no values for Time were collected. That means that for Groups = 'control group', Time is always '0'.
This introduces singularities or perfect collinearity in my regression model, meaning I cannot interpret it correctly.
Do you have any suggestions for helping me out? Recollecting the data is not an option, unfortunately: it was too costly, and there are too many time constraints.
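One way to see where the singularity comes from (a toy reconstruction with made-up data, assuming the model includes a Groups x Time interaction): when Time is identically 0 in the control condition, the control:Time interaction column is all zeros, so the design matrix loses rank. Common remedies are dropping that interaction term, or coding Time in the control group as missing rather than 0.

```python
# Sketch: the control:Time interaction column is identically zero,
# making the design matrix rank-deficient. Data are invented.
import numpy as np

control = np.array([1, 1, 0, 0, 0, 0])   # control-group dummy
time = np.array([0, 0, 1, 2, 3, 4])      # Time = 0 whenever control = 1

X = np.column_stack([np.ones(6), control, time, control * time])
print(np.linalg.matrix_rank(X), X.shape[1])  # rank 3 < 4 columns
```

Checking the rank of the design matrix this way pinpoints which column combination is redundant before the regression software silently drops (or fails on) a term.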
I have one independent variable and one dependent variable, and I am using a latent variable as a mediator, through which I want to estimate the effect on the dependent variable. The problem is: if the data for the latent variable were collected through a semi-structured questionnaire rather than on a Likert scale, is a structural equation model still possible? Kindly explain and help. I am a rookie researcher, new to multivariate analysis and other methods.
This is my data collection form. Does anyone have an opinion about adding or removing anything? Thank you all.
The costs and time commitments associated with data collection and labeling can be prohibitive. A huge dataset alone is insufficient, since the success of deep learning models depends strongly on the quality of the training data. Cost, time, and the use of appropriate training data are all challenges. Biases, incorrect labels, and omitted values are some of the difficulties that impair the quality of deep learning training datasets.
One dependent variable (continuous) ~ two continuous and two categorical (nominal) independent variables
I'm seeking the best method for modeling a data set with more than 100 sites. None of the continuous variables is normally distributed.
I received this from our ethics committee: "Declare potential COI of the principal investigator to the research participants." I'm a bit confused about what I should put. My research is about coping strategies of primigravida teenage childbearing mothers in Metro Manila, and the purpose is to determine the respondents' coping strategies and correlate them with their demographic profile.
Regarding my project, I wonder whether you could help me find a method to calculate the sample size and a data collection strategy for a qualitative study, as well as the questions to include and a timeline for evaluation.
Data is collected from two sources:
1. Source A
2. Source B
For Source A, data on the subject is collected over a very long period of time from one individual. For Source B, data on the subject is collected over a relatively brief period from millions of people.
1. What are (or should be) the principles for quantitative determination of the reliability of data collected from Source A and Source B?
2. How does one assess and evaluate the authenticity of the principles in (1)?
Hi everyone. I have a problem with WordSmith Tools when collecting data on 4-word lexical bundles: I can only see the counts, but cannot inspect the underlying data, unlike AntConc, which shows the bundle types for 4-word bundles. So how can I check the source data for the 4-word bundles? For example, Text 1 was calculated to have 135 lexical bundles; where can I see all the bundle types? I spent a whole day trying to find out, but nothing helped. I am very confused about this tool now. If anyone can help me, I would appreciate it.
Self-reporting bias is a challenge in GPS data collection in studies where participants have to manually START and STOP recording their trip, unlike studies where GPS data are collected passively (continuously in the background) without user intervention.
I am specifically looking for studies that mention the existence of self-reporting bias in the context of GPS data collection.
In my SEM model, I obtained the attached values, with low loadings but good model fit. Could anybody provide insight into whether these results are valid for publication or still need improvement?
PS: I really struggled with data collection, and the model is only for testing and can't be edited.
Sample size: 66
Dear scholars, I have a technical question that needs clarification. Employees and customers are the most common participants when data are collected via questionnaires in studies of organizations or companies. There may be reasons or justifications for including both employees and customers in the same sample without employing multi-group analysis. However, I am not sure that employees' and customers' points of view are homogeneous, or that both can be considered part of the same population. So my question is: what reasons justify combining the employee and customer samples (i.e. the questionnaire data) into a single study population? In other words, to what extent can employees and customers be considered homogeneous enough to form the same population?
I have a three-group experiment design with two data collection times (baseline and retention). Some data violate the assumptions of normality and homogeneity of variance. But for the data that meets assumptions, I am using repeated measures ANOVA to show three points:
First, that all subjects come from the same population (using the Shapiro-Wilk normality test [also skewness values] and Levene's test for homogeneity of variance). Second, the impact of treatment (between subjects). Third, whether there is a significant difference between the baseline and retention sessions, and where that difference lies (within subjects).
Besides the normality and homogeneity-of-variance tests, I am not conducting any other tests (such as unpaired t-tests) for the first point; I am conducting the ANOVA for the second and third points. I believe this is correct, but I would like to confirm it.
My main question is: what non-parametric tests do I need to conduct to show the same three points as the parametric tests?
Here is what I am planning and would appreciate any feedback or any suggestion on what to use.
I am planning to run a set of pairwise Mann-Whitney U tests to show that all three groups are coming from the same population. Use Friedman's test to show the impact of treatment. Then run another set of pairwise Mann-Whitney U tests to show where the differences are. Is this the right approach?
Thank you and appreciate your help!
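A minimal sketch of the non-parametric workflow described above, using scipy.stats on invented scores. One caveat worth checking: Friedman's test needs three or more related measurements, so with only baseline and retention the Wilcoxon signed-rank test is the usual within-subjects analogue, and the pairwise Mann-Whitney tests should carry a multiple-comparison correction (e.g. Bonferroni):

```python
# Sketch: non-parametric checks on invented data for three groups.
from scipy.stats import mannwhitneyu, wilcoxon

g1 = [12, 15, 14, 10, 13]   # baseline, group 1
g2 = [11, 14, 13, 12, 10]   # baseline, group 2
g3 = [13, 12, 15, 11, 14]   # baseline, group 3

# 1) Pairwise Mann-Whitney U at baseline (Bonferroni: alpha / 3).
for name, (a, b) in {"1 vs 2": (g1, g2),
                     "1 vs 3": (g1, g3),
                     "2 vs 3": (g2, g3)}.items():
    u, p = mannwhitneyu(a, b, alternative="two-sided")
    print(name, "p =", round(p, 3))

# 2) Within-subjects change (baseline vs retention) per group:
#    Wilcoxon signed-rank, since there are only two time points.
retention_g1 = [14, 16, 15, 13, 15]
w, p = wilcoxon(g1, retention_g1)
print("group 1 baseline vs retention, p =", round(p, 3))
```

If an omnibus three-group test is wanted before the pairwise comparisons, Kruskal-Wallis is the non-parametric counterpart of the between-subjects ANOVA factor.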
Dear scientific community,
Based on reviewers' suggestions, I am considering an inter-rater analysis for my study, which means measuring a second time. However, some aspects are not clear:
1) For the second measurements, should I remeasure the variables for all subjects, or would a subset be sufficient (in some articles I have seen 20% of the cases remeasured and analyzed)?
2) If all data should be remeasured rather than a subset, should the results of the second measurement be combined with the first by averaging the two, which would mean rewriting the results and discussion sections?
3) Or would it be correct to remeasure a subset of the subjects' variables and present the first measurements' results, as already given in the article?
Currently my study is focused on 'Understanding Customer Behaviour under Interest Rate Changes'.
I would like to know the types of financial data that are aggregated nationally. Total deposits is one such example.
Where can I find data on 'total deposits' and the other financial data that are aggregated nationally?
Please suggest official websites for gathering the above data (not Kaggle, GitHub, etc.).
We surveyed 15 teachers and their students (6-14 students each) to find a correlation between teacher well-being and students' academic results. We are also looking at the student-reported teacher relationship as a potential mediator.
Unfortunately, we have had to stop data collection due to COVID-19.
We are considering using Spearman's correlation, but think it is most likely not sufficient. Is there a better alternative?
Should we also drop the mediation analysis, given the sample size obtained?
We plan to distribute supply chain surveys via personal networks. Do you have any experience to share? I think the point(s) of penetration are important: credible people and credible websites support the data collection. I plan to use LinkedIn and Facebook.
There are different guidelines on what qualifies a researcher for authorship on publications. I've just seen a case on Twitter where a PhD student was puzzled that the supervisor was going to publish an article based on the student's data collection without including them.
For PhD students: do you think this is fair? have you discussed it with your supervisor?
For PhD supervisors: What criteria do you use for establishing the authorship policy in your lab/research group?
I see this as a huge gray zone in research, and clarity/transparency would help everybody :)
Hello, I'm an urban design graduate student in my first semester. How is research done? As part of learning to identify a problem and a method, I need to write an example. Research proposal: tension in green urbanism (or strategies): displacement of communities in large-scale projects (İzmir - GreenUp?)
Research Problem: Displacement of community in the implementation of green strategies in cities / Socio-spatial flaws in green strategies against climate change
Research Questions: Are necessary preventive measures taken against displacement of communities for large-scale green strategies? Can the scope of green strategies be both environmentally beneficial and socially equitable?
What I am wondering is this: the GreenUp project in Turkey is considered new and is not yet completed. We are expected to carry the work through the data collection and method determination stages so that it qualifies as empirical research. I wonder how I can do this; your suggestions and comments are important to me. Thank you!
- What is the difference between Landsat Collection 1, Collection 2 Level-1, and Collection 2 Level-2?
- What are the specifications included in each type?
- Do these types include atmospheric correction?
- Which collection is most appropriate for land use/land cover and ecology-related analysis?
I would like to know: if a single investigator (a specialist) will assess different variables on X-rays, does he need to do an inter-examiner reliability test with an expert before I start data collection, or only an intra-examiner reliability test for all variables, since every variable will be assessed by the single investigator? Moreover, the work will be done by a single author.
I have species abundance data (different species) collected from 6 sites in two different seasons. I want to test whether there is seasonal variation in the abundance of species using the Wilcoxon signed-rank test (paired data). Can anyone help me?
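A minimal sketch with scipy.stats.wilcoxon on invented per-site totals, pairing each site's two seasonal values. With only 6 pairs the test has little power, and if each species is tested separately a multiple-comparison correction is needed:

```python
# Sketch: paired seasonal comparison of per-site abundance (invented data).
from scipy.stats import wilcoxon

season_1 = [34, 21, 45, 18, 29, 40]   # e.g. total abundance per site, wet
season_2 = [28, 19, 33, 20, 25, 31]   # same sites, dry season

stat, p = wilcoxon(season_1, season_2)
print("W =", stat, "p =", round(p, 3))
```

The pairing is by site, so the two lists must be in the same site order; shuffling one list would silently break the paired design.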
If reliability testing is done before the actual study, then two samples from the same population are needed: one to establish the reliability of the instrument and another for the actual study. I need some clarity about the right approach.
If it has to be done before the actual data collection, how many participants would be an ideal number just for establishing the instrument's reliability?
For a validated instrument (designed in consultation with a subject matter expert), what if reliability is assessed after the actual data collection and the reliability coefficient (Cronbach's alpha) falls below the acceptable range? What should be done in such a scenario?
I am currently working on my dissertation and would like input on the best platform for a study that includes data collection at two points in time and is not cost-prohibitive. What platform would you recommend?
We intend to write a review paper using data from various studies around the world, but data collection in these studies was done with different methods and intensities. In this case, how should the data set be standardized in order to run the analysis?
I'm currently looking at the rheological properties of the polymer xanthan gum, focusing specifically on its dynamic viscosity. I'm assessing the effects of pH (ranging from 3.6 to 5.6 in 0.4 increments, six pH values in total) on the dynamic viscosity of xanthan gum solution (xanthan gum powder dissolved in acetate buffer of equal ionic strength; concentration kept at 0.04%).
Firstly, my viscosity data show that as pH increases from 3.6 to 4.0 and then to 4.4, the viscosity increases; but as I bring the pH from 4.4 to 4.8, 4.8 to 5.2, and lastly 5.2 to 5.6, the increasing trend plateaus and the rise in viscosity is smaller than over the 3.6-4.4 range. In this range, does pH have an effect on the viscosity of xanthan gum via its molecular configuration? Some sources state that xanthan gum's viscosity remains stable and unchanged within pH 3-12 at a high concentration such as 1% (not 0.04%), yet others suggest pH still plays a role, though I'm not sure how in chemical and molecular terms.
A possible conjecture is that xanthan gum's order-disorder (helix-coil) transition is affected by protonation. Figure 2 demonstrates how electrolytes affect the structure of the polymer; Figure 3 shows how, as a helical rod rather than a random coil, the chains are capable of hydrogen bonding with one another. Hence, I wonder whether pH affects this structural transition, such that the increased intermolecular forces in the helical-rod form would make the solution more viscous.
Here are the resources I have used so far:
Brunchi, CE., Bercea, M., Morariu, S. et al. Some properties of xanthan gum in aqueous solutions: effect of temperature and pH. J Polym Res 23, 123 (2016). https://doi.org/10.1007/s10965-016-1015-4
Recently, an acquaintance of mine told me about a friend of hers who earns money by responding to digital surveys sent to her by a data-collection company such as Prolific or Amazon Mechanical Turk.
Apparently, when answering surveys the woman's friend does not concentrate on the responses she provides (often watching TV simultaneously) but primarily ensures she answers a sufficient number of questions in a way that appears valid so that she can claim payment for having submitted her responses.
I therefore wonder to what extent data collected that way are valid. Can anyone provide insights or experiences concerning this, please?
Many young researchers face uncertainty when they move from developing and planning their methodology to actually doing the data collection. Even professors often cannot predict well how long data collection takes.
The data collection phase therefore can substantially impact the total duration of a research project.
I believe it should become a standard to record the time needed for field work and lab work and publish these figures along with the actual research.
This would enable other researchers to plan their campaigns better and it would have the side effect of making the effort and time going into research much more visible.
It would also make recurring data collection and inventories easier to plan and budget, which would in the long run contribute to more realistic expectations of the actual work volume and effort that researchers face.
It can also help prevent going on unrealistic or overambitious data collection campaigns and make it easier to select appropriate measurement and field methods.
For recurring inventories, it would give contractors the opportunity to better estimate working and setup times when making realistic offers, and simultaneously help contracting bodies award contracts to bidders who plan and budget realistically.
What do you think?
How could such a standard be implemented?
I am embarking on a research project and would like help with a data collection instrument on the emotional well-being and academic performance of visually impaired students, who are the target population.
Some literature states that quarterly data from the World Bank were used. Are such data available, or did the authors transform annual data? How is this transformation done?
Data were collected using a Likert scale, gathering insights from experts, entrepreneurs, and customers.
The popularity and applications of the metaverse are expected to skyrocket in the near future, thanks to the entry of major players (Facebook, Microsoft, EA, and even McDonald's) into the industry. As an opinion piece in The Drum puts it, "many experts predict that the metaverse will change the way we live and work in the future, and it would be extremely negligent not to recognize the trend and continue to engage with it". Could someone suggest some preliminary studies that could be carried out using consumer data collected through online surveys? Thanks in advance.
I have collected data at three different time points. For the first time point, I analysed the data using SEM (SPSS + AMOS). Subsequently, I collected data at three different time points (say, one per year). What type of statistical analysis can be done to draw inferences?
Hi, I am a PhD student working on my thesis about game platforms and human behavior. I am having a hard time at the data collection stage and would like to ask for suggestions and advice. I started distributing my questionnaire on various online forums (e.g. Reddit) and social media (Facebook) dedicated to gaming, but with a low response rate. Are there other forums or websites where I can distribute my questionnaire and reach more participants?
Hello everyone. I am a PhD student at the Czech University of Life Sciences, working on the development of data collection collars (tags) for different animal species (and, more critically, sizes). At this point, I am looking for colleagues, coauthors, and collaborators working in the field of animal behaviour and automatic data collection (i.e. accelerometers, gyroscopes, GPS), to continue the development and application of animal behaviour data collection devices and the analysis tools for that purpose. If you are interested in more details, you can contact me here or by email: email@example.com
I am working on a project related to online abuse. The structured questionnaire consists of about 50 questions, and we would like to pretest it before the actual data collection. What would be an appropriate way to determine the sample size for pretesting?
Sometimes we collect a huge amount of raw data during field visits that we cannot use in a research article immediately, but that we may be able to use in future. So I would like to know for how many years primary data collected in the field by an individual researcher remain valid. For example, can I use data I collected in 2015 through a field survey to write a paper in 2022? Would they still be valid?
For the analysis, can we take companies listed on the NSE across two or three indices, or should we take listed companies from only one index? Kindly advise. Thank you.