Science topic

Survey Methodology and Data Analysis - Science topic

Survey Methodology and Data Analysis covers, among other areas of interest:
  • Constructing paper-based or online surveys
  • Questionnaire design and question testing
  • Types of data collection (e.g., interviews)
  • Data analysis
  • *Special topic*: Stated Preference techniques (e.g., discrete choice experiments)
Questions related to Survey Methodology and Data Analysis
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
I have been studying a particular set of issues in methodology, and looking to see how various texts have addressed them. I have a number of sampling books, but only a few published since 2010, the latest being Yves Tillé, Sampling and Estimation from Finite Populations, 2020, Wiley.
In my early days of survey sampling, William Cochran's Sampling Techniques, 3rd ed, 1977, Wiley, was popular. I would like to know which books are most popularly used today to teach survey sampling (sampling from finite populations).
I posted almost exactly the same message as above to the American Statistical Association's ASA Connect and received a few recommendations, notably Sampling: Design and Analysis by Sharon Lohr, whose 3rd ed., 2022, is published by CRC Press. Also of note was Sampling Theory and Practice by Wu and Thompson, 2020, Springer.
Any other recommendations would also be appreciated. 
Thank you  -  Jim Knaub
Relevant answer
Answer
Here are some recommended ones:
1. "Sampling Techniques" by William G. Cochran. This classic covers a wide range of sampling methods with practical examples. It is comprehensive and delves into both theory and application, making it valuable for students and professionals.
2. "Survey Sampling" by Leslie Kish. Another foundational text, known for its detailed treatment of survey sampling design and estimation methods. Kish's book is especially useful for those interested in practical survey applications.
3. "Model Assisted Survey Sampling" by Carl-Erik Särndal, Bengt Swensson, and Jan Wretman. This book introduces model-assisted methods for survey sampling, which blend traditional design-based methods with model-based techniques. It is ideal for more advanced readers interested in complex survey designs.
4. "Sampling of Populations: Methods and Applications" by Paul S. Levy and Stanley Lemeshow. Widely used in academia, this text provides thorough explanations of different sampling methods with a focus on real-world applications. It also includes case studies and practical exercises, making it helpful for hands-on learners.
5. "Introduction to Survey Sampling" by Graham Kalton. This introductory book offers a concise and accessible overview of survey sampling methods. It is well suited to beginners who need a straightforward introduction to key concepts.
6. "Designing Surveys: A Guide to Decisions and Procedures" by Johnny Blair, Ronald F. Czaja, and Edward A. Blair. This book focuses on the practical aspects of designing and conducting surveys, with particular emphasis on decision-making and procedural choices in the survey process.
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
Suppose one has 40 or 50 survey questions for an exploratory analysis of a phenomenon, several of which are intended to be dependent variables, but most independent. An MLR is conducted with, say, 15 IVs to explain the DV, and maybe half turn out to be significant. Now suppose an interesting IV warrants further investigation, and you think you have collected enough data to at least partially explain what makes this IV so important to the primary DV. Perhaps another, secondary model is in order... i.e., you'd like to turn a significant IV from the primary model into the DV in a new model.
Is there a name for this regression or model approach? It is not exactly nested, hierarchical, or multilevel (I think). The idea, again, is simply to explore what variables explain the presence of IV.a in Model 1, by building Model 2 with IV.a as the DV, and employing additional IVs that were not included in Model 1 to explain this new DV.
I am imagining this as a sort of post-hoc follow up to Model 1, which might sound silly, but this is an exploratory social science study, so some flexibility is warranted, imo.
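To make the setup concrete, here is a rough sketch of the two models in Python with statsmodels; all variable names and data are hypothetical, simulated stand-ins for the survey items:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
# Hypothetical survey data: one primary DV and a handful of IVs.
df = pd.DataFrame(rng.normal(size=(n, 5)),
                  columns=["dv", "iv_a", "iv_b", "iv_c", "iv_d"])

# Model 1: the primary DV regressed on the full set of IVs.
m1 = sm.OLS(df["dv"],
            sm.add_constant(df[["iv_a", "iv_b", "iv_c", "iv_d"]])).fit()

# Model 2: the interesting IV (iv_a) becomes the DV, explained by
# additional IVs that were NOT in Model 1.
df["iv_e"] = rng.normal(size=n)
df["iv_f"] = rng.normal(size=n)
m2 = sm.OLS(df["iv_a"],
            sm.add_constant(df[["iv_e", "iv_f"]])).fit()

print(m1.summary())
print(m2.summary())
```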
Relevant answer
Answer
If you have coherent subsets of your variables (i.e., they all measure essentially the same thing), then you can create scales that are stronger measures than any of the variables taken alone.
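For illustration, a minimal Python sketch of building a scale from a coherent subset of items and checking its internal consistency with Cronbach's alpha (item names and data are simulated, not from any real survey):

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of items given as columns."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(1)
latent = rng.normal(size=200)  # the common construct the items share
items = pd.DataFrame({f"q{i}": latent + rng.normal(scale=0.8, size=200)
                      for i in (1, 2, 3)})

print("alpha:", round(cronbach_alpha(items), 2))
scale = items.mean(axis=1)  # the scale score: the mean of its items
```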
I have consolidated a set of references on this approach here:
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
In 2007 I did an Internet search for others using cutoff sampling, and found a number of examples, noted at the first link below. However, it was not clear that many used regressor data to estimate model-based variance. Even if a cutoff sample has nearly complete 'coverage' for a given attribute, it is best to estimate the remainder and have some measure of accuracy. Coverage could change. (Some definitions are found at the second link.)
Please provide any examples of work in this area that may be of interest to researchers. 
Relevant answer
Answer
I would like to restart this question.
I have noted a few papers on cutoff or quasi-cutoff sampling other than the many I have written, but in general, I do not think those others have had much application. Further, it may be common to ignore the part of the finite population which is not covered, and to only consider the coverage, but I do not see that as satisfactory, so I would like to concentrate on those doing inference. I found one such paper by Guadarrama, Molina, and Tillé which I will mention later below.
Following is a tutorial I wrote on quasi-cutoff (multiple-item survey) sampling with ratio modeling for inference, which can be highly useful for repeated official establishment surveys:
"Application of Efficient Sampling with Prediction for Skewed Data," JSM 2022: 
This is what I did for the US Energy Information Administration (EIA), where I led the application of this methodology to various establishment surveys, which still produce perhaps tens of thousands of aggregate inferences or more each year from monthly and/or weekly quasi-cutoff sample surveys. This also helped in data editing: data collected in the wrong units or provided to the EIA from the wrong files often showed up early in the data processing. Various members of the energy data user community have eagerly consumed and analyzed this information for many years. (You might find the addenda nonfiction short stories to be amusing.)
There is a section in the above paper on an article by Guadarrama, Molina, and Tillé(2020) in Survey Methodology, "Small area estimation methods under cut-off sampling," which might be of interest, where they found that regression modeling appears to perform better than calibration, looking at small domains, for cutoff sampling. Their article, which I recommend in general, is referenced and linked in my paper.
There are researchers looking into inference for nonprobability sampling cases which are not so well-behaved as what I did for the EIA, where multiple covariates may be needed for pseudo-weights, or for modeling, or both. (See Valliant, R.(2019)*.) But when many covariates are needed for modeling, I think the chances of a good result are greatly diminished. (For multiple regression, from an article I wrote, one might not see heteroscedasticity that should theoretically appear, which I attribute to the difficulty of forming a good predicted-y 'formula'. For pseudo-inclusion probabilities, if many covariates are needed, I suspect it may be hard to do this well either, though perhaps that case is more hopeful. However, Brewer, K.R.W.(2013)** noted an early case where failure with what appears to be an early version of that approach helped convince people that probability sampling was a must.)
At any rate, there is research on inference from nonprobability sampling which would generally be far less accurate than what I led development for at the EIA.
So, the US Energy Information Administration makes a great deal of use of quasi-cutoff sampling with prediction, and I believe other agencies could make good use of this too, but in all my many years of experience and study/exploration, I have not seen much evidence of such applications elsewhere. If you do, please respond to this discussion.
Thank you - Jim Knaub
..........
*Valliant, R.(2019), "Comparing Alternatives for Estimation from Nonprobability Samples," Journal of Survey Statistics and Methodology, Volume 8, Issue 2, April 2020, Pages 231–263, preprint at 
**Brewer, K.R.W.(2013), "Three controversies in the history of survey sampling," Survey Methodology, Dec 2013 -  Ken Brewer - Waksberg Award article: 
  • asked a question related to Survey Methodology and Data Analysis
Question
12 answers
More exactly, do you know of a case where there are repeated, continuous data, sample surveys, perhaps monthly, and an occasional census survey on the same data items, perhaps annually, likely used to produce Official Statistics?   These would likely be establishment surveys, perhaps of volumes of products produced by those establishments. 
I have applied a method which is useful under such circumstances, and I would like to know of other places where this method might also be applied.   Thank you. 
Relevant answer
Answer
This is for the crushed stone industry in the US:
I'm told these quarterly surveys are for "a select set of companies," which reminds me of how the quasi-cutoff sample of electric sales in the US got started. The electric sales survey of a select group of entities was later modified and used as a sample: first a stratified random sample with a censused stratum of large companies, and then only the censused stratum, as a quasi-cutoff sample, all after an annual census of all electric sales by economic sector (residential, etc.) was started on the production/supply side. If one wanted to monitor the crushed stone industry the same way, I would suggest this approach, using a quasi-cutoff sample with a ratio model for prediction, as is done at the US Energy Information Administration (EIA).
Does anyone know of other surveys, each of a select group of larger establishments being followed, where there is a chance to instead have an occasional census to be used for regressor data for the same data items in a more frequent sample?
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
I am currently studying cross-sectional research design. I have found that these studies are often associated with surveys and structured interviews but can also include other methods such as structured observation, content analysis, official statistics, and diaries (Bryman, 2016). I wonder if the focus group technique can be used in a cross-sectional research design and in what situations it could be classified as such.
Could you help me with literature or examples to resolve this question?
Relevant answer
Answer
Any data collection that occurs at one point in time is cross-sectional, so that could definitely include focus groups. In practice, nearly all focus group research is cross-sectional, because repeated waves of focus groups are quite rare.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
A number of people have asked on ResearchGate about acceptable response rates and others have asked about using nonprobability sampling, perhaps without knowing that these issues are highly related.  Some ask how many more observations should be requested over the sample size they think they need, implicitly assuming that every observation is at random, with no selection bias, one case easily substituting for another.   
This is also related to two different ways of 'approaching' inference: (1) the probability-of-selection-based/design-based approach, and (2) the model-based/prediction-based approach, where "prediction" means estimation for a random variable, not forecasting. 
Many may not have heard much about the model-based approach.  For that, I suggest the following reference:
Royall(1992), "The model based (prediction) approach to finite population sampling theory." (A reference list is found below, at the end.) 
Most people may have heard of random sampling, and especially simple random sampling, where selection probabilities are all the same, but many may not be familiar with the fact that all estimation and accuracy assessments would then be based on the probabilities of selection being known and consistently applied. You can't take just any sample and treat it as if it were a probability sample. Nonresponse is therefore more than a problem of replacing missing data with some other data without attention to "representativeness." Missing data may be replaced by imputation, or by weighting or reweighting the sample data to completely account for the population, but results may be degraded too much if this is not applied with caution. Imputation may be accomplished in various ways, such as trying to match characteristics of importance between the nonrespondent and a new respondent (a method which I believe has been used by the US Bureau of the Census), or, my favorite, by regression, a method that easily lends itself to variance estimation, though variance in probability sampling is technically different. Weighting can be adjusted by grouping or regrouping members of the population, or just recalculating with a changed number, but grouping needs to be done carefully.
Recently work has been done which uses covariates for either modeling or for forming pseudo-weights for quasi-random sampling, to deal with nonprobability sampling.  For reference, see Elliott and Valliant(2017), "Inference for Nonprobability Samples," and Valliant(2019), "Comparing Alternatives for Estimation from Nonprobability Samples."  
Thus, methods used for handling nonresponse, and methods used to deal with nonprobability samples are basically the same.  Missing data are either imputed, possibly using regression, which is basically also the model-based approach to sampling, working to use an appropriate model for each situation, with TSE (total survey error) in mind, or weighting is done, which attempts to cover the population with appropriate representation, which is mostly a design-based approach. 
If I am using it properly, the proverb "everything old is new again" seems to fit here: in Brewer(2013), "Three controversies in the history of survey sampling," Ken Brewer showed that we have been down all these routes before, leading him to believe in a combined approach. If Ken were alive and active today, I suspect he might see things going a little differently than he had hoped, in that the probability-of-selection-based approach is not maintaining as much traction as I think he would have liked. This, even though he first introduced 'modern' survey statistics to the model-based approach in a paper in 1963. Today it appears that there are many cases where probability sampling may not be practical/feasible.
On the bright side, I do not find it a particularly strong argument that your sample would give you the 'right' answer if you did it infinitely many times, when you are doing it once, assuming no measurement error of any kind and no bias of any kind. Relative standard error estimates there are of great interest, just as relative standard error estimates are important when using a prediction-based approach, where the estimated variance is the estimated variance of the prediction error associated with a predicted total, with model misspecification as a concern. In a probability sample, if you miss an important stratum of the population when doing, say, a simple random sample because you don't know the population well, you could greatly over- or underestimate a mean or total. If you have predictor data on the population, you will know the population better. (Thus, some combine the two approaches: see Brewer(2002) and Särndal, Swensson, and Wretman(1992).)
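To make the pseudo-weighting idea above concrete, here is a rough sketch of the propensity-based approach in the spirit of Elliott and Valliant(2017); it is simulated and deliberately simplified, glossing over the reference sample's own design weights and the other refinements discussed in that literature:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated data: a reference probability sample (np_sample = 0) and a
# nonprobability/volunteer sample (np_sample = 1), sharing one covariate.
ref = pd.DataFrame({"age": rng.normal(45, 12, size=500), "np_sample": 0})
vol = pd.DataFrame({"age": rng.normal(35, 10, size=300), "np_sample": 1})
both = pd.concat([ref, vol], ignore_index=True)

# Model the propensity of appearing in the nonprobability sample.
logit = sm.Logit(both["np_sample"],
                 sm.add_constant(both[["age"]])).fit(disp=0)
p = logit.predict(sm.add_constant(vol[["age"]]))

# Pseudo-weights: inverse of the estimated inclusion propensity.
# Estimates then use these, e.g., a weighted mean of a survey outcome.
vol["pseudo_weight"] = 1.0 / p
```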
..........         
So, does anyone have other thoughts on this and/or examples to share for this discussion: Comparison of Nonresponse in Probability Sampling with Nonprobability Sampling?    
..........         
Thank you.
References:
Brewer, K.R.W.(2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press
Brewer, K.R.W.(2013), "Three controversies in the history of survey sampling," Survey Methodology, Dec 2013, Ken Brewer's Waksberg Award article:
Elliott, M.R., and Valliant, R.(2017), "Inference for Nonprobability Samples," Statistical Science, 32(2):249-264,
Royall, R.M.(1992), "The model based (prediction) approach to finite population sampling theory," Institute of Mathematical Statistics Lecture Notes - Monograph Series, Volume 17, pp. 225-240.   Information is found at
The paper is available under Project Euclid, open access: 
Särndal, C.-E., Swensson, B., and Wretman, J.(1992), Model Assisted Survey Sampling, Springer-Verlag
Valliant, R.(2019), "Comparing Alternatives for Estimation from Nonprobability Samples," Journal of Survey Statistics and Methodology, Volume 8, Issue 2, April 2020, Pages 231–263, preprint at 
Relevant answer
Answer
This is a very interesting perspective, James R Knaub , and one that you could well share on Frank Harrell's Datamethods discussion forum : https://discourse.datamethods.org
Other than that, I'm going to have a look at those references over a largeish pot of coffee before I say anything stupid (stupid plus references allows you to cover your retreat better!)
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
At the US Energy Information Administration (EIA), for various establishment surveys, Official Statistics have been generated using model-based ratio estimation, particularly the model-based classical ratio estimator.  Other uses of ratios have been considered at the EIA and elsewhere as well.  Please see
At the bottom of page 19 there it says "... on page 104 of Brewer(2002) [Ken Brewer's book on combining design-based and model-based inferences, published under Arnold], he states that 'The classical ratio estimator … is a very simple case of a cosmetically calibrated estimator.'" 
Here I would like to hear of any and all uses made of design-based or model-based ratio or regression estimation, including calibration, for any sample surveys, but especially establishment surveys used for official statistics. 
Examples of the use of design-based methods, model-based methods, and model-assisted design-based methods are all invited. (How much actual use is the GREG getting, for example?)  This is just to see what applications are being made.  It may be a good repository of such information for future reference.
Thank you.  -  Cheers. 
Relevant answer
Answer
In Canada they have a Monthly Miller’s Survey, and an Annual Miller’s Survey.  This would be a potential application, if used as I describe in a paper linked below. As in the case of a survey at the US Energy Information Administration for electric generation, fuel consumption and stocks for electric power plants, they collect data from the largest establishments monthly, and from the smallest ones just annually.  After the end of the year, for a given data item, say volume milled for a type of wheat, they could add the twelve monthly values for each given establishment, and with the annual data collected, there is then an annual census.  To predict totals each month, the previous annual census could be used for predictor data, and the new monthly data would be used for quasi-cutoff sample data, for each data item, and with a ratio model, one may predict totals each month for each data item, along with estimated relative standard errors.  Various techniques might apply, such as borrowing strength for small area predictions, adjustment of the coefficient of heteroscedasticity, and multiple regression when production shifts from say, one type of grain to another, as noted in the paper. 
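To sketch the mechanics on simulated data (this illustrates the classical ratio estimator, not the production code used at the EIA): predicting a monthly total from a quasi-cutoff sample, with the prior annual census value as the regressor, could look like this:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated establishment data: x = last annual census value (regressor);
# y = current monthly value, observed only for the quasi-cutoff sample
# (here: the largest establishments by census size).
N = 200
x = rng.lognormal(mean=4, sigma=1, size=N)           # census values, all N units
y = 0.08 * x * (1 + rng.normal(scale=0.2, size=N))   # true monthly values
in_sample = x >= np.quantile(x, 0.70)                # cutoff: top 30% by size

# Classical ratio estimator: b = sum(y_sample) / sum(x_sample), the WLS
# slope under regression weight 1/x.
b = y[in_sample].sum() / x[in_sample].sum()

# Predicted total = observed sample total + prediction for the remainder.
T_hat = y[in_sample].sum() + b * x[~in_sample].sum()
print(T_hat, y.sum())  # compare with the (normally unobserved) true total
```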
Here are the mill surveys: 
Canadian Mill surveys: 
Monthly Miller’s Survey: 
Annual Miller’s Survey: 
This survey information is found on page 25 of the paper below, as of this date.  There will likely be some revisions to this paper.  This was presented as a poster paper at the 2022 Joint Statistical Meetings (JSM), on August 10, 2022, in Washington DC, USA.  Below are the poster and paper URLs. 
Poster:
The paper is found at
.........................     
If you can think of any other applications, or potential applications, please respond. 
Thank you. 
  • asked a question related to Survey Methodology and Data Analysis
Question
8 answers
I am analyzing a nationally representative survey, and I wonder whether recoding categorical variables like gender or education would mess up the weights.
Each row of the data has a weight, stratum, and PSU. Does recoding the categorical variables impact the results of my regression analysis?
Relevant answer
Answer
For descriptive purposes, recoding categorical data is not recommended. But for testing significance or association, merging categories is sometimes advised to make the tests applicable and to satisfy their assumptions.
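A tiny pandas illustration of the point, with hypothetical data: relabeling or merging categories changes only the analysis variable, so the design variables, including the weights, are untouched:

```python
import pandas as pd

# Hypothetical survey extract: design variables travel with each row.
df = pd.DataFrame({"educ":   [1, 2, 3, 4],
                   "weight": [1.2, 0.8, 1.1, 0.9],
                   "strata": [1, 1, 2, 2],
                   "psu":    [1, 2, 1, 2]})

# Merging categories 3 and 4 into one: only the analysis variable changes;
# weight, strata, and psu are untouched, so the design is unaffected.
df["educ3"] = df["educ"].replace({4: 3})
```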
  • asked a question related to Survey Methodology and Data Analysis
Question
9 answers
I am looking for online survey tools with the ability to upload a photo file and send your exact, present location (in terms of longitude and latitude).
The tool should work perfectly on mobile devices, and preferably be free of charge.
The online survey will be completed on mobiles while outdoors, so sending location needs to be easy and user friendly. I was considering just using LimeSurvey and asking participants to copy-paste their location URL from the Google Maps app, but that is inconvenient and inaccurate.
Thank you!
Relevant answer
Answer
You can use KoboToolbox. The platform is free, easy to use, and suitable for mobile devices, and it enables uploading photos, GPS locations, etc.
  • asked a question related to Survey Methodology and Data Analysis
Question
11 answers
How can I validate a questionnaire for a small sample of hospitals' senior executive managers?
Hello everyone
-I performed a systematic review for the strategic KPIs that are most used and important worldwide.
-Then, I developed a questionnaire in which I asked the senior managers at 15 hospitals to rate these items based on their importance and their performance at that hospital on a scale of 0-10 (Quantitative data).
-The sample size is 30 because the population is small (however, it is an important one to my research).
-How can I perform construct validation for the 46 items, especially since EFA and CFA will not be suitable for such a small sample?
-These 45 items can be classified into 6 components based on literature (such as the financial, the managerial, the customer, etc..)
-Bootstrapping in validation was not recommended.
-I found a good article with a close idea but they only performed face and content validity:
Ravaghi H, Heidarpour P, Mohseni M, Rafiei S. Senior managers’ viewpoints toward challenges of implementing clinical governance: a national study in Iran. International Journal of Health Policy and Management 2013; 1: 295–299.
-Do you recommend using EFA for each component separately, each containing around 5-9 items, treating each as a separate scale and defining its sub-components? (I tried this option and it gave good results and sample adequacy.) I am not sure if this is acceptable to do. If you can think of other options, I will be thankful if you can enlighten me.
Relevant answer
Answer
Cronbach's alpha can be computed separately for each factor (dependent and independent) as a reliability test.
  • asked a question related to Survey Methodology and Data Analysis
Question
8 answers
How can i validate a questionnaire for hospitals' senior managers?
Hello everyone
-I performed a systematic review for the strategic KPIs that are most used and important worldwide.
-Then, I developed a questionnaire in which I asked the senior managers at 15 hospitals to rate these items based on their importance and their performance at that hospital on a scale of 0-10 (Quantitative data).
-The sample size is 30 because the population is small (however, it is an important one to my research).
-How can I perform construct validation for the 46 items, especially since EFA and CFA will not be suitable for such a small sample?
-These 45 items can be classified into 6 components based on literature (such as the financial, the managerial, the customer, etc..)
-Bootstrapping in validation was not recommended.
-I found a good article with a close idea but they only performed face and content validity:
Ravaghi H, Heidarpour P, Mohseni M, Rafiei S. Senior managers’ viewpoints toward challenges of implementing clinical governance: a national study in Iran. International Journal of Health Policy and Management 2013; 1: 295–299.
-Do you recommend using EFA for each component separately, each containing around 5-9 items, treating each as a separate scale and defining its sub-components? (I tried this option and it gave good results and sample adequacy.) I am not sure if this is acceptable to do. If you can think of other options, I will be thankful if you can enlighten me.
Relevant answer
Answer
After the survey is completed. But it is better to increase the number of studied samples, so that the result will be better.
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
Hi, I am an undergraduate currently working on a project that is using a quantitative survey.
I have developed 3 scenarios that have the same 5 Likert scale questions across these scenarios. Also, the questions are split into confidence and experience, as they are asking respondents to self-rate themselves on confidence and experience on the skills specified in the questions.
My question is: how should I analyse the Likert-scale responses across all 3 scenarios? Can I sum them up and then divide to get the mean value of each response to each question? I can't seem to find papers with a situation similar to mine.
I have found Cronbach's alpha to be >0.7 across all the questions, and there is a significant positive correlation between confidence and experience across all 3 scenarios. Are these reasons valid enough to add up the responses arithmetically across the 3 scenarios? I can't find any research saying when I am able to add the responses up.
Please help as I am quite lost. Please cite sources in your statement so I can read up further too.
Relevant answer
The Likert-scale means are the main criterion for measurement and comparison between the three scenarios (groups).
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
Hi all,
Six of the items in the 5-point Likert-type scale are positively worded and six are negatively worded. When I reverse the negative items, the meaning changes a bit.
E.g:
In a municipality where women are a minority, women feel excluded.
Is it appropriate to take scale averages and apply t-tests and ANOVA without reversing these items? Will there be a problem if I do this?
Relevant answer
Answer
You need to recode either the negative or the positive items before computing scale averages. Otherwise the average will make no sense.
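As a concrete illustration (hypothetical item name): on a 1-5 scale, the reversed score is (max + min) minus the original, i.e., 6 minus the original:

```python
import pandas as pd

# Hypothetical negatively worded 5-point item: reversed score is
# (max + min) - original, i.e., 6 - original on a 1-5 scale.
df = pd.DataFrame({"q_neg": [1, 2, 3, 4, 5]})
df["q_neg_rev"] = 6 - df["q_neg"]   # 5, 4, 3, 2, 1
```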
  • asked a question related to Survey Methodology and Data Analysis
Question
9 answers
I am a Master's student who specialized in development economics (especially rural development), and I am now eager to publish my thesis in an academic journal. The topic concerns determinants of vulnerability and the roles of livelihood assets, so I would like to ask what kind of journal (paper) would be most suitable. If possible, I would like to publish in a journal with a high impact factor. Thank you in advance.
Relevant answer
  • asked a question related to Survey Methodology and Data Analysis
Question
7 answers
Hello!
So, here is the story. I was given this Likert-scale data for analysis, and I just can't figure out how I should deal with it. It is a 1-7 scale with answers ranging from 1 being "extremely worse" to 7 being "extremely better". But here is the problem: 4 is "same as before", and the questions introduce the changes as an effect of a different variable, which is work from home (for example, "Compared to work from office, how much has your ability to plan your work so that it was done on time changed when working at home?").
The questions are separated into groups to form variables, and the mean should probably show each person's opinion on the change, right? But it seems too strange to me to work with just one parameter and not go through a full comparison of now vs. before as two different constructs.
If you have any works or insight on the topic, can you please help me?
All the best and take care!
Relevant answer
Answer
I agree with David L Morgan: Likert-scored items are ordinal. Don't worry about 4 ("same as before"); it is the neutral option in the Likert scale. Based on the data distribution, you can use different statistical tests.
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
I am trying to perform the cell-weighting procedure on SPSS, but I am not familiar with how this is done. I understand cell-weighting in theory but I need to apply it through SPSS. Assume that I have the actual population distributions.
Relevant answer
Answer
I might be misunderstanding your question, or his answer, but in my reading of what you are trying to do, I think the approach suggested by David Morse is missing a final step.
I'll assume, as David did, that the population, with N=3200, consists of 500 cases (or 15.625%) in subgroup A, 700 (21.875%) in B, and 2000 (62.5%) in C. I'll assume, further, that you have a sample, with n=80, that includes 10 cases (or 12.5%) in A, 20 (25%) in B, and 50 (62.5%) in C. If so, then the cell weights for your sample should be (15.625/12.5 = 1.25) for cell A, (21.875/25 = 0.875) for cell B, and (62.5/62.5 = 1.0) for cell C. That will keep your weighted sample size at 80, the same as your unweighted sample size, but will make the proportion of cases in A, B, and C in your weighted sample equal to the population proportions.
Forming the weights as ratios of the % in the population divided by the % in the sample will inflate the under-represented cells and deflate the over-represented cells in your sample by exactly the right amount.
If, instead, you also want to make the total number of cases in your sample equal to the total population size, then each of the three initial weights (1.25, 0.875, and 1.0) should be multiplied by (3200/80 = 40), yielding three new weights (50, 35, and 40).
Multiplying by the ratio of population size divided by sample size inflates all of your initially weighted sample counts by exactly the right amount to equal the population count.
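For what it's worth, the same arithmetic as a short Python sketch, using the counts from the example above:

```python
import pandas as pd

# Population and sample counts per cell, from the example above.
pop = pd.Series({"A": 500, "B": 700, "C": 2000})
samp = pd.Series({"A": 10, "B": 20, "C": 50})

# Cell weight = population share / sample share; weighted n stays at 80.
w = (pop / pop.sum()) / (samp / samp.sum())
print(w)        # A: 1.25, B: 0.875, C: 1.0

# To project to population counts instead, scale by N/n = 3200/80 = 40.
w_pop = w * (pop.sum() / samp.sum())
print(w_pop)    # A: 50, B: 35, C: 40
```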
  • asked a question related to Survey Methodology and Data Analysis
Question
2 answers
I've a:
  • "Retrospective panel survey": In each year all units are asked "who (X) first told you about us (in the year you first learned about us)?"
  • There is lots of attrition from the panel, which may vary by X, as well as new people entering in each year
My question of interest: How is X, the 'how did you first learn about us' thing changing across time? I.e., is the 'point of first contact' (referrer) changing from year to year?
Possible approaches
A. Single-retrospect:
If I use only the most recent (2020) retrospective data this may lead to a bias from differential attrition related to X (as well as issues of imperfect recall).
If people who 'heard about us through Spaghetti Monster' have dropped out at twice the average rate, and Spaghetti Monster was the referrer for 1/2 of those who learned about us in 2015. ...we will falsely report that "only 1/4 of people who heard about us in 2015 heard about us through SM".
B. Recent-retrospect for each year
I could look instead at the 2016 recall data only for 2015, 2017 data for 2016 etc., as there will be less attrition between the shorter time intervals.
But this has its own problem: the share of survey respondents coming from each referrer fluctuates from year to year. Suppose in 2020 there is a particularly low SM response rate vs. 2019; we would falsely claim that SM referrals fell dramatically in 2020 relative to 2019. This should not be a problem for the single-retrospect approach.
Vaguely remember that I've seen papers dealing with similar issues but I can't recall. Before I try to reinvent the wheel, any suggestions? Thanks!
Relevant answer
Answer
Thanks José-Ignacio Antón -- this is putting me on the right track! I don't think this paper (and those cited within) make use of recall data (people's reports, each year, about previous years) and it doesn't seem to involve new people entering the panel.
But still, I think this helps me focus on the right literature.
> A common procedure for estimating the effects of attrition on the measurement of income mobility consists of dividing the sample according to the individuals remaining in successive waves and comparing the results estimated for the resulting sub-samples.
-- this seems important
  • asked a question related to Survey Methodology and Data Analysis
Question
2 answers
I'm a bit new to these aspects of survey design and analysis. What should I read and what are some approaches to the following situation and question?
Suppose:
  • We've a population-of-interest based on an affiliation, certain actions, or a set of ideas (e.g., 'vegetarians' or 'tea-party conservatives')... call it the "Movement"
  • There has never been a national representative survey nor a complete enumeration of this group. There is no 'gold standard'
  • For several years we've advertised a survey (with a donation reward) in several outlets (web pages, forums, listserves which we call 'referrers') associated with the 'movement'
  • We can track responses from each referrer. We suspect some referrers are more broadly representative of the movement as a whole than others, but of course there is no gold standard.
This is essentially a 'convenience sample', perhaps more specifically a 'river sample' (using the notation of Baker et al, 2013) or 'opt-in web-based sample'. It is probably non-representative because of
  • Exclusion/coverage bias: Some members of the movement will not be aware of the survey (they don't visit any of the outlets or they don't notice it)
  • Participation/non-response bias: Among those aware (through visiting the 'referrers') only a smallish share complete the survey (and these likely tend to be the more motivated and time rich individuals). Some outlets/referrers may also promote the survey more prominently than others.
We wish to measure:
  • The (changing) demographics (and size) of the movement
  • Measures of the demographics, beliefs, behavior, and attitudes of people in the movement (and how these have changed from year to year)
Our methodological questions
Analysis: Are there any approaches that would be better than 'reporting the unweighted raw results' (e.g., weighting, cross-validating something or other) for using this 'convenience/river' sample to either:
i. get results (either levels or changes) likely to be more 'representative of the movement as a whole' than our unweighted raw measures of the responses in each year, or
ii. get measures of the extent to which our reports are likely to be biased... perhaps bounds on this bias.
Survey design: In designing future years' surveys, is there a better approach?
Brainstorming some responses...
Analysis
  • E.g., as we can separately measure demographics (as well as stated beliefs/attitudes) for respondents from each referrer, we could consider testing the sensitivity of the results to how we weight responses from each referrer (see the sketch after this list).
  • Or we might consider using the demographics derived from some weighted estimate of surveys in all previous years to re-weight the survey data in the present year to be "more representative."
  • As noted, we subjectively think that some referrers are more representative than others, so maybe we can do something with this using Bayesian tools
  • We may have some measures of the demographics of participants on some of the referrers, which might be used to consider weighting to deal with differential non-response
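As a rough illustration of the referrer-weighting sensitivity idea in the first bullet, here is a minimal Python sketch; the assumed referrer shares are pure assumptions to be varied, not estimates, and all data are simulated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Simulated responses tagged by referrer; 'attitude' is any survey outcome.
df = pd.DataFrame({"referrer": rng.choice(["forum", "listserv", "webpage"],
                                          size=500),
                   "attitude": rng.normal(size=500)})

# Assumed shares of the movement reached via each referrer: pure
# assumptions to be varied in a sensitivity analysis, not estimates.
assumed_shares = {"forum": 0.5, "listserv": 0.3, "webpage": 0.2}

obs_shares = df["referrer"].value_counts(normalize=True)
df["w"] = df["referrer"].map(assumed_shares) / df["referrer"].map(obs_shares)

print(df["attitude"].mean())                        # raw estimate
print(np.average(df["attitude"], weights=df["w"]))  # reweighted estimate
```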
Survey design
  • Would 'probability sampling' within each outlet (randomly choosing a small share within each to actively recruit/incentivize, perhaps stratifying within each outlet if the outlet itself provides us demographics) somehow be likely to lead to a more representative sample?
It's not immediately obvious to me why this would improve things. The non-response within probability samples would seem to be an approximately equivalent problem to the limited participation rate in the convenience sample. The possible advantages I see would be:
i. We could offer somewhat-stronger incentives for the probability sample, and perhaps reduce this non-response/non-participation rate and consequent biases.
ii. If we can connect to an independent measure of participant demographics from the outlets themselves, this might allow us to get a better measure of the differential rates of non-participation by different demographics, and adjust for it.
Some references (what else should I read?)
Baker, R., Brick, J.M., Bates, N.A., Battaglia, M., Couper, M.P., Dever, J.A., Gile, K.J., Tourangeau, R., 2013. Summary report of the AAPOR task force on non-probability sampling. Journal of survey statistics and methodology 1, 90–143.
Salganik, M.J., Heckathorn, D.D., 2004. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological methodology 34, 193–240.
Schwarcz, S., Spindler, H., Scheer, S., Valleroy, L., Lansky, A., 2007. Assessing Representativeness of Sampling Methods for Reaching Men Who Have Sex with Men: A Direct Comparison of Results Obtained from Convenience and Probability Samples. AIDS Behav 11, 596. https://doi.org/10.1007/s10461-007-9232-9
Relevant answer
Answer
For hard-to-reach populations you might find participant-guided-sampling useful. Explore
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
Let's say that I'd like to compare male and female respondents' perceptions of government across 30 countries, and test whether there are statistically significant gender differences in perceptions across countries. The data come from surveys conducted in 30 countries. Simple t-test results for each country sample show that the two group means are statistically different in all countries. But I want to claim that statistically significant gender differences exist in all countries regardless of the values of country-level variables, such as polity score, GDP, and gender equality index. What is my next step other than regression? I don't need to do regression; I just want to show group-mean differences across countries.
Relevant answer
Answer
If I understood you right, Meta-analysis is what you are looking for.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
I am doing survey data analysis using the svyset command. I would like to see if collinearity exists between any independent variables used in the regression analysis; which post-estimation command should I use? Does the margins command work after svyset? Thanks in advance.
Relevant answer
Answer
Rukman Manapurath, to read about -margins- after -svy- type in "help svy_postestimation##margins" in Stata. When it comes to VIF, you can regress each predictor on the rest of predictors (e.g., svy: regress y x1 x2 x3) and extract R-squared values for each equation. Subtracting each R-squared value from 1 will give you the tolerance value. A tolerance value of 0.4 corresponds directly to a VIF value of 2.5 (i.e., 1/0.4). As such, tolerance values above 0.4 and VIF values less than 2.5 are signs of no serious multicollinearity among the items.
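For readers working outside Stata, the same tolerance/VIF arithmetic in a Python sketch; this uses unweighted OLS on simulated data for simplicity, whereas with complex survey data you would use the weighted regressions as described above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
X["x3"] = X["x3"] + 0.8 * X["x1"]   # induce some collinearity

# VIF for each predictor: regress it on the others, then 1 / (1 - R^2).
for col in X.columns:
    r2 = sm.OLS(X[col],
                sm.add_constant(X.drop(columns=col))).fit().rsquared
    print(col, "tolerance:", round(1 - r2, 3),
          "VIF:", round(1 / (1 - r2), 2))
```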
  • asked a question related to Survey Methodology and Data Analysis
Question
1 answer
PFA the Word file of the output I received and the command used.
Context: secondary data analysis of big survey data.
Logistic regression for prediction of a binomial outcome.
Relevant answer
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
Hi,
My question is in two parts:
I have collected data from a set of people completing a course. The data concern their confidence in certain procedures. Data were collected pre-course, immediately post-course, and 6 months following course completion.
Unfortunately, 33 people completed the pre-course survey, 28 completed the post-course survey, and 21 completed the 6-month survey.
I am not sure if this makes the data 'non-matching', and therefore means I could not use the Friedman test.
Hoping someone will be able to help me!
Thanks
Relevant answer
Answer
I would suggest using a repeated-measures ANOVA if the participants for whom the data are being collected are the same, which I highly suspect is true in your case.
These links might be helpful.
  • asked a question related to Survey Methodology and Data Analysis
Question
13 answers
I hope to conduct a series of interviews/questionnaire surveys to collect information regarding urban flood management and the use of software tools for the same.
Fundamentally, decision-makers, flood modellers, general public and software modellers/developers are in my expected audience.
Could you please suggest what personal information should be considered when weighting them?
My assumptions are as follows:
1. Decision-makers: age, level of education, years of service, level in the organization, number of participations/decision-makings in actual flood management activities
2. Flood modellers: educational status (MSc/PhD etc.), years of experience, number of participations/decision-makings in actual flood management activities
3. Software developers: years of experience, number of contributions to actual flood management software development and the role he/she played
4. General public: age, level of flood impact on the person, educational level, experience with floods
Relevant answer
Answer
I appreciate the request to comment, Rmm, but I don't think I know enough about your particular problem domain.
That's the thing about applying weights to survey respondents - making responses from one person, or a group of people, more important than those of another. You would do this if you have a legitimate reason to think that one group is severely under-represented in your sampling frame, or in your final sample. Or if you have a theoretical reason for giving greater value to the responses of some, and lesser value to others.
You need to have a theory, and/or good evidence, to support the use of weights in the first place and some ideas about how much those weights should apply.
Thinking about it some more, the purpose of your research is likely to be important too. If you're interested only in the value of real estate affected by floods then your weights may apply to the value associated with the people/organisations you survey. If you are interested in the effects on people's homes then you may minimise commercial real estate and apply weights based on the sizes of families.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
Does it occur in handling the data collected, or in the way of collecting the data?
Relevant answer
Answer
A structured interview is a method of data collection, while a schedule is a data collection tool used in structured interviews.
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
I'm working on an R&R and a reviewer is questioning our response rate. We e-mailed a link to an online survey to student-athletes on our campus. We targeted specific teams, offered an incentive (entered into a drawing for a $25.00 amazon.com gift card) and got a response rate of 37.6%. I thought that was decent, but the reviewer specifically asked "Why was the response rate so low?" Are there published/expected response rates that I can cite?
Thanks!
KS
Relevant answer
Answer
Are there rules of thumb or expected percentages for online survey response rates?
Not sure there is such a rule-of-thumb percentage, but my personal experience shows online survey response rates ranging from 10% to 25%, so your 37.6% is considered very good. I suggest you refer to journal articles you've reviewed that are relevant to your research field/industry and cite their response rates instead of looking for an industry-average response rate.
  • asked a question related to Survey Methodology and Data Analysis
Question
10 answers
I have a question in my questionnaire regarding purchase intention, and the options to choose the answer are:
  • Definitely Not
  • Probably Not
  • Possibly
  • Probably
  • Definitely
From this question, I need to figure out the relation between purchase intention and three other factors (that are asked in 15 different likert scale questions)
I am doing a pretest, and so far 8 people have filled in the questionnaire and 7 of them have chosen 'Possibly' as their answers for the purchase intention question.
So, my question is: if, for example, 90 percent of respondents to the final questionnaire choose the same answer for that question, can I still get a meaningful analysis from my data?
Relevant answer
Answer
Did you pilot your research instrument?
Doing a pilot study is important, so you can find out how participants might react to your items.
My suggestion is to treat these 8 respondents as a pilot study - change the scale - and re-pilot to see if the ceiling effect dissipates.
To solve the crux of the issue: could you use a sliding bar rather than discrete options? This way you only give them the two extremes (Definitely Not, Definitely) and they can place the bar on the slider according to how strongly they feel.
This avoids using words that will receive a large number of responses.
I hope this helps.
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
My data consist of observations rated on 5 different dimensions (5-point Likert scale). A sixth variable describes the outcome (binary 0-1) I am interested in. Rather than understanding the individual contribution of these dimensions to the outcome variable, I am interested in finding the optimal combination of dimensions resulting in the highest probability of the outcome equaling 1.
Do you have any advice regarding a methodological approach for me?
Thank you very much in advance, your help is highly appreciated!
Kind regards,
Jessica Birkholz
Relevant answer
Answer
Hi Jessica, for a multivariate analysis like yours, we would suggest two methods. If all variables are observed variables (e.g., age, weight, frequency), then one can go for regression. But if any of the variables is a construct (e.g., stress or satisfaction, which cannot be measured in numbers or units but must be measured with a set of Likert-scale statements called items), then we go for SEM, which includes the CFA method of analysis.
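If all six variables are observed (the regression case), one rough Python sketch of the search the question describes: fit a logistic model, then enumerate all rating combinations and pick the one with the highest predicted probability of the outcome equaling 1. Data here are simulated:

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 400
# Simulated data: five 5-point Likert dimensions and a binary outcome.
X = pd.DataFrame(rng.integers(1, 6, size=(n, 5)),
                 columns=[f"dim{i}" for i in range(1, 6)])
true_logit = -4 + 0.5 * X["dim1"] + 0.4 * X["dim3"]
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

# Enumerate all 5**5 rating combinations; pick the one with the highest
# predicted probability of outcome == 1.
grid = pd.DataFrame(list(itertools.product(range(1, 6), repeat=5)),
                    columns=X.columns)
probs = model.predict(sm.add_constant(grid))
print(grid.iloc[probs.idxmax()], probs.max())
```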
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
Hi, I am planning a community survey in a remote community. I have designed it as a paper based survey but am considering transferring it to an online survey platform such as Survey Monkey, Survey Gizmo, Survey Sparrow (later two have offline functionality so are looking more appealing). However, since the community is remote and there is a mix of young and old, I would like to still have the option of paper based surveys for those who prefer. Is there any risks or considerations I should be aware of if I go down this route?
Relevant answer
Answer
In survey research, this is known as a "mixed mode" approach, and there is a small literature on this topic, which you can search on the internet.
  • asked a question related to Survey Methodology and Data Analysis
Question
9 answers
I have utilized Qualtrics and SurveyMonkey in the past, and I truly appreciate the look/feel both for researcher and participant found with Qualtrics. What do you use, and what would you recommend to do if funding is difficult to secure for a more expensive survey application (i.e. Qualtrics) for a three to six month study?
Relevant answer
Answer
I agree with vidyasri khadanga
  • asked a question related to Survey Methodology and Data Analysis
Question
9 answers
Hi! I noticed that Mplus offers two alternative approaches to modeling when the measurements are not independent. One approach is multilevel modelling (TYPE=TWOLEVEL); the other (TYPE=COMPLEX) handles the problem of non-independence of observations in a different way. Multilevel modelling is very popular, but the second approach seems less frequently used. What do you think about the second approach in cases where you have no reason to model random effects and you just want to account for the non-independence of observations? Please find below a passage from the Mplus manual where the authors describe the second approach:
"Complex survey data refers to data obtained by stratification, cluster sampling and/or sampling with an unequal probability of selection. Complex survey data are also referred to as multilevel or hierarchical data. For an overview, see Muthén and Satorra (1995). There are two approaches to the analysis of complex survey data in Mplus. One approach is to compute standard errors and a chi-square test of model fit taking into account stratification, non-independence of observations due to cluster sampling, and/or unequal probability of selection. Subpopulation analysis is also available. With sampling weights, parameters are estimated by maximizing a weighted loglikelihood function. Standard error computations use a sandwich estimator. This approach can be obtained by specifying TYPE=COMPLEX in the ANALYSIS command in conjunction with the STRATIFICATION, CLUSTER, WEIGHT, and/or SUBPOPULATION options of the VARIABLE command. Observed outcome variables can be continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. The implementation of these methods in Mplus is discussed in Asparouhov (2005, 2006) and Asparouhov and Muthén (2005, 2006a)."
I wonder whether anyone has already tried it before and what are your impressions. Thank you!
Relevant answer
  • asked a question related to Survey Methodology and Data Analysis
Question
22 answers
Recently I have been doing research to examine the psychometric characteristics of the Satisfaction With Life Scale (SWLS) (developed by Diener, Emmons, Larsen, & Griffin, 1985) in the Indonesian context. The scale consists of 5 items with a 7-point Likert scale.
I'm planning to compare which works better in the Indonesian context: a 7-point or a 5-point Likert scale for the Satisfaction With Life Scale. How can I compare the two? What analysis should I conduct?
Thank you in advance.
Relevant answer
Answer
The standard way to do this sort of comparison is to randomly divide the sample in half, and to use one question format in each. Then you can assess whether one format or the other produces higher correlations, etc.
But my question would be: why bother? Is this so important in the Indonesian context that you would risk losing the half of your sample that uses the non-standard format?
I have attached a review article that reviews the literature on various formats for Likert response scoring, which generally favors odd numbered responses over even numbered and 7-point over 5-point scoring.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
As I recall, I saw an estimator which combined a ratio estimator and a product estimator using one independent variable, x, where the same x was used in each part.  Here is my concern: A ratio estimator is based on a positive correlation of x with y.  A product estimator is based on a negative correlation of x with y.  (See page 186 in Cochran, W.G(1977), Sampling Techniques, 3rd ed., John Wiley & Sons.)  So how can the same x variable be both positively and negatively correlated with y at the same time? 
Can anyone explain how this is supposed to work?
Thank you for your comments.
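For reference, the two estimators of the mean of y as given in Cochran (1977), with $\bar{X}$ the known population mean of the auxiliary variable:

```latex
\hat{\bar{y}}_R = \bar{y}\,\frac{\bar{X}}{\bar{x}}
  \quad \text{(ratio: gains efficiency when } \rho_{xy} > 0\text{)}
\qquad
\hat{\bar{y}}_P = \bar{y}\,\frac{\bar{x}}{\bar{X}}
  \quad \text{(product: gains efficiency when } \rho_{xy} < 0\text{)}
```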
Relevant answer
Answer
I was referring to only one independent variable, when there are no others.
  • asked a question related to Survey Methodology and Data Analysis
Question
9 answers
As you are doubtless aware, the paper-based survey has been one of the most common methods for gathering data on people's behavior (either revealed preferences or stated preferences). I want to make sure how much we can rely on new methods like Internet (web)-based surveys instead of traditional paper-based surveys. In particular, my research's scope is travel behavior analysis, and my sample should cover all socioeconomic groups and almost all geographical areas in a city.
I would be happy if somebody shared with me his/her opinion or the valid references.
Thanks in advance
Relevant answer
Answer
Another problem you have to consider is respondents' willingness to participate. You have to have a reliable database of lead contacts and be aware that the response rate is very low, commonly around 5% to 10%, so if you need a sample of 400 subjects, for example, you have to contact at least 8000 people.
Of course, never forget the main characteristics of your sample otherwise your results will be biased.
  • asked a question related to Survey Methodology and Data Analysis
Question
26 answers
Design-based classical ratio estimation uses a ratio, R, which corresponds to a regression coefficient (slope) whose estimate implicitly assumes a regression weight of 1/x. Thus, as can be seen in Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, page 254, the most efficient probability-of-selection design would be unequal probability sampling, where we would use probability proportional to the square root of x for sample selection.
So why use simple random sampling for design-based classical ratio estimation? Is this explained only by momentum from historical use? For certain applications, might it, under some circumstances, be more robust in some way? This does not appear to conform to a reasonable data or variance structure.
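For concreteness, a sketch of how selection with probability proportional to the square root of x might be done (systematic PPS on simulated sizes; it assumes no unit's size exceeds the sampling interval, otherwise certainty units must be handled separately):

```python
import numpy as np

def systematic_pps(sizes, n, rng):
    """Systematic PPS: select n units with probability proportional to sizes.
    Assumes no unit's size exceeds the sampling interval (no certainty units)."""
    order = rng.permutation(len(sizes))     # randomize list order first
    cum = np.cumsum(sizes[order])
    step = cum[-1] / n
    points = rng.uniform(0, step) + step * np.arange(n)
    return order[np.searchsorted(cum, points)]

rng = np.random.default_rng(7)
x = rng.lognormal(mean=4, sigma=1, size=500)        # simulated size measure
sample = systematic_pps(np.sqrt(x), n=50, rng=rng)  # prob. proportional to sqrt(x)
```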
Relevant answer
Answer
I suggest going through the book "Sampling Techniques" by W. G. Cochran.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
I'm working on a financial inclusion project that wants to use a survey to engage with clients of digital financial systems (DFS) in rural areas and am looking for a tool recommendation - possibly SMS based, or Voice interactive response system, or mobile phone app (but less desirable).
Would love to hear of people's experiences and suggestions.
Thanks
Relevant answer
Answer
I think Dr. Achmad Rizal has explained this well.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
We are working on a survey design for youths, but we will include children who are 12-14 years old. The classic measures of time preferences and risk will be too complex for children. Likewise, some recommendations on the general application of this question in a survey experiment would be perfect.
Relevant answer
Answer
Hi
Maybe this research can help you:
Subjective time discount rates among teenagers and adults: Evidence from Israel
E Lahav, U Benzion, T Shavit - The Journal of Socio-Economics, 2010
Best wishes,
Yaakov
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
I'm at US National Science Foundation -- National Center for Science and Engineering Statistics. We sponsored the National Survey of Recent College Graduates (https://nsf.gov/statistics/srvyrecentgrads/), conducted from 1973 through 2010, a cross-sectional biennial survey that provided demographic and career information about individuals holding a bachelor's or master's degree. I am compiling a list of publications using the Survey data. We plan on adding this list to our website. If you have used the Survey data, please post the citation to your research below, and/or send me a copy of your paper. Thanks!
Relevant answer
Answer
Hi Karen
1. Yes, I am utilizing national student surveys in my research, but not the surveys of the National Center for Science and Engineering Statistics.
2. Papers:
- Z. A. Al-Hemyari and A. M. Al-Sarmi (2017). HEIs Quality Improvement Through Students and Academic Staff's Perception: Data Analysis and Robustness of the Results. International Journal for Quality Research, Vol. 11, No. 2, pp. 261-278.
- Z. A. Al-Hemyari and A. M. Al-Sarmi (2016). Validity and Reliability of Students and Academic Staff's Surveys to Improve Higher Education. Educational Alternatives, Journal of International Scientific Publications, Vol. 14, pp. 242-263.
- Z. A. Al-Hemyari and A. M. Al-Sarmi (2015). Standards, benchmarks and qualitative indicators to enhance the institution's activities and performance: Surveys and data analysis. International Journal of Knowledge-Based Organizations (IJKBO), 5(4), pp. 38-62.
Regards,
Zuhair
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
Hi,
I am trying to attach parents' information to their children's file.
1. I have one .dta file with the children's information and another, separate file for their parents. The children's dataset also contains unique identifiers for the parents, which tell me who each child's mother or father is. What I want to do is match details such as the father's and mother's employment and education level to the children's database. Is there an efficient way to do this?
2. If I have combined the children's and adults' datasets into one .dta file, can I then do the above, or will I have to do it separately as in (1)?
3. What if I have organised the children's data into a panel dataset and now want to add the parents' information? Is there an efficient way to do it from there?
Looking for ideas.
Thank You.
Relevant answer
Answer
Hi Kushneel,
Here's a link to a book about database management. It contains several discussions about data relationships and crafting a relational database. It may be of some benefit.
Have a great day!
--Adrian
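For question (1), a minimal sketch in R of the kind of key-based join involved (the file and variable names - parent_id, educ, employ - are hypothetical; Stata's merge command follows the same logic):
```r
library(haven)                            # reads Stata .dta files

children <- read_dta("children.dta")      # child_id, mother_id, father_id, ...
parents  <- read_dta("parents.dta")       # parent_id, educ, employ, ...

# Prefix the parent columns so mother and father details stay distinct
mothers  <- setNames(parents, paste0("mother_", names(parents)))
children <- merge(children, mothers,
                  by.x = "mother_id", by.y = "mother_parent_id",
                  all.x = TRUE)           # left join: keep every child

fathers  <- setNames(parents, paste0("father_", names(parents)))
children <- merge(children, fathers,
                  by.x = "father_id", by.y = "father_parent_id",
                  all.x = TRUE)
```
The same joins work after reshaping the children's file to a panel: the parent keys are constant within child, so the merge simply repeats the parent details across waves.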
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
I'm at US National Science Foundation -- National Center for Science and Engineering Statistics. We sponsor the Survey of Earned Doctorates (https://nsf.gov/statistics/srvydoctorates/), a census of all individuals receiving a research doctorate from a US university. I am compiling a list of publications using the Survey data. We plan on adding this list to our website. If you have used the Survey data, please post the citation to your research below, and/or send me a copy of your paper. Thanks!
Relevant answer
Answer
Thank you! I posted some of my publications on ResearchGate. A full list of publications is on our Museum page:
http://museumkiev.org/upload/scientist/gritsenko_praci.pdf. If anybody would ever like to read them, I can send them with pleasure.
Best wishes and respect from Ukraine,
Volodymyr
  • asked a question related to Survey Methodology and Data Analysis
Question
10 answers
Hi, 
So I'm a beginner at SPSS, and I want to know what one should do when inputting survey data into SPSS where a question can yield both continuous numerical data and categorical data.
Example: I'm working on a survey of migrant workers' incomes and there is one question that asks about workers' incomes pre-migration. There are five options:
1) Write down your income:________$
2) I was self-employed
3) I worked in agricultural labour
4) I did not have a paid job
5) Don't know/can't remember
Unlike straightforward closed questions where one can code responses to particular values, there is the possibility for continuous input. As such, what is the best practice under such a circumstance? 
Relevant answer
Answer
I follow the question.
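For what it's worth, one common coding for the income item described above is to split it into two variables - a numeric amount and a categorical status - rather than force both into one column. A minimal sketch in R (this coding scheme is an assumption, not the only valid practice; the same idea carries over to two SPSS variables):
```r
# Raw responses: either an amount or one of the categorical options
raw <- c("1500", "self-employed", "agricultural labour", "2300", "no paid job")

income_value  <- suppressWarnings(as.numeric(raw))  # NA where non-numeric
income_status <- factor(ifelse(is.na(income_value), raw, "reported amount"))

data.frame(income_value, income_status)
```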
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
I have 4 populations of different races that I am following for a period of time. I am looking at the incidence of a particular condition (let's call it condition A) after an event x (let's say joining a particular business). I find that one particular population (let's call it population k) has a significantly lower incidence of condition A compared to the other populations. Upon further analysis I realize that event x occurred in most members of population k a lot later than in the other three populations (i.e., most members joined the business a lot later than the other 3 groups).
I want to know whether the lower incidence of condition A is due to the late occurrence of event x, or whether they actually have a lower incidence.
How do I approach this problem? I am thinking of getting Kaplan-Meier curves of condition A in all 4 subgroups; what do I do thereafter?
Relevant answer
Answer
Nice information, thank you.
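For the Kaplan-Meier idea in the question, a minimal sketch using R's survival package (the data frame and variable names - time since event x, status, population, year_of_event_x - are hypothetical):
```r
library(survival)

# time  = follow-up time since event x; status = 1 if condition A occurred
fit <- survfit(Surv(time, status) ~ population, data = df)
plot(fit, col = 1:4)                                  # one curve per population

# Log-rank test for any difference among the four curves
survdiff(Surv(time, status) ~ population, data = df)

# Cox model adjusting for when event x occurred, to help separate the
# "joined later" artifact from a genuinely lower hazard in population k
coxph(Surv(time, status) ~ population + year_of_event_x, data = df)
```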
  • asked a question related to Survey Methodology and Data Analysis
Question
7 answers
I am still collecting survey tools that researchers have used in studies of transgender men and women. The tools will be used to inform the development of a new survey tool that may be added to CDC's National HIV Behavioral Surveillance (NHBS) survey set. If you are willing to share your survey tool (and have not already done so), please do.
All the best,
Stephen 
Relevant answer
Answer
I follow the question
  • asked a question related to Survey Methodology and Data Analysis
Question
8 answers
Suppose you are going to introduce an innovative therapy and you want to do a market survey: what will be your population size, and how do you determine it?
Relevant answer
Answer
Interesting question
  • asked a question related to Survey Methodology and Data Analysis
Question
8 answers
I'm planning to perform a SEM in order to investigate intention to innovate. This dependent variable depends on attitude toward innovation (ATI) and entrepreneurial self-efficacy (ESE) - based on the scheme of the Theory of Planned Behaviour. Literature indicates that other variables play a role, such as gender and family exposure to entrepreneurship (FEE).
How should I perform the analysis in order to see how gender and FEE impact the scheme? The data come from a survey with 1,200 answers.
At first I thought of a mediation model, but I believe the effect is not a causal relationship between gender and the dependent variable, but a moderator effect. Nevertheless, I am struggling to implement it in Stata. Most of the literature discusses mediation and moderation at the same time, which is not my case.
Other suggestions of analysis would also be really helpful.
Best regards, Pedro
Relevant answer
Answer
interesting question
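One standard way to test a pure moderator such as gender is a multi-group SEM: fit the model with slopes free across groups, then constrained equal, and compare the fits. A minimal sketch in R's lavaan (intention, ATI, ESE are shorthand from the question; Stata's sem offers analogous group() and ginvariant() options):
```r
library(lavaan)

model <- 'intention ~ ATI + ESE'

# Slopes free to differ by gender
fit_free <- sem(model, data = df, group = "gender")

# Slopes constrained equal across genders
fit_eq   <- sem(model, data = df, group = "gender",
                group.equal = "regressions")

# A significant chi-square difference indicates moderation by gender
anova(fit_free, fit_eq)
```
FEE can be handled the same way; a continuous moderator would instead enter as a product (interaction) term.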
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
I'm at US National Science Foundation -- National Center for Science and Engineering Statistics. We sponsor the National Survey of College Graduates (https://nsf.gov/statistics/srvygrads/), a longitudinal biennial survey conducted since the 1970s that provides data on the nation's college graduates. I am compiling a list of publications using the Survey data. We plan on adding this list to our website. If you have used the Survey data, please post the citation to your research below, and/or send me a copy of your paper. Thanks!
Relevant answer
Answer
I follow the question.
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
Please comment on my methodology for getting responses from consumers. I have proposed a mixture of an in-the-field survey and an online survey. My online survey has secured 121 responses so far, and the process is still ongoing. I am looking for some 1,500 responses, whereas in the field I would secure around 500 self-administered survey questionnaires. Please bear in mind that the total population of the country is 180 million. Please comment on whether the sample size is appropriate, and share an article on this if possible. Please also share an appropriate sample size calculation method: are online calculators reliable or valid? Finally, bear in mind that I have used a mixed-methods technique and have also secured 23 semi-structured interviews. Please suggest/share a relevant article on that as well. Thanks
Relevant answer
Answer
You can use the Krejcie and Morgan sample size determination to validate your sample size. A population of 180 million is effectively infinite; per the Krejcie and Morgan formula you can select only 384, which will be representative of the population.
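The Krejcie and Morgan (1970) formula is easy to check directly; a minimal sketch in R (chi-square = 3.841 for 95% confidence, P = 0.5, and d = 0.05 are the conventional defaults):
```r
krejcie_morgan <- function(N, chi2 = 3.841, P = 0.5, d = 0.05) {
  chi2 * N * P * (1 - P) / (d^2 * (N - 1) + chi2 * P * (1 - P))
}

krejcie_morgan(180e6)   # ~384: a population this large behaves as infinite
krejcie_morgan(1e4)     # ~370: even 10,000 is not far off
```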
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
I have seen several references to "impure heteroscedasticity" online as heteroscedasticity caused by omitted variable bias.  However, I once saw an Internet reference, as I recall, which reminds me of a phenomenon where data that should be modeled separately are modeled together, causing an appearance of increased heteroscedasticity.  I think there was a YouTube video.  That seems like another example of "impure" heteroscedasticity to me. Think of a simple linear regression, say with zero intercept, where the slope, b, for one group/subpopulation/population is slightly larger than another, but those two populations are erroneously modeled together, with a compromise b. The increase in variance of y for larger cases of x would be at least partially due to this modeling problem.  (I'm not sure that "model specification error" covers this case where one model is used instead of the two - or more - models needed.)
I have not found that reference online again.  Has anyone seen it? 
I am interested in any reference to heteroscedasticity mimicry.  I'd like to include such a reference in the background/introduction to a paper on analysis of heteroscedasticity which, in contrast, is only from the error structure for an appropriate model, with attention to unequal 'size' members of a population.  This would then delineate what my paper is about, in contrast to 'heteroscedasticity' caused by other factors. 
Thank you. 
Relevant answer
Answer
Multilevel random coefficient models are particularly suitable when between-group variances are heteroscedastic like in the example you mention. There are plenty of good references on these models in the statistical literature.
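The mimicry described in the question is easy to reproduce by simulation; a minimal sketch in R (all numbers illustrative):
```r
set.seed(1)
n <- 200
x <- runif(2 * n, 1, 10)
g <- rep(1:2, each = n)
b <- c(1.0, 1.3)[g]                   # slightly different slopes per group
y <- b * x + rnorm(2 * n, sd = 0.5)   # homoscedastic within each group

fit <- lm(y ~ x - 1)                  # one compromise slope, zero intercept
plot(x, resid(fit))                   # residual spread fans out with x:
                                      # apparent heteroscedasticity produced
                                      # by misspecification, not by the
                                      # true error structure
```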
  • asked a question related to Survey Methodology and Data Analysis
Question
2 answers
Dear Fellows
I have got the survey data “World Bank’s Enterprise Survey 2013” in SPSS form. My research objective is to find out the obstacles faced by the firms while doing business in Pakistan. There are 15 obstacles listed below:
1. Electricity to Operations of This Establishment
2. Transport
3. Customs and Trade Regulations
4. Practices of competitors in informal sector
5. Access to Land
6. Crime, Theft and Disorder
7. Access to Finance
8. Tax Rates
9. Tax Administrations
10. Business Licensing and Permits
11. Political Instability
12. Corruption
13. Courts
14. Labor Regulations
15. Inadequately Educated Workforce
They are measured on a 5-point Likert scale (No Obstacle, Minor Obstacle, Moderate Obstacle, Major Obstacle, Very Severe Obstacle).
Sampling Technique: Disproportionate Stratified Random Sampling
Three levels of stratification were used in the survey: firm size, business sector, and geographic region within the country. Firm size levels are 5-19 (small), 20-99 (medium), and 100+ employees (large). The business sector is broken down into manufacturing (Food, Textiles, Garments, Chemicals, Non-metallic Minerals, Motor Vehicles, Other Manufacturing) and services (Retail and other services). Five regional strata were used.
However, I am not interested in particular strata (groups) within the population. What kind of statistical tools can be applied here?
Thank you
Relevant answer
Answer
Does your research account for the home location of the firm, e.g. domestic or foreign?
While frequency is interesting, the actual substance of "what exactly" is where it gets interesting. Focusing on a limited number of difficulties might also help you gain more in-depth knowledge.
Kind Regards
Roland
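On the statistical-tools question itself: with a disproportionately stratified design, population-level statements need the survey weights even if you are not interested in the strata. A minimal sketch with R's survey package (the variable names wstrata, wt, corruption_obstacle are placeholders for whatever the WBES file actually supplies):
```r
library(survey)

des <- svydesign(ids = ~1, strata = ~wstrata, weights = ~wt, data = es)

# Weighted distribution of one obstacle item over the whole population
svymean(~factor(corruption_obstacle), design = des, na.rm = TRUE)

# Weighted share rating it Major or Very Severe (assuming 1-5 coding)
svymean(~I(corruption_obstacle >= 4), design = des, na.rm = TRUE)
```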
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
Dear all,
Can anyone suggest some contributions about:
- the historical development of survey methodology in the social sciences
- the possible integrations between surveys and big data?
Thanks for your attention!
Francesco Molteni
Relevant answer
Answer
One well-known reference is J. Converse's book, Survey Research in the United States: Roots and Emergence 1890-1960.
  • asked a question related to Survey Methodology and Data Analysis
Question
12 answers
Sample size = min 100 respondents; Approach = Email > Web-based questionnaire; Number of UK universities = 133;
For both Random Sampling and Stratified Random Sampling, there should be a list of all elements. Theoretically, the list could be made, as the staff names and their emails are readily available online; however, practically, it does not seem feasible because there is no guarantee that I could find the contact details (emails) of all academic staff for each university. Therefore, if I do not have a full list of staff, I do not know the population size, and so I cannot use Random Sampling. Could you give me advice on which technique I may use and how to apply it? Thank you for your help.
Relevant answer
Answer
No doubt you should go for stratified random sampling; it makes things simpler for you. Make strata according to the type of university, e.g. some are central, some are state, and some are deemed universities.
Then decide the number of universities to select from each stratum by PPS (probability proportional to size), and after that select your desired sample size within each.
In this way your sample will be representative and provide a good, unbiased result.
Thanks
I hope it helps.
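A rough sketch of that two-stage idea in R (the universities data frame with columns univ, type, and a size measure n_staff is hypothetical; weighted sampling without replacement is used as a simple stand-in for strict PPS):
```r
set.seed(42)

# Stage 1: within each stratum (university type), draw universities with
# probability roughly proportional to staff size
pick   <- function(d, k) d[sample(nrow(d), k, prob = d$n_staff), ]
stage1 <- do.call(rbind, lapply(split(universities, universities$type),
                                pick, k = 3))

# Stage 2: within each selected university, draw a simple random sample
# of staff from whatever email lists can actually be compiled
```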
  • asked a question related to Survey Methodology and Data Analysis
Question
19 answers
If I pick up data from a survey for only 10% of cases and randomly generate the remaining 90% from an application (based on the 10%), will this work? I am in the IS discipline.
I think many people do simulate things in other domains too.
Relevant answer
Answer
What is your purpose for creating simulated data? If your study is entirely methodological, then the source of the data may not matter. But if you are attempting to describe a population, or perform an experiment, then adding simulated data does not help you to accomplish either task, and may actually get in the way. You may be adding random error that attenuates the relationship you are looking for, or may be artificially creating the appearance of a relationship that does not actually exist.
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
I'm looking for good tools to document the development of pain/discomfort over a timespan of approximately 10-15 weeks, for a small study with 5-10 patients per group.
So far I have only come across the VAS and SF-12; does anyone have other advice for me?
Cheers
Relevant answer
Answer
Thanks for your help!
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
You want to do a survey of student opinion in a district regarding outdoor sports. What will be your sampling plan? How will you ensure randomness?
Relevant answer
Answer
Thanks, Nitin sir, Eddie Seva sir, Basil sir.
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
Hi!
I've noticed that most questionnaires use a sample size of 100 to 1000 when testing a new positive psychology intervention. How would you decide on a reasonably representative sample? What method would you use? How would that method compare to experiments to decide the efficacy and safety of new medication?
Put another way: my interest is testing the efficacy of a psychology intervention, not a questionnaire. Let's say I want to design a randomized placebo-controlled experiment; how would I decide on the size of the sample to ascertain whether the intervention brings about a statistically significant result?
Thank you so much
Ibrahim
Relevant answer
Answer
Dear Ibrahim,
A priori sample size calculations always require an estimate of the size of the effect you are measuring. For a large effect a sample size of 20 might be adequate. For a small effect a sample size of 1,000 might be inadequate. They also depend on the design of the experiment.
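To make that concrete, a minimal power-analysis sketch in base R (the effect sizes are illustrative assumptions, exactly as the answer cautions):
```r
# Two-arm trial, alpha = .05, 80% power, standardized effect d = delta/sd
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
# ~64 per group for a medium effect (d = 0.5)

power.t.test(delta = 0.2, sd = 1, sig.level = 0.05, power = 0.80)
# ~394 per group for a small effect (d = 0.2)
```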
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
Does anybody have experience/guidance on whether to use the standard strongly disagree, disagree, neither agree nor disagree, agree, strongly agree format, or an alternative which I find attractive: fully disagree, mainly disagree, neither agree nor disagree, mainly agree, fully agree?
I am pretty sure it will not make a big difference, but my 2nd option is not widely used (at least I know of only one or two examples). The 2nd option appears to be more attractive when people may have complex opinions (along the lines of "yes, but"), and might be suitable for, e.g., a survey amongst university staff members.
Relevant answer
Answer
I don't want to be rude, but the answers above partly reflect a lack of attention to my very simple question (although the answer/evidence may not be readily available?) and/or limited expertise in the field. I will not monitor further responses to this question. If it were possible, I would close this question to further comments, but I don't think RG allows me to do this.
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
My study is to identify:
- The motivations of the local grassroots volunteers by using the construct of Self-Determination Theory
- Explore for any differences in motivation between new and old volunteers
- Explore the impact of grassroots volunteerism on national identity
- Evaluate effectiveness in recruiting new volunteers
As of now, I am preparing to run the Volunteer Motivation Scale, which has 7-point Likert scale questions for the different types of motivation. My question is: how may I go about analysing the results to answer my research objectives? What kind of statistical test should I perform?
In addition, should I add in questions to measure how their volunteering work can impact on national identity? 
In addition to the questionnaire, I also intend to perform a face-to-face semi-structured interview to ask them on the Basic Needs that fuels their motivation, and their sustained motivation and also how their work can impact the national identity.
Relevant answer
Answer
The general idea is to combine the separate, Likert-scored items into a single scale, which you would correlate with the length of years volunteered.
However, with an N of only 30, it would be hard to produce statistically significant results with any form of test, due to the "low power" of a test with such a small sample.
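A minimal sketch of that scale-then-correlate approach in R (item and variable names are hypothetical):
```r
items <- paste0("motiv_", 1:12)               # the Likert-scored items
df$motivation <- rowMeans(df[items], na.rm = TRUE)

# Correlate the combined scale with length of service; with n = 30,
# pay attention to the width of the confidence interval
cor.test(df$motivation, df$years_volunteered)
```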
  • asked a question related to Survey Methodology and Data Analysis
Question
8 answers
Hi there. I am currently doing a BA in TESOL and this is my first research project, so bear with me if I sound a little clueless!
My question is: can I adapt a research instrument (survey) to fit my needs, or will this invalidate it? To clarify; I want to measure to what extent my students' motivations for using the learning management system are internalized and autonomous in nature. I want to use the LLOS-IEA (Noels et al, 2003), however, I will need to change the instrument to be asking questions about the "Flipped Learning" system we use.
Relevant answer
Answer
Changing the so-called "stem question" from asking about languages to asking about something else is known as using a measure "based on" the original format. At a minimum, you would still want to assess the reliability of that new form, and it would be desirable to demonstrate its validity as well.
In contrast, changing the wording of the items themselves is a major modification of the original scale.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
I've carried out a questionnaire where participants had to rate, on a Likert scale of 1-5, a list of response strategies with respect to their ability to address a certain issue. In total there are five such major issues, and they were asked to rank the strategies under each separately. What is the best technique to analyse and summarize these data?
Relevant answer
Answer
I agree with Muayyad Ahmad that you have 5 subscales. The next question is whether or not they are strongly correlated with each other. If so, they could form a single scale.
Because there have been so many questions here about the scaling of Likert-scored items, I have compiled a set of resources on this topic:
  • asked a question related to Survey Methodology and Data Analysis
Question
4 answers
I am going to conduct student interviews for qualitative data, along with a survey identifying science skills.
There will be a science self-efficacy survey before science fair participation and one before competition.
Thank you! 
Relevant answer
Answer
Thank you so much for your input!
  • asked a question related to Survey Methodology and Data Analysis
Question
7 answers
Where might I find a finite population dataset, with one dependent and two independent variables, with population size approximately 15 < N < 100, and all continuous (no count) variables?  (A sample from an infinite population, with approximately 15 < n < 100, perhaps preferably a random sample, might possibly do.  However it is important that two predictor variables are deemed adequate for the illustration I have in mind.) 
   
I have a method for comparing regression model performances graphically, for cases with all continuous data, for which I want to write a clear explanation, including a simple graphical illustration.  I am retired and would just be relying on Excel, and possibly a small population data set to illustrate this.  I tried searching the internet and was at first encouraged by the number of multiple regression datasets available, but quickly found that locating such a dataset for clear illustrative purposes, and with my limited programming resources, was elusive. 
Any suggestions would be appreciated.  -  Thank you.
Relevant answer
Answer
James, the stat.ethz.ch ... page has the list of datasets.
  • asked a question related to Survey Methodology and Data Analysis
Question
2 answers
Does the World Bank Enterprise Survey database include the indicator on firm innovation?
Relevant answer
Answer
Hi @Bernard, thank you so much! I have registered on the website and downloaded the raw data successfully.
In the view of New Structural Economics by Prof. Justin Yifu Lin, countries at different development stages have different innovation structures, and hence their need for financial services differs. Firms in developed countries extend the technology frontier, while firms in developing countries typically imitate technology already in use in developed countries. So firms in developed countries face higher risks and need much more money than those in developing countries. Financial markets and banks have different advantages in diversifying risks and providing money for innovative firms. Therefore the optimal financial structures in high-income and low-income countries are distinctive. I would analyse the mechanism at the micro level in detail and provide some empirical evidence on this subject.
What do you think about this issue? You can find more information in the book New Structural Economics: A Framework for Rethinking Development if you are interested in this topic.
Best wishes,
Hao
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
In order to carry out a survey/questionnaire on inhabitants' degree of satisfaction with walkability in the different tissues forming the city, I need a few measurable indicators of walkability (in relation to urban morphology). I am thinking of the concepts of "Comfort" and "Protection" of pedestrians. Are there any others? Thank you.
Relevant answer
Answer
Could be worth looking at the materials from EU project Pedestrian Quality Needs - http://www.walkeurope.org/
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
"Survey" is a very broad term, having widely different meanings to a variety of people, and applies well where many may not fully realize, or perhaps even consider, that their scientific data may constitute a survey, so please interpret this question broadly across disciplines.
It is to the rigorous, scientific principles of survey/mathematical statistics that this particular question is addressed, especially in the use of continuous data.  Applications include official statistics, such as energy industry data, soil science, forestry, mining, and related uses in agriculture, econometrics, biostatistics, etc. 
Good references would include
Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
Lohr, S.L. (2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole.
and
Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.
For any scientific data collection, one should consider the overall impact of all types of errors when determining the best methods for sampling and estimation of aggregate measures and measures of their uncertainty.  Some historical considerations are given in Ken Brewer's Waksberg Award article:
 Brewer, K.R.W. (2014), “Three controversies in the history of survey sampling,” Survey Methodology,
(December 2013/January 2014), Vol 39, No 2, pp. 249-262. Statistics Canada, Catalogue No. 12-001-X.
     
In practice, however, it seems that often only certain aspects are emphasized, and others virtually ignored.  A common example is that variance may be considered, but bias not-so-much.  Or even more common, sampling error may be measured with great attention, but nonsampling error such as measurement and frame errors may be given short shrift.  Measures for one concept of accuracy may incidentally capture partial information on another, but a balanced thought of the cumulative impact of all areas on the uncertainty of any aggregate measure, such as a total, and overall measure of that uncertainty, may not get enough attention/thought. 
     
What are your thoughts on promoting a balanced attention to TSE? 
Thank you.  
Relevant answer
Answer
Promote increased replication. Every experiment is really two experiments. The first is the planned experiment that is conducted. We hope it has sufficient replication. However, I could go back and take another sample from the population that you sampled and I would get a different answer (at least quantitatively). So the experiment is one replicate of all possible experiments that could have happened. If you increased replication there would be room to do other types of analysis that could start to address this concern.
The place to start this would be an introductory statistics course so that more people consider experimenting on broader issues. However, this might be a problem for people that struggle with statistics.
I got to the second controversy in the Brewer article. Point 1 gave me trouble: "Because randomization had not been used, the investigators had not been able to invoke the Central Limit Theorem. Consequently they had been unable to use the normality of the estimates." I think this statement is nonsense. The distribution of the individual sample will converge to the underlying distribution of the population being sampled. If I sample selectively, then the techniques used to choose my sample points also define my population, so random sampling doesn't seem to enter into this. On the other hand, my experiment is one of many possible similar experiments unless the technique for sampling is somehow deterministic. This is not allowed: I will sample the first person named Sally who walks through the hospital door at 8:00 on Tuesday, and I know that the department head always arrives at 8:00 on Tuesday and her name is Sally (yes, she could have a cold on my sampling day, or get stuck in traffic, thereby resulting in a small probability that I would get someone else). However, as long as there is some random process that is sampled, then the distribution of all possible experiments that I could do will be Gaussian by the CLT. Randomization, in terms of sampling design, doesn't enter into this. If I have an available population of 5,000 people, and I assign each of them a number, then have a random number generator spit out 50 numbers and I use those people, the CLT will apply to my sample. If I select every 100th person the CLT will still apply, unless someone is making sure that all the people go through the door in exactly the same order every time I run the experiment, no matter which door I use. Randomization may help ensure that the sample is not deterministic, but it is not a requirement for the CLT to function.
  • asked a question related to Survey Methodology and Data Analysis
Question
10 answers
We are planning a study to compare a set of specific daily motivations between job families.
We will be developing a survey specifically for the constructs we are interested in, using a combination of grounded analysis and PCA in a larger sample, but we would like to compare our results to existing instruments and/or to be able to base our constructs on elements that have already appeared in the context of work motivation.
I have used various intrinsic motivation-based surveys in the past, so I would especially like something on specific intrinsic / extrinsic motivators at work - that is, not just whether the job as a whole is motivating.
I have a background in mixed methods and experimental research, but this is my first real foray in to work/organizational psychology so a bit of help to get started would go a long way!
Relevant answer
Answer
Hi Andreas, 
I think Hamid had a typo: Gagné, M., & Deci, E. L. (2005). Self‐determination theory and work motivation. Journal of Organizational behavior, 26(4), 331-362.
Christian
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
I want to identify the relationship between a DV (on a Likert scale of 1-5) and 3 to 5 IVs, which are on Likert scales too. I have computed the mean of the items for each scale. It is for my thesis, and from the bibliography I have found that there is a relationship between the constructs. I have made the attached model and want to confirm it. I must add that my data are non-normally distributed.
Relevant answer
Answer
When the response categories are ordered, you could run a multinomial regression model. The disadvantage is that you are throwing away information about the ordering. An ordinal logistic regression model preserves that information, but it is slightly more involved.
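A minimal ordinal-logistic sketch in R with MASS::polr (variable names are hypothetical; the DV must be an ordered factor):
```r
library(MASS)

df$dv <- factor(df$dv, levels = 1:5, ordered = TRUE)
fit   <- polr(dv ~ iv1 + iv2 + iv3, data = df, Hess = TRUE)
summary(fit)

# polr prints no p-values; a common normal approximation:
ctable <- coef(summary(fit))
pvals  <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
cbind(ctable, "p value" = pvals)
```
An ordinal model also sidesteps the non-normality concern, since it assumes no normal distribution for the DV.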
  • asked a question related to Survey Methodology and Data Analysis
Question
8 answers
I am conducting a project in three different centers. I sent a questionnaire to 1,000 respondents in each center and got responses as follows:
1) less than 200 in first two centers, and 
2) more than 750 in third center
The response from the last center is extremely large compared to the first two.
How should I compare the data from these three centers? Kindly guide me; I want to apply a t-test and ANOVA.
thank you so much
Relevant answer
Answer
I think you need to do an ANOVA and follow it with post hoc tests to compare the 3 groups. (If you handle your data with multiple comparisons through t-tests, inflation in alpha will arise.)
However, because you have one group obviously larger in sample size, you need to consider a weighted means analysis in ANOVA. The attached Word file will help you with how to do weighted means; it was retrieved from the following link. Hope this helps.
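As a complementary sketch in base R: Welch's ANOVA and unpooled pairwise tests are one practical route when group sizes (and possibly variances) are this unequal (variable names hypothetical):
```r
# Welch's F: does not assume equal variances or equal group sizes
oneway.test(score ~ center, data = df)

# Pairwise comparisons with per-pair variances and alpha control
pairwise.t.test(df$score, df$center,
                pool.sd = FALSE,
                p.adjust.method = "bonferroni")
```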
  • asked a question related to Survey Methodology and Data Analysis
Question
7 answers
I am a PhD researcher and have utilised an exploratory sequential instrument development design.  I conducted semi-structured interviews and have consequently developed a survey from the findings.  The survey is descriptive and cross-sectional and is simply a follow-up quantitative phase to the main qualitative phase of the research.
Due to the issues with potential response rates (challenging to engage the study population), I envisage piloting to be difficult as I am asking additional questions to merely completing the survey.  Is it appropriate for me to go back to the original interview participants and pilot test the survey with them in addition to identifying new pilot testers? Or methodologically would that not be approved of, given their participation in the original interviews and their existing awareness of the topic area?
Any guidance would be appreciated,
Thanks
Relevant answer
Answer
If it is practical, I would recommend working with a new pilot sample. In addition to the previous participants now being more familiar with your topic, they are also the source of your current measures. If you want to be sure that your instrument works more broadly, you need to try it on a new sample.
  • asked a question related to Survey Methodology and Data Analysis
Question
3 answers
Which dataset should I use for the analysis?
Which technique should I apply to the dataset?
Can I use R or another analytical tool?
Relevant answer
Answer
You ought to consider some form of canonical correlation; that is, devise a composite variable (variate) (using some special syntax in SPSS or R) composed of social interaction and technological capability, and correlate it with your relevant survey scales (or individual items, if that matters). Alternatively, you can compute factor scores of your DVs that constitute a factor/component based on exploratory factor analysis, derive the factor scores, and apply them in a regression together with your survey data.
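A minimal canonical-correlation sketch in base R (column names are hypothetical placeholders for the two variable sets):
```r
# X: social-interaction / technological-capability items; Y: survey scales
X <- scale(df[, c("social_1", "social_2", "tech_1", "tech_2")])
Y <- scale(df[, c("scale_a", "scale_b")])

cc <- cancor(X, Y)
cc$cor        # canonical correlations between the paired variates
```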
  • asked a question related to Survey Methodology and Data Analysis
Question
5 answers
I have purchased the PASE scoring manual and have been trying to use the syntax in SPSS. However, the syntax won't calculate the PASE score when there are missing data in the questionnaire. This occurs specifically in questions 2, 3, 4, 5, 6, and 10, where the questions include multiple parts. Has anyone had a similar experience? Do I need to use different software?
Also, there is no clear instruction in the manual on how to deal with missing data when scoring the questionnaire manually.
Does anyone know how to deal with missing data when scoring the questionnaire?
Thanks.
Relevant answer
I would also suggest a literature review, especially about multiple imputation methods.
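A minimal multiple-imputation sketch in R with the mice package (pase_items is a hypothetical data frame of the questionnaire items; rowSums stands in for the actual PASE weighting, which the manual defines):
```r
library(mice)

imp <- mice(pase_items, m = 5, seed = 123)     # five imputed datasets

# Score each completed dataset, then average across imputations
scores     <- sapply(1:5, function(i) rowSums(complete(imp, i)))
pase_score <- rowMeans(scores)
```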
  • asked a question related to Survey Methodology and Data Analysis
Question
6 answers
I have two sets of data from the same survey, with 55 responses each. The samples are independent. Subjects were asked to select, from a list of 16 skill types (nominal), their top five most important skills. They were then asked to select their five least important skills. Each of the 10 selected skills is unique (no repeats).
After selecting those 5 most (least) important skills, I asked respondents to rank them in order of their importance (non-importance).
I would like to know what statistical methods I can use to analyze each data set and how can I compare the two data sets. I am trying to find out if there are any meaningful differences in skills that were chosen between these two groups.
Relevant answer
Answer
I believe the Mann-Whitney test is the most appropriate method to use here.
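A minimal sketch in R, assuming each skill's importance rank is stored per respondent (variable and column names hypothetical); running the test across all 16 skills calls for a multiplicity correction:
```r
# Compare the two groups' ranks for one skill
wilcox.test(rank_skill_a ~ group, data = df)

# Repeat across all skill columns and adjust the p-values
p <- sapply(skill_cols, function(s) wilcox.test(df[[s]] ~ df$group)$p.value)
p.adjust(p, method = "holm")
```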
  • asked a question related to Survey Methodology and Data Analysis
Question
10 answers
I am doing research on FDI in retail.
To examine consumers' perception of FDI in retail, I collected data from two cities, Meerut and Agra. What type of test should I apply to analyse the data? The questionnaire is attached.
Relevant answer
Answer
Dear Sourabh,
I suggest you carry out a scale reliability analysis on each of your sections with at least 3 questions to see whether any of them could be reduced into a single scale variable. I recommend using a Principal Component Analysis to do this - see my guide:
Once you have done this you can either use a t-test or, if its assumptions are not met, a Mann-Whitney U test.
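A minimal sketch of that pipeline in R (item and variable names hypothetical; psych::alpha checks whether a section's items hang together before they are averaged):
```r
library(psych)

sec <- df[, c("q1", "q2", "q3", "q4")]   # one section's Likert items
alpha(sec)                               # internal consistency check
df$perception <- rowMeans(sec, na.rm = TRUE)

t.test(perception ~ city, data = df)     # or, if assumptions fail:
wilcox.test(perception ~ city, data = df)
```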