ArticleLiterature Review

Commonly Used Data-collection Approaches in Clinical Research

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

We provide an overview of the different data-collection approaches that are commonly used in carrying out clinical, public health, and translational research. We discuss several of the factors that researchers need to consider in using data collected in questionnaire surveys, from proxy informants, through the review of medical records, and in the collection of biologic samples. We hope that the points raised in this overview will lead to the collection of rich and high-quality data in observational studies and randomized controlled trials.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... Different resource-use data sources confer different advantages and disadvantages for economic evaluations [1,2]. Data can be self-reported, such as that collected through diaries or resource-use questionnaires (RUQs), and come in various mediums (online, paper and pencil, etc.). ...
... Administrative data can be relatively expensive and time-consuming to obtain, or researchers may have limited or restrictive access [1,2]. However, despite the possibility of omissions and data entry errors, administrative data is considered to be more accurate, standardised, reliable and practical. ...
... Conversely, self-reported data can be collected alongside other trial data and may include resource use not otherwise captured in administrative data (e.g. informal care data) [1,2]. However, self-reported data is reliant on responders' memory, therefore making it vulnerable to recall bias and poorer validity. ...
Article
Self-reported service use informs resource utilisation and cost estimates, though its validity for use within economic evaluations is uncertain. The aim of this study is to assess agreement in health resource-use measurement between self-reported and administrative data across different resource categories, over time and between different recall periods by subgroups among Australians living with psychosis. Data were obtained for 104 participants with psychotic disorders from a randomised controlled trial. Agreement between self-reported resource-use questionnaires and administrative data on community-based services and medication use was assessed through estimating differences of group mean number of visits and medications used and intraclass correlation coefficients (ICC) over multiple time periods. ICC showed moderate agreement across most time periods for general practitioners, psychiatrists and mental health medications. No clear trends were discernible over time, between varying lengths of recall periods nor across participant subgroups. Despite poor agreement, when measuring visits to psychologists and other health professionals, small overall differences in group mean number of visits indicate that self-reported data may still be valid for use in economic evaluations in people living with psychosis.
... The intrinsic nature of service provision within US health care organizations creates significant interpatient variation in how much, when, and what information is collected and recorded. 32,37,94 This reality is far from the epidemiologic ideal of standardized measurement according to a structured protocol and time schedule, applied uniformly across all study participants. Much of the care provided by US health care organizations revolves around addressing specific complaints (eg, symptoms) or managing chronic disease. ...
... Accordingly, the information-gathering tactics of providers often focus on these specific needs and are thus limited. 37,86,94 Providers vary in how information is elicited, as do patients in their willingness to disclose certain disorders. 32 Clinical documentation tends to reflect only presence of certain conditions, while absence is seldom documented, making disease absence versus disease status unknown indistinguishable. ...
... 32 Clinical documentation tends to reflect only presence of certain conditions, while absence is seldom documented, making disease absence versus disease status unknown indistinguishable. 74,94,95 Quantitative measurements made in routine practice can be subject to meaningful degrees of measurement error when clinical protocols are not adhered to. For example, systolic BP measurements made in clinical practice have greater variability and are often biased high when protocols maximizing validity are not followed, while quantitative blood glucose measurements are highly sensitive to fasting status. ...
Article
Full-text available
Cardiovascular disease surveillance involves quantifying the evolving population‐level burden of cardiovascular outcomes and risk factors as a data‐driven initial step followed by the implementation of interventional strategies designed to alleviate this burden in the target population. Despite widespread acknowledgement of its potential value, a national surveillance system dedicated specifically to cardiovascular disease does not currently exist in the United States. Routinely collected health care data such as from electronic health records (EHRs) are a possible means of achieving national surveillance. Accordingly, this article elaborates on some key strengths and limitations of using EHR data for establishing a national cardiovascular disease surveillance system. Key strengths discussed include the: (1) ubiquity of EHRs and consequent ability to create a more “national” surveillance system, (2) existence of a common data infrastructure underlying the health care enterprise with respect to data domains and the nomenclature by which these data are expressed, (3) longitudinal length and detail that define EHR data when individuals repeatedly patronize a health care organization, and (4) breadth of outcomes capable of being surveilled with EHRs. Key limitations discussed include the: (1) incomplete ascertainment of health information related to health care–seeking behavior and the disconnect of health care data generated at separate health care organizations, (2) suspect data quality resulting from the default information‐gathering processes within the clinical enterprise, (3) questionable ability to surveil patients through EHRs in the absence of documented interactions, and (4) the challenge in interpreting temporal trends in health metrics, which can be obscured by changing clinical and administrative processes.
... Advancing our knowledge through research requires the development of a well-designed research protocol that aligns with the research question(s). The results or the outcomes of clinical research are also dependent on high quality and reliable data collection methods (Saczynski, McManus, & Goldberg, 2013). Careful consideration is needed to ensure that the data collected matches the aims of the research, feasible to obtain and able to evaluate the outcome (phenomena of interest) of the investigation. ...
... Much of the data collected in clinical research uses participant self-report on standardized questionnaires, wherein everyone answers the same questions (Saczynski et al., 2013). Collecting self-report data allows the investigator to collate information from a sample of individuals through their responses (Ponto, 2015). ...
... Questionnaires can use a qualitative approach (open-ended questions), a quantitative approach (close-ended questions with numerically rated items), or both (mixed methods) (Ponto, 2015). Factors often collected include socio-demographic characteristics, medical history, medication use, and lifestyle practices (Saczynski et al., 2013), which can serve as independent or confounding variables in the data analysis. Other questions commonly asked focus on human behavior, healthcare knowledge, quality of life, and functional status (Saczynski et al., 2013). ...
Article
Editor's note: This is the fifth article in a series on clinical research by nurses. The series is designed to give nurses the knowledge and skills they need to participate in research, step by step. Each column will present the concepts that underpin evidence-based practice—from research design to data interpretation. The articles will be accompanied by a podcast offering more insight and context from the author. To see all the articles in the series, go to http://links.lww.com/AJN/A204.
... There are several methods for gathering tabular data. A questionnaire survey, patient-reported data, genetic information, proxy or informant data, a review of ambulatory or hospital medical records, and a collection of biological samples are the most frequently used datagathering methods among many others in clinical research [47,64]. ...
... Clinical and bio-medical data are essential components of healthcare systems, and they typically comprise information about patients' demographics, socioeconomic conditions, medical history, etc. [47]. Machine learning (ML) algorithms are often employed to make clinical decisions from such data, and several State-Of-The-Art (SOTA) ML algorithms are available for this purpose. ...
Article
Full-text available
Recent advancements in generative approaches in AI have opened up the prospect of synthetic tabular clinical data generation. From filling in missing values in real-world data, these approaches have now advanced to creating complex multi-tables. This review explores the development of techniques capable of synthesizing patient data and modeling multiple tables. We highlight the challenges and opportunities of these methods for analyzing patient data in physiology. Additionally, it discusses the challenges and potential of these approaches in improving clinical research, personalized medicine, and healthcare policy. The integration of these generative models into physiological settings may represent both a theoretical advancement and a practical tool that has the potential to improve mechanistic understanding and patient care. By providing a reliable source of synthetic data, these models can also help mitigate privacy concerns and facilitate large-scale data sharing.
... Collecting epidemiological information using a questionnaire as part of a research study is now common practice [20][21][22]. Questions are also often asked about participants' knowledge and attitudes toward various lifestyle and disease predisposing factors [20]. ...
... Collecting epidemiological information using a questionnaire as part of a research study is now common practice [20][21][22]. Questions are also often asked about participants' knowledge and attitudes toward various lifestyle and disease predisposing factors [20]. ...
Article
Full-text available
Aim: This study examines the prevalence and severity of orofacial disorders in patients with fibromyalgia syndrome (FMS). The research assesses the correlation with the Fibromyalgia Assessment Status (FAS) index. The goal is to improve the clinical approaches to these patients. Methods: A cross-sectional study was conducted using a structured questionnaire focused on the correlation between FMS and orofacial problems. The research involved 107 rheumatology patients diagnosed with FMS. Statistical analyses, including Spearman’s correlation, were utilized to investigate the relationships between the FAS index scores and various orofacial symptoms. Results: Of the participants, 11.2% responded that they were aware of the correlation between fibromyalgia and oral health. The statistical analysis showed statistically significant correlations between the FAS index and symptoms such as gum bleeding, teeth grinding during the day, and neck pain (p < 0.05). The correlation between the FAS index and joint noise upon opening the mouth, as well as dissatisfaction with one’s smile, also proved to be highly significant (p < 0.001). Conclusion: The outcomes demonstrate that, as the FAS index increases, the likelihood of developing orofacial disorders also increases among FMS patients. This highlights the importance of a multidisciplinary treatment approach.
... It was designed with the thought that it would be easier to design a pain scale in the evaluation of phantom pain in clinical practice than with existing pain scales and that patients would be successful in pain management. It is stated in the literature that applying a draft scale prepared for use with a small representative sample group will be beneficial; therefore, it will be beneficial to conduct a preliminary study (17). In the present study, the preliminary application of the 35-question draft scale was carried out with 30 patients before the validity application of the scale commenced. ...
... This single factor explains 46.217% of the variance of the PLP rating scale. Higher variance rates obtained as a result of the analysis reflect a stronger factor structure of the scale (17,23,24). For the present study, the variance rate was found to be 46.2%. ...
Article
Full-text available
Purpose: The aim of this study was to develop a valid and reliable scale to evaluate and measure phantom limb pain. Material and Methods: This study, which was designed in a methodological type, was conducted with a total of 258 patients. A demographics form and a draft scale developed by the research authors were used to collect the study data. Kuder-Richardson Formula 20 was used to provide descriptive statistics and reliability analyses for the study data. Exploratory Factor Analysis was used in the development of the phantom limb pain rating scale, and Reliability and Confirmatory Factor Analysis were used for the study’s validity and reliability evaluations. Results: The Kuder-Richardson 20 value, which shows the internal consistency of the questions of the 16-item the rating scale, was found to be 0.921. The total score of the rating scale ranged from 1 to 16, with an average of 11.19±4.94. It was determined that the fit criterias and corrected chi-square values showed acceptable fit, and that the scale was both statistically significant and valid (p=0.001; p
... The knowledgeable workforce was also chosen through the use of purposeful sampling. Tongco [29] claims that this method, commonly known as judgmental sampling, involves a researcher selecting participants for study participation based only on their personal judgements and perceived knowledge or experience related to the research being conducted. The key informants, who included Soroti Municipality council members and top management personnel, were chosen through the technique of purposeful sampling. ...
... Basic random sampling and selective sampling approaches were used in the inquiry [29]. The key informants, who included Soroti Municipality council members and top management personnel, were chosen through the technique of purposeful sampling. ...
Article
Full-text available
This study undertakes a comprehensive examination of solid waste management practices in Soroti Municipality, employing a cross-sectional survey design to gather data from a diverse sample of 314 respondents. Utilizing a combination of quantitative and qualitative methods, the research explores various aspects of waste management, including collection, transportation, disposal, and treatment techniques. Sampling techniques such as basic random sampling and purposeful sampling were employed to ensure representation across demographic groups and occupational sectors. Data collection methods included interviews, questionnaires, and observations, allowing for a multifaceted analysis of the subject matter. The study presents detailed findings on respondents' demographic characteristics, including gender distribution, age distribution, occupation, marital status, and education level, providing insights into the diverse perspectives within the community. Additionally, the research examines respondents' attitudes towards solid waste management practices, revealing both strengths and weaknesses in current approaches. Analysis of the data highlights several key challenges facing Soroti Municipality in solid waste management, including limited resources, inadequate infrastructure, and a lack of community engagement. Despite these challenges, the study identifies potential solutions and opportunities for improvement. Recommendations include raising public awareness, implementing 4R strategies (reduce, reuse, recycle, recover), encouraging stakeholder participation, and investing in infrastructure and technology. By addressing these challenges and implementing recommended strategies, Soroti Municipality can work towards more sustainable and effective solid waste management practices. This not only enhances the quality of life for its residents but also contributes to environmental conservation and sustainable development in the region.
... The most logical study design accommodated by EHR data is the historical (retrospective) cohort, though many of the issues discussed are relevant to other study types, including prospective studies [15][16][17][18]. EHR data are generated through the routine interactions of patients with healthcare organizations. ...
... The data elements eligible for consideration in any EHR-based epidemiologic study are those documented during the usual course of healthcare service provision and can be generally grouped into demographics, vital signs, diagnoses, procedures, medications, and laboratory tests [1,2,40]. The quantitative (how much) and qualitative (how good) attributes of EHR data undoubtedly rely on default measurement processes implemented within the clinical enterprise and documentation tactics of individual healthcare providers [6,18,34]. Physicians spend an estimated 20% of their professional time documenting clinical encounters, and though incentives exist to be exhaustive in documentation (to maximize reimbursement) while not over-documenting (to avoid fraud), it is impossible to retrospectively determine how well these principles were adhered to in practice [3][4][5]. Misclassification of categorical characteristics, the measurement error of continuous characteristics, and missing data are concerns in every epidemiologic study-concerns that are magnified in EHR-based research by the nature of the data-generating process [25,32,41,42]. ...
Article
Full-text available
In the United States, electronic health records (EHR) are increasingly being incorporated into healthcare organizations to document patient health and services rendered. EHRs serve as a vast repository of demographic, diagnostic, procedural, therapeutic, and laboratory test data generated during the routine provision of health care. The appeal of using EHR data for epidemiologic research is clear: EHRs generate large datasets on real-world patient populations in an easily retrievable form permitting the cost-efficient execution of epidemiologic studies on a wide array of topics. Constructing epidemiologic cohorts from EHR data involves as a defining feature the development of data machinery, which transforms raw EHR data into an epidemiologic dataset from which appropriate inference can be drawn. Though data machinery includes many features, the current report focuses on three aspects of machinery development of high salience to EHR-based epidemiology: (1) selecting study participants; (2) defining “baseline” and assembly of baseline characteristics; and (3) follow-up for future outcomes. For each, the defining features and unique challenges with respect to EHR-based epidemiology are discussed. An ongoing example illustrates key points. EHR-based epidemiology will become more prominent as EHR data sources continue to proliferate. Epidemiologists must continue to improve the methods of EHR-based epidemiology given the relevance of EHRs in today’s healthcare ecosystem.
... 3 When data collection or storage is inconsistent-for example, misclassifying a continuous variable as categorical-analyses can become flawed, resulting in biased or incorrect conclusions, and undermining the reliability of research findings. 4 Clinical research studies typically employ designs such as prospective and retrospective cohorts, as well as casecontrol studies. Each of these designs calls for a tailored approach to analysis. ...
Article
Full-text available
Objectives This narrative review aims to provide a comprehensive and clinically relevant synthesis of logistic regression applications in clinical medicine, particularly in risk prediction and diagnostic modeling. Key objectives include evaluating best practices, addressing common pitfalls, and outlining validation techniques when using logistic regression to analyze binary outcomes such as disease presence versus absence. Methods The review synthesizes data from 41 peer-reviewed articles spanning from 1987 to 2025, selected from databases including PubMed, MEDLINE, and Scopus using keywords including “logistic regression,” “clinical medicine,” “diagnostic studies,” “prognostic models,” “odds ratio,” and “model validation.” The narrative approach was chosen to integrate findings from various study designs, allowing for a broad discussion on the advantages and limitations of logistic regression in clinical research. The manuscript details key methodological considerations such as the appropriate coding of continuous and categorical variables, verification of core assumptions (e.g., linearity in the log-odds, independence of observations, absence of perfect separation), and adherence to sample size requirements. In addition, the review highlights the importance of splitting datasets into training, validation, and testing subsets, and incorporates performance metrics including sensitivity, specificity, precision, and F1 scores. Results The review reveals that logistic regression remains a cornerstone technique in clinical risk prediction due to its interpretability and robust framework for handling binary outcomes. Findings indicate that logistic regression models, when appropriately validated, significantly enhance diagnostic accuracy and provide reliable risk estimates through odds ratios and confidence intervals. The review identifies that data integrity, proper variable categorization, and rigorous assumption checks are critical for avoiding model misclassification. Furthermore, visual tools like violin plots are highlighted for their utility in comparing distributions of predicted probabilities across different outcome groups. Real-world examples demonstrate that factors such as biomarker levels (e.g., troponin in acute coronary syndrome) and patient characteristics (e.g., albumin levels, BMI in postoperative infections) are effectively modeled using logistic regression, leading to clinically meaningful inferences. Conclusion Logistic regression is an indispensable tool in clinical research for predicting binary outcomes and informing evidence-based practice. By integrating a detailed discussion of best practices, common pitfalls, and model validation techniques, the manuscript offers a definitive guide for clinicians and researchers. It emphasizes that rigorous adherence to methodological standards—from data preparation to performance evaluation—can significantly improve predictive accuracy and clinical decision-making. This study hopes to serve as a valuable reference to clinicians, and explain statistical and machine learning topics in a clinical context that is easily understood and widely accessible.
... This research strives to advance TB diagnostics globally, with a focus on marginalized regions, using three CNNs for improved early detection. It builds upon prior studies covering TB transmission, drug resistance, diagnostics, treatment outcomes, and public health interventions, employing diverse methods [30]. Large-scale studies and trials provide comprehensive insights into TB epidemiology, treatment efficacy, and strategies, enhancing prevention, diagnosis, and treatment [20]. ...
Chapter
Full-text available
This paper introduces an innovative healthcare approach using convolutional neural networks (CNNs) to detect tuberculosis (TB) in chest X-rays. The primary objective is to develop an advanced neural network solution for TB diagnosis. This work employ ResNet50, VGG16, and MobileNet-V2 architectures to analyze a dataset of 4200 chest radiographs from 700 confirmed TB patients. The experiments show ResNet50's superiority, achieving an impressive 93% accuracy with a 23% loss. This underscores ResNet50's potential for early TB detection and its effectiveness in medical imaging. To attain these insights, the team extensively researched modern image classification techniques, object recognition methods, and transfer learning strategies. The CNN models were meticulously assessed deepening our understanding of their strengths and limitations. These findings hold profound implications for TB detection and significantly advance deep learning in medical imaging tasks.
... Navigating the landscape of hospital-based field research can pose an array of challenges ranging from issues with data collection procedures to cooperation from health care professionals or suspending the underlying sense of suspicion related to research. [6][7][8] These challenges can arise due to various factors, such as busy schedules, competing priorities, and institutional bureaucracy. 9 Obtaining necessary approvals and permissions can be an exhausting process given the multiple layers of review and coordination with hospital administrators and ethics committees. ...
... The selection of the type of response desired was often made based on the difficulty of the question asked and the depth of knowledge and level of precision the investigator would like to have about a particular factor. 71 Usability tests and interviews were most commonly used for involving users and capturing their perspectives across three stages of the medical device lifecycle. During usability tests, various data collection methods were used to collect both qualitative and quantitative data. ...
Article
Full-text available
Objective This systematic review aims to describe the involvement of persons with epilepsy (PWE), healthcare professionals (HP) and caregivers (CG) in the design and development of medical devices is epilepsy. Methods A systematic review was conducted, adhering to the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) guidelines. Eligibility criteria included peer‐reviewed research focusing on medical devices for epilepsy management, involving users (PWE, CG, and HP) during the MDD process. Searches were performed on PubMed, Web of Science, and Scopus, and a total of 55 relevant articles were identified and reviewed. Results From 1999 to 2023, there was a gradual increase in the number of publications related to user involvement in epilepsy medical device development (MDD), highlighting the growing interest in this field. The medical devices involved in these studies encompassed a range of seizure detection tools, healthcare information systems, vagus nerve stimulation (VNS) and electroencephalogram (EEG) technologies reflecting the emphasis on seizure detection, prediction, and prevention. PWE and CG were the primary users involved, underscoring the importance of their perspectives. Surveys, usability testing, interviews, and focus groups were the methods used for capturing user perspectives. User involvement occurs in four out of the five stages of MDD, with production being the exception. Significance User involvement in the MDD process for epilepsy management is an emerging area of interest holding a significant promise for improving device quality and patient outcomes. This review highlights the need for broader and more effective user involvement, as it currently lags in the development of commercially available medical devices for epilepsy management. Future research should explore the benefits and barriers of user involvement to enhance medical device technologies for epilepsy. Plain Language Summary This review covers studies that have involved users in the development process of medical devices for epilepsy. The studies reported here have focused on getting input from people with epilepsy, their caregivers, and healthcare providers. These devices include tools for detecting seizures, stimulating nerves, and tracking brain activity. Most user feedback was gathered through surveys, usability tests, interviews, and focus groups. Users were involved in nearly every stage of device development except production. The review highlights that involving users can improve device quality and patient outcomes, but more effective involvement is needed in commercial device development. Future research should focus on the benefits and challenges of user involvement.
... Moreover, such data is sourced from different clinical institutions and is often gathered at various time intervals to facilitate longitudinal studies (Dipietro et al., 2023). Data collection within hospital settings often relies on traditional methods such as paper-based documentation or static spreadsheets (Poline et al., 2012;Saczynski et al., 2013). These methods, despite being straightforward and widespread in clinics, may present some limitations in data standardization and accessibility (Wilcox et al., 2012). ...
Article
Full-text available
Neuroscience studies entail the generation of massive collections of heterogeneous data (e.g. demographics, clinical records, medical images). Integration and analysis of such data in research centers is pivotal for elucidating disease mechanisms and improving clinical outcomes. However, data collection in clinics often relies on non-standardized methods, such as paper-based documentation. Moreover, diverse data types are collected in different departments hindering efficient data organization, secure sharing and compliance to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Henceforth, in this manuscript we present a specialized data management system designed to enhance research workflows in Deep Brain Stimulation (DBS), a state-of-the-art neurosurgical procedure employed to treat symptoms of movement and psychiatric disorders. The system leverages REDCap to promote accurate data capture in hospital settings and secure sharing with research institutes, Brain Imaging Data Structure (BIDS) as image storing standard and a DBS-specific SQLite database as comprehensive data store and unified interface to all data types. A self-developed Python tool automates the data flow between these three components, ensuring their full interoperability. The proposed framework has already been successfully employed for capturing and analyzing data of 107 patients from 2 medical institutions. It effectively addresses the challenges of managing, sharing and retrieving diverse data types, fostering advancements in data quality, organization, analysis, and collaboration among medical and research institutions.
... Background Accurate and complete health outcome data directly from patients or study participants (hereon referred to as patients) are critical for health care and research [1][2][3]. Unfortunately, it can be burdensome to extract patient-reported health data that researchers or providers need [4,5]. Collecting patient-reported outcomes data is becoming increasingly important in clinical research and care [6,7]. ...
Article
Full-text available
Background Self-administered web-based questionnaires are widely used to collect health data from patients and clinical research participants. REDCap (Research Electronic Data Capture; Vanderbilt University) is a global, secure web application for building and managing electronic data capture. Unfortunately, stakeholder needs and preferences of electronic data collection via REDCap have rarely been studied. Objective This study aims to survey REDCap researchers and administrators to assess their experience with REDCap, especially their perspectives on the advantages, challenges, and suggestions for the enhancement of REDCap as a data collection tool. Methods We conducted a web-based survey with representatives of REDCap member organizations in the United States. The survey captured information on respondent demographics, quality of patient-reported data collected via REDCap, patient experience of data collection with REDCap, and open-ended questions focusing on the advantages, challenges, and suggestions to enhance REDCap’s data collection experience. Descriptive and inferential analysis measures were used to analyze quantitative data. Thematic analysis was used to analyze open-ended responses focusing on the advantages, disadvantages, and enhancements in data collection experience. Results A total of 207 respondents completed the survey. Respondents strongly agreed or agreed that the data collected via REDCap are accurate (188/207, 90.8%), reliable (182/207, 87.9%), and complete (166/207, 80.2%). More than half of respondents strongly agreed or agreed that patients find REDCap easy to use (165/207, 79.7%), could successfully complete tasks without help (151/207, 72.9%), and could do so in a timely manner (163/207, 78.7%). Thematic analysis of open-ended responses yielded 8 major themes: survey development, user experience, survey distribution, survey results, training and support, technology, security, and platform features. The user experience category included more than half of the advantage codes (307/594, 51.7% of codes); meanwhile, respondents reported higher challenges in survey development (169/516, 32.8% of codes), also suggesting the highest enhancement suggestions for the category (162/439, 36.9% of codes). Conclusions Respondents indicated that REDCap is a valued, low-cost, secure resource for clinical research data collection. REDCap’s data collection experience was generally positive among clinical research and care staff members and patients. However, with the advancements in data collection technologies and the availability of modern, intuitive, and mobile-friendly data collection interfaces, there is a critical opportunity to enhance the REDCap experience to meet the needs of researchers and patients.
... 1,2 Data collection tools include paper forms to be filled out at the bedside; electronic forms filled out using a computer, tablet, or smartphone; software that automatically collects data from the electronic medical record; or qualitative forms to collect free-text data. 3 Data can be sourced directly by researchers through medical records, recorded at the bedside by bedside staff, through surveys, or downloaded from devices. In most cases, the data should not solely exist within the data collection tool but also be available in another location, such as an electronic medical record for lab values, blood gas results, or ventilator settings. ...
Article
Research studies generate data in various forms. Data can be quantitative or qualitative. Research involving human subjects requires protection of data to ensure privacy. Various regulations and local policies need to be followed to ensure data security. Data management plans are critical for effective data stewardship and include details plan on data collection, management, storage, and formatting. This paper will review data collection tools, data security strategies, file management, data storage, government regulations, prepping data for analysis, reference management, and file management.
... Clinical and Biomedical Routine Data (CBRD) is largely tabular in nature: Research projects in clinical, translational, and biomedical fields require high-quality data, consisting of diverse clinical and biomedical information from patients. Many approaches exist to collect data, including questionnaires and patient self-reports, proxies and informants, hospital and ambulatory medical records, and analysis of biological samples (Saczynski, McManus, & Goldberg, 2013). Especially for tabular data, better interpretation of machine learning models can be achieved using feature selection and prioritization (Hee, Dritsaki, Willis, Underwood, & Patel, 2017). ...
Article
Full-text available
Tabular Clinical and Biomedical Routine Data (CBRD) contains diverse feature types. Recent research shows that the conventional application of Uniform Manifold Projection and Approximation (UMAP) to extract clusters from the low dimensional embedding can prove ineffective due to the diverse feature types in such datasets. Feature-type Distributed Clustering (FDC) workflow accounts for these diverse feature types resulting in a more informative low-dimensional embedding. However, a rigorous assessment of the FDC algorithm is missing so far. In this work, we conducted comprehensive benchmarking experiments to compare the quality of the cluster distributions and low dimensional embeddings generated by the FDC against that of the ones generated by UMAP using standard objective measures: Silhouette score, Dunn index, and ANOVA. Our results confirm that FDC can indeed be the better choice to embed tabular data with diverse feature types in low dimensions and thereby extract clusters from such an embedding. In addition, we provide a rationale behind the choice of metrics proposed in the FDC workflow. Moreover, we also point out some problems with the original Canberra metric used to reduce ordinal features in the FDC workflow and provide a solution in the form of a modified version of the Canberra metric. Using seven datasets from the medical domain for benchmarking, we demonstrate that FDC leads to improved patient
... Surveys are one of the most common sources of data in social science [1][2][3], political science [4][5][6][7], public health [8,9] and medical research [10,11], informing public policy, medical practice and public opinion. Despite the widespread use of survey research, self-report data has come under increasing scrutiny due to data quality concerns [12][13][14]. ...
Article
Full-text available
Survey respondents who are non-attentive, respond randomly, or misrepresent who they are can impact the outcomes of surveys. Prior findings reported by the CDC have suggested that people engaged in highly dangerous cleaning practices during the COVID-19 pandemic, including ingesting household cleaners such as bleach. In our attempts to replicate the CDC’s results, we found that 100% of reported ingestion of household cleaners are made by problematic respondents. Once inattentive, acquiescent, and careless respondents are removed from the sample, we find no evidence that people ingested cleaning products to prevent a COVID-19 infection. These findings have important implications for public health and medical survey research, as well as for best practices for avoiding problematic respondents in all survey research conducted online.
... The most significant aspect of a clinical trial/research is data collection. Because there is a tremendous amount of patient/participant data that needs to be collected and because there are many medical terminologies, we need a harmonized approach 55 . We have what is called MedDRA or Medical Dictionary to solve this issue. ...
Article
Full-text available
The research carried out to find a better treatment, improve healthcare, and benefit the current medical practice is termed clinical research. Clinical trial includes the pharmacodynamics (mechanisms of action of a new drug), pharmacokinetics (drug metabolism inside the body), therapeutics (efficacy of the drug), and adverse effects (safety of the drug) of the novel medical products. Clinical research is a process that involves human subjects and their biological specimens. The clinical trial is a meticulously planned protocol-based study of a drug/device to discover a new/better way to prevent, diagnose, and treat a disease/illness. Considering the involvement of both healthy and diseased people in clinical trials, the regulatory authorities have a significant role in the processes involving the conduction of clinical research and carefully evaluate their potential implications on humans. Because clinical trials are usually aimed at assessing the safety and efficacy of novel pharmaceutical compounds and medical devices, pharmacovigilance laws and risk management assume increased significance while conducting clinical research/trials. In this review, we attempt to discuss the regulatory authorities' roles in different geographical regions, including the United States of America, The European Union, and India. We also focus on the importance of pharmacovigilance laws and risk management during clinical trials.
... 102 For instance, different healthcare providers and systems can record substantially different amounts of detail about medical histories, clinical variables, and laboratory results. 103 These differences can lead to differences in the measurement error (either random or systematic error), which will affect an existing model's performance and a new model's regression coefficients. 60 104-106 Authors should therefore explain how predictors were measured, so that readers can evaluate this information before choosing to use a model in practice. ...
Article
Full-text available
The TRIPOD-Cluster (transparent reporting of multivariable prediction models developed or validated using clustered data) statement comprises a 19 item checklist, which aims to improve the reporting of studies developing or validating a prediction model in clustered data, such as individual participant data meta-analyses (clustering by study) and electronic health records (clustering by practice or hospital). This explanation and elaboration document describes the rationale; clarifies the meaning of each item; and discusses why transparent reporting is important, with a view to assessing risk of bias and clinical usefulness of the prediction model. Each checklist item of the TRIPOD-Cluster statement is explained in detail and accompanied by published examples of good reporting. The document also serves as a reference of factors to consider when designing, conducting, and analysing prediction model development or validation studies in clustered data. To aid the editorial process and help peer reviewers and, ultimately, readers and systematic reviewers of prediction model studies, authors are recommended to include a completed checklist in their submission.
... Actually, it is a widely used method for data collection in epidemiological research by reviewing ambulatory or hospital medical records, which is readily available and contain much more demographic and clinical information. On the other hand, due to the non-standardized way in which this information is collected, recorded, and/or extracted by various healthcare professionals and research team members, there may be conflicts or problems with the accuracy of the data contained in medical records [63]. In addition, the extent of documentation about key history or clinical variables can vary widely between healthcare systems, which may partly explain significant heterogeneity within studies. ...
Article
Full-text available
Background Charcot–Marie–Tooth disease and related inherited peripheral neuropathies (CMT&RIPNs) brings great suffering and heavy burden to patients, but its global prevalence rates have not been well described. Methods We searched major English and Chinese databases for studies reporting the prevalence of CMT&RIPNs from the establishment of the databases to September 26, 2022. Based on the age, gender, study design, study region, and disease subtype, the included studies were correspondingly synthesized for meta-analyses on the overall prevalence and/or the subgroup analyses by using pool arcsine transformed proportions in the random-effects model. Results Of the finally included 31 studies, 21 studied the whole age population and various types of CMT&RIPNs, and the others reported specific disease subtype(s) or adult or non-adult populations. The pooled prevalence was 17.69/100,000 (95% CI 12.32–24.33) for the whole age population and significantly higher for CMT1 [10.61/100,000 (95% CI 7.06–14.64)] than for other subtypes (P’ < 0.001). Without statistical significance, the prevalence seemed higher in those aged ≥ 16 or 18 years (21.02/100,000) than in those aged < 16 years (16.13/100,000), in males (22.50/100,000) than in females (17.95/100,000), and in Northern Europe (30.97/100,000) than in other regions. Conclusion CMT&RIPNs are relatively more prevalent as CMT1 in the disease subtypes, and probably prevalent in older ages, males, and Northern Europe. More studies on the epidemiological characteristics of CMT&RIPNs with well-defined diagnosis criteria are needed to improve the prevalence evaluation and to arouse more attention to health care support.
... CAD systems in medical analysis are usually trained and tested on an ensemble of data called a dataset, that are generally composed of images and other important information called metadata (e.g., age of patient, race, sex, Insurance type). Some hospitals, universities and laboratories in different countries used several approaches to collect data that belong to patients [34]. Datasets collection in medical area aims to advance research in detecting diseases. ...
Article
Full-text available
Chest X-ray radiography (CXR) is among the most frequently used medical imaging modalities. It has a preeminent value in the detection of multiple life-threatening diseases. Radiologists can visually inspect CXR images for the presence of diseases. Most thoracic diseases have very similar patterns, which makes diagnosis prone to human error and leads to misdiagnosis. Computer-aided detection (CAD) of lung diseases in CXR images is among the popular topics in medical imaging research. Machine learning (ML) and deep learning (DL) provided techniques to make this task more efficient and faster. Numerous experiments in the diagnosis of various diseases proved the potential of these techniques. In comparison to previous reviews our study describes in detail several publicly available CXR datasets for different diseases. It presents an overview of recent deep learning models using CXR images to detect chest diseases such as VGG, ResNet, DenseNet, Inception, EfficientNet, RetinaNet, and ensemble learning methods that combine multiple models. It summarizes the techniques used for CXR image preprocessing (enhancement, segmentation, bone suppression, and data-augmentation) to improve image quality and address data imbalance issues, as well as the use of DL models to speed-up the diagnosis process. This review also discusses the challenges present in the published literature and highlights the importance of interpretability and explainability to better understand the DL models’ detections. In addition, it outlines a direction for researchers to help develop more effective models for early and automatic detection of chest diseases.
... Accessible and affordable software like REDCap [2] and R [3] lower the barrier to entry for independent investigators while maintaining the high accuracy and reliability needed for large studies. As the number of studies grows so does the amount of data needed to answer tomorrow's research questions and data collection remains an integral part of the clinical research enterprise [4]. ...
The impacts of COVID-19 among men who have sex with men (MSM), who face limited access to HIV services due to stigma, discrimination, and violence, need to be assessed and quantified in terms of HIV treatment outcomes for future pandemic preparedness. This study aimed to evaluate the effects of the COVID-19 lockdown on the HIV treatment cascade among MSM in selected provinces of South Africa using routine programme data after the implementation of differentiated service delivery (DSD) models. An interrupted time series analysis was employed to observe the trends and patterns of HIV treatment outcomes among MSM in Gauteng, Mpumalanga, and KwaZulu-Natal from 1 January 2018 to 31 December 2022. Interrupted time series analysis was applied to quantify changes in the accessibility and utilisation of HIV treatment services using the R software version 4.4.1. The segmented regression models showed a decrease followed by an upward trend in all HIV treatment outcomes. After the implementation of the DSD model, significant increases in positive HIV tests (estimate = 0.001572; p < 0.001), linkage to HIV care (estimate = 0.001486; p < 0.001), ART initiations (estimate = 0.001003; p = 0.004), ART collection (estimate = 0.001748; p < 0.001), and taking viral load tests (estimate = 0.001109; p = 0.001) were observed. There was an overall increase in all HIV treatment outcomes during the COVID-19 lockdown in light of the DSD model.
Cover Page
Full-text available
Use of Relational Databases: Relational databases offer a systematic way to store and manage large datasets. However, setting up and maintaining these databases require technical expertise and resources. Creation of Frequency Distribution: From Tabulation to Graphical Representation: Presenting data effectively involves organizing it into frequency distributions. This step ranges from simple tabulations to advanced graphical representations, ensuring that data is accessible and interpretable. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Conference Paper
Full-text available
Advantages and Disadvantages of Data Collection Approaches: Various methods, such as surveys, interviews, and observational studies, each have their own strengths and weaknesses. For instance, while surveys can reach a large population, they may suffer from response bias, whereas interviews provide in-depth data but are time-consuming and resource-intensive. Researcher-Participant Partnership: Establishing a strong partnership between researchers and participants is essential for obtaining reliable data. Trust and transparency can significantly influence participant engagement and the quality of information shared. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Poster
Full-text available
Differences in the Application of Clinical and Statistical Significance: Statistical significance does not always equate to clinical relevance. Researchers must distinguish between the two to draw conclusions that are meaningful in real-world healthcare settings. Addressing these concerns is essential for ensuring the integrity and utility of statistical applications in healthcare research. Each step in the statistical process must be conducted with rigor, transparency, and ethical responsibility to produce valid and actionable findings. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Research Proposal
Full-text available
Limitation of Data and Its Measurement in Studying Health Disparities: Data limitations, such as incomplete or inconsistent records, can hinder the study of health disparities. Additionally, measuring complex phenomena like social determinants of health requires robust and nuanced approaches. Ethical Issues on the Use of Secondary Data Analysis: Using pre-existing data raises ethical concerns, including the potential misuse of data and the need to respect the original consent provided by participants. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Technical Report
Full-text available
The Interpretation of p-Values: P-values are often misunderstood or misused. Researchers must clearly understand what p-values indicate about statistical significance without overstating their importance. Steps to Data Summarization: Summarizing data involves reducing complexity while preserving essential information. This process requires careful consideration to ensure that key insights are not lost. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Poster
Full-text available
The statistical process is inherently iterative. Each completed study often raises new questions, prompting further investigation and restarting the cycle. By continuously collecting, organizing, analyzing, and interpreting data, researchers contribute to the advancement of science and the improvement of healthcare practices. This iterative process underscores Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Negative Results
The final step of the statistical process involves interpreting the results to derive meaningful conclusions. This stage connects the analysis back to the research context, generating actionable insights and contributing to the body of knowledge. Accurate interpretation ensures that findings are relevant, reproducible, and applicable to real-world scenarios. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Preprint
Full-text available
In the healthcare setting, several issues arise when applying statistics, particularly in research studies. These issues are frequently encountered in both community and clinical research contexts. Key concerns include challenges in data collection, organization, analysis, and interpretation. Below is an expanded discussion of these points: Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Method
The next step involves extracting meaningful patterns and insights through statistical analysis. Depending on the goals of the study, analysis may be: Descriptive: Summarizing data using narratives, tables, graphs, or charts to highlight trends and distributions. Inferential: Drawing conclusions about a population based on sample data through hypothesis testing or parameter estimation. Modern statistical software facilitates efficient data analysis, offering tools tailored to various research needs. The choice of software and methods should align with the study’s objectives and data characteristics. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Cover Page
In healthcare, statistics play a crucial role in evaluating health situations, identifying trends, and guiding interventions. Biostatistics specifically focuses on analyzing health-related data, ranging from vital statistics to clinical trial results. It aids in forming hypotheses, testing them rigorously, and ultimately improving healthcare outcomes. By applying statistical tools, professionals can make informed decisions, design effective studies, and interpret complex datasets. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Chapter
Data management follows, involving the organization and analysis of health data tailored to the study’s objectives. Whether through raw data arrays or frequency distributions, the choice of data organization method depends on variables’ types and levels of measurement. Properly organized data facilitates descriptive and inferential analyses, utilizing tools like tables, graphs, and statistical software to draw meaningful conclusions. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Article
Statistics, as a scientific discipline, encompasses the systematic process of acquiring and managing data. Within the realms of medicine and life sciences, this branch of knowledge is often referred to as biostatistics, emphasizing its critical application to health and medical research. Statistics provides valuable insights into health conditions and informs healthcare professionals’ decisions, whether in clinical practice or research settings. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Book
The statistical process involves a series of interconnected steps, creating a continuous cycle of scientific activities. It begins with the acquisition of health data, using tools like surveys or questionnaires to collect primary data directly from respondents. Alternatively, secondary data sources, such as vital or health statistics, may be utilized. Ensuring the accuracy and reliability of this data is paramount, as errors can compromise subsequent analyses and interpretations. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Presentation
Full-text available
Statistics is indispensable in healthcare, bridging the gap between data and informed decision-making. Despite its technical nature, statistical literacy is essential for all healthcare professionals to ensure ethical, accurate, and meaningful application of data. By addressing challenges and focusing on clinical relevance, healthcare teams can leverage statistics to enhance patient care, drive research, and support evidence-based practices. Collaboration and continual learning are vital for maximizing the benefits of statistical applications in healthcare. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Presentation
Full-text available
The Integrity of Data Collection: Ensuring the accuracy and reliability of collected data is a critical concern. Any errors or biases introduced during data collection can compromise the validity of the entire study, leading to flawed conclusions. Data Collection About “Dying Patients”: Collecting data from terminally ill patients presents unique ethical and logistical challenges. Researchers must balance the need for accurate data with respect for patient dignity and consent. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Experiment Findings
Once data is collected, it must be organized systematically for analysis. This step involves sorting, classifying, and structuring the data based on the study’s objectives. Researchers may use different methods of organization depending on the nature of the data: Raw Data: Organized as individual entries, typically used for small populations (e.g., case studies). Frequency Distribution: Used for larger datasets, presenting data as discrete or continuous series. Choosing the appropriate organization method depends on the type of variables (qualitative or quantitative) and their levels of measurement (nominal, ordinal, interval, ratio). Proper data management ensures that the dataset is primed for accurate analysis. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Conference Paper
Statistics, as a science, involves the systematic acquisition, organization, analysis, interpretation, and presentation of data. It is a fundamental tool for understanding and addressing various phenomena by providing insights derived from empirical evidence. In the medical field and other life sciences, the term “biostatistics” is often used to emphasize its specialized application to medicine and health. By employing statistical methods, researchers and healthcare professionals can extract valuable information from data, enabling evidence-based decision-making in research studies and clinical practice. Statistics plays a pivotal role in healthcare by guiding decision-making, enhancing research methodologies, and improving patient outcomes. This paper explores the fundamental aspects of statistics, its applications in biostatistics, and the structured processes of data collection, organization, analysis, and interpretation. Key issues encountered in healthcare settings, such as data integrity, ethical concerns, and misconceptions about statistical analysis, are examined. The clinical significance of utilizing statistical findings for practical decision-making is highlighted, emphasizing the importance of interprofessional collaboration. Ultimately, this work underscores the critical need for healthcare professionals to develop statistical literacy to ensure accurate data utilization and improved healthcare delivery.
Preprint
Objective This systematic review aims to describe the involvement of persons with epilepsy (PWE), healthcare professionals (HP) and caregivers (CG) in the design and development of medical devices is epilepsy. Methods: A systematic review was conducted, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Eligibility criteria included peer-reviewed research focusing on medical devices for epilepsy management, involving users (PWE, CG, and HP) during the MDD process. Searches were performed on PubMed, Web of Science, and Scopus, and a total of 55 relevant articles were identified and reviewed. Results: From 1999 to 2023, there was a gradual increase in the number of publications related to user involvement in epilepsy medical device development (MDD), highlighting the growing interest in this field. The medical devices involved in these studies encompassed a range of seizure detection tools, healthcare information systems, vagus nerve stimulation (VNS) and electroencephalogram (EEG) technologies reflecting the emphasis on seizure detection, prediction, and prevention. PWE and CG were the primary users involved, underscoring the importance of their perspectives. Surveys, usability testing, interviews, and focus groups were the methods employed for capturing user perspectives. User involvement occurs in four out of the five stages of MDD, with production being the exception. Significance User involvement in the MDD process for epilepsy management is an emerging area of interest holding a significant promise for improving device quality and patient outcomes. This review highlights the need for broader and more effective user involvement, as it currently lags in the development of commercially available medical devices for epilepsy management. Future research should explore the benefits and barriers of user involvement to enhance medical device technologies for epilepsy.
Article
Background Analyses of gender in academic authorship are key to characterizing representation in surgical fields, but current methods of manual data collection are time-consuming and error prone. The purpose of this study was to design a program to automatically extract publication data and verify the accuracy of this program in comparison to manually-collected data in a pilot study of three orthopaedic surgery journals. Methods Publications from three orthopaedic subspecialty journals between January 2019 and June 2021 were identified via PubMed search. For each publication, online publication date, journal issue month, first author name, and senior author name were collected from PubMed listings by hand and programmatically in a Python script (JournalADE). Gender was determined using Gender API. Results The percent of publications for which manually- and program-collected online publication dates were within 14 days of each other was above 95% for all journals. There was 98.3% (95% CI=97.84-98.76%) agreement for online publication date, with a mean difference of 6.43 (SD 0.87) days. Journal issue month agreement was 99.6% (95% CI=99.37-99.83%). Agreement for first author gender was 97.33% (95% CI=96.75-97.91%) and for senior author gender was 96.77% (95% CI=96.14-97.4%). Estimated labor time for manual collection was 100 hr, compared to 15 min for JournalADE. Conclusions When comparing the JournalADE- and manually-collected data, rates of agreement were high at a fraction of the time. This supports the efficacy of JournalADE and sets the stage for its use in future studies of gender in authorship.
Preprint
Full-text available
Swarm intelligence (SI) represents a paradigm in computational and decision-making processes derived from the collective behavior of decentralized agents such as insects, birds, and fish. This study explores the comprehensive application and evolution of SI within the decision sciences, emphasizing its role in enhancing decision-making across various sectors, including logistics, healthcare, and urban planning. Utilizing methodologies like literature review and case study analysis, the research underscores SI's adaptability and efficiency in solving complex, dynamic problems by leveraging collective behavior, decentralization, and self-organization principles. The study highlights the performance of SI algorithms, particularly Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), in real-world scenarios, as they optimize processes and outputs in diverse environments. Furthermore, the study addresses SI's practical implementations and theoretical implications, discussing the challenges and future prospects for integrating SI more deeply into decision-making frameworks. The results show that SI not only makes decision-making much more efficient, but it also provides strong solutions that can be used in changing situations. This makes it a key approach in the ongoing development of decision sciences.
Chapter
People living with cancer and its associated treatment can experience important physical, psychological, and social burdens. Furthermore, these burdens compromise health-related quality of life (HRQoL) not only of individuals living with cancer but also their caregivers. Objective quantitative assessment of these health-related domains is sparse in oncology due to key factors such as the lack of patient-generated data, limitations in traditional assessment methods (e.g., questionnaires), or both. In this chapter, we provide an overview of the current practices and trends in real-world data (RWD) collection from longitudinal patient observations in routine cancer care. In addition, data science methods that can be applied to gain insights from the data will be reviewed so that it can be leveraged to provide real-world evidence (RWE) that can further inform the development and update of clinical practice guidelines (CPGs) in the oncology domain.
Article
Full-text available
The need for sufficient clinical evidence and the collection of real-world evidence (RWE) is at the forefront of medical device and drug regulations, however, the collection of clinical data can be a time consuming and costly process. The advancement of Digital Health Technologies (DHTs) is transforming the way health data can be collected, analysed, and shared, presenting an opportunity for the implementation of DHTs in clinical research to aid with obtaining clinical evidence, particularly RWE. DHTs can provide a more efficient and timely way of collecting numerous types of clinical data (e.g., physiological, and behavioural data) and can be beneficial with regards to participant recruitment, data management and cost reduction. Recent guidelines and regulations on the use of RWE within regulatory decision-making processes opens the door for the wider implementation of DHTs. However, challenges and concerns remain regarding the use of DHT (such as data security and privacy). Nevertheless, the implementation of DHT in clinical research presents a promising opportunity for providing meaningful and patient-centred data to aid with regulatory decisions.
Article
Full-text available
A 36-item short-form (SF-36) was constructed to survey health status in the Medical Outcomes Study. The SF-36 was designed for use in clinical practice and research, health policy evaluations, and general population surveys. The SF-36 includes one multi-item scale that assesses eight health concepts: 1) limitations in physical activities because of health problems; 2) limitations in social activities because of physical or emotional problems; 3) limitations in usual role activities because of physical health problems; 4) bodily pain; 5) general mental health (psychological distress and well-being); 6) limitations in usual role activities because of emotional problems; 7) vitality (energy and fatigue); and 8) general health perceptions. The survey was constructed for self-administration by persons 14 years of age and older, and for administration by a trained interviewer in person or by telephone. The history of the development of the SF-36, the origin of specific items, and the logic underlying their selection are summarized. The content and features of the SF-36 are compared with the 20-item Medical Outcomes Study short-form.
Article
Full-text available
A 36-item short-form (SF-36) was constructed to survey health status in the Medical Outcomes Study. The SF-36 was designed for use in clinical practice and research, health policy evaluations, and general population surveys. The SF-36 includes one multi-item scale that assesses eight health concepts: 1) limitations in physical activities because of health problems; 2) limitations in social activities because of physical or emotional problems; 3) limitations in usual role activities because of physical health problems; 4) bodily pain; 5) general mental health (psychological distress and well-being); 6) limitations in usual role activities because of emotional problems; 7) vitality (energy and fatigue); and 8) general health perceptions. The survey was constructed for self-administration by persons 14 years of age and older, and for administration by a trained interviewer in person or by telephone. The history of the development of the SF-36, the origin of specific items, and the logic underlying their selection are summarized. The content and features of the SF-36 are compared with the 20-item Medical Outcomes Study short-form.
Article
Full-text available
to determine how accurately information on disability provided by a caregiver (proxy respondent) reflected the opinion of subjects themselves, and if this agreement varied by severity of dementia or relationship of the caregiver to the subject. the study was based on data from the Canadian Study of Health and Aging, a multicentre study of dementia and health of Canadians age 65 and over. Eight hundred study subjects and their caregivers were independently interviewed regarding the subjects' activities of daily living (ADL). the percentage of subjects who were independent for individual ADL items and the agreement in these reports between subjects and caregivers were investigated using three-level kappa statistics. index subjects with caregivers other than spouses or offspring required more assistance with ADL. The reported percentage of independence decreased with increasing severity of dementia. There was more agreement between self- and proxy-reported level of independence for physical ADL than for instrumental ADL items. Agreement decreased with increasing severity of dementia. Few statistically significant differences were noted between level of agreement and caregiver relationship. satisfactory levels of agreement on ADL between cognitively normal subjects and their caregivers indicate that proxy respondents are a reasonable source of information on ADL when data collection from the subjects themselves is not feasible. Since agreement decreases as the severity of dementia increases, caregiver reports may be preferred for elderly patients even with mild dementia in order to facilitate longitudinal assessment of ADL ratings as the dementia progresses.
Article
Full-text available
This article presents information about the development and evaluation of the SF-36 Health Survey, a 36-item generic measure of health status. It summarizes studies of reliability and validity and provides administrative and interpretation guidelines for the SF-36. A brief history of the International Quality of Life Assessment (IQOLA) Project is also included.
Article
The CONSORT (Consolidated Standards of Reporting Trials) Statement aims to improve the reporting of randomized controlled trials (RCTs); however, it lacks guidance on the reporting of patient-reported outcomes (PROs), which are often inadequately reported in trials, thus limiting the value of these data. In this article, we describe the development of the CONSORT PRO extension based on the methodological framework for guideline development proposed by the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network. Five CONSORT PRO checklist items are recommended for RCTs in which PROs are primary or important secondary end points. These recommendations urge that the PROs be identified as a primary or secondary outcome in the abstract, that a description of the hypothesis of the PROs and relevant domains be provided (ie, if a multidimensional PRO tool has been used), that evidence of the PRO instrument's validity and reliability be provided or cited, that the statistical approaches for dealing with missing data be explicitly stated, and that PRO-specific limitations of study findings and generalizability of results to other populations and clinical practice be discussed. Examples and an updated CONSORT flow diagram with PRO items are provided. It is recommended that the CONSORT PRO guidance supplement the standard CONSORT guidelines for reporting RCTs with PROs as primary or secondary outcomes. Improved reporting of PRO data should facilitate robust interpretation of the results from RCTs and inform patient care.
Article
In this article we provide an overview of the different study designs commonly utilized in carrying out clinical and public health research and of the points to consider in reviewing these study designs. The design and conduct of cross-sectional health surveys, case-control, prospective, and case-crossover observational studies, and randomized controlled trials, are discussed in this review article. It is hoped that careful attention to the concerns we have raised will lead to the design and conduct of high-quality research projects and their write-up.
Article
To identify and describe the validity of algorithms used to detect heart failure (HF) using administrative and claims data sources. A systematic review of PubMed and Iowa Drug Information Service searches of the English language was performed to identify studies published between 1990 and 2010 that evaluated the validity of algorithms for the identification of patients with HF using and claims data. Abstracts and articles were reviewed by two study investigators to determine their relevance on the basis of predetermined criteria. The initial search strategy identified 887 abstracts. Of these, 499 full articles were reviewed and 35 studies included data to evaluate the validity of identifying patients with HF. Positive predictive values (PPVs) were in the acceptable to high range, with most being very high (>90%). Studies that included patients with a primary hospital discharge diagnosis of International Classification of Diseases, Ninth Revision, code 428.X had the highest PPV and specificity for HF. PPVs for this algorithm ranged from 84% to 100%. This algorithm, however, may compromise sensitivity because many HF patients are managed on an outpatient basis. The most common 'gold standard' for the validation of HF was the Framingham Heart Study criteria. The algorithms and definitions used to identify HF using administrative and claims data perform well, particularly when using a primary hospital discharge diagnosis. Attention should be paid to whether patients who are managed on an outpatient basis are included in the study sample. Including outpatient codes in the described algorithms would increase the sensitivity for identifying new cases of HF.
Article
To perform a systematic review of the validity of algorithms for identifying cerebrovascular accidents (CVAs) or transient ischemic attacks (TIAs) using administrative and claims data. PubMed and Iowa Drug Information Service searches of the English language literature were performed to identify studies published between 1990 and 2010 that evaluated the validity of algorithms for identifying CVAs (ischemic and hemorrhagic strokes, intracranial hemorrhage, and subarachnoid hemorrhage) and/or TIAs in administrative data. Two study investigators independently reviewed the abstracts and articles to determine relevant studies according to pre-specified criteria. A total of 35 articles met the criteria for evaluation. Of these, 26 articles provided data to evaluate the validity of stroke, seven reported the validity of TIA, five reported the validity of intracranial bleeds (intracerebral hemorrhage and subarachnoid hemorrhage), and 10 studies reported the validity of algorithms to identify the composite endpoints of stroke/TIA or cerebrovascular disease. Positive predictive values (PPVs) varied depending on the specific outcomes and algorithms evaluated. Specific algorithms to evaluate the presence of stroke and intracranial bleeds were found to have high PPVs (80% or greater). Algorithms to evaluate TIAs in adult populations were generally found to have PPVs of 70% or greater. The algorithms and definitions to identify CVAs and TIAs using administrative and claims data differ greatly in the published literature. The choice of the algorithm employed should be determined by the stroke subtype of interest.
Article
The validity of findings from surveillance activities, which use administrative and claims data to link exposures to adverse events, depends in part on the validity of algorithms to identify health outcomes using these data. This review provides a high level overview of the findings of 19 systematic reviews of studies, which have examined the validity of algorithms to identify health outcomes using these data. The author categorized outcomes on the basis of the strength of evidence supporting valid algorithms to identify acute or incident events and suggested priorities for future validation studies. The 19 reviews were evaluated, and key findings and suggestions for future research were summarized by a single reviewer. Outcomes with algorithms that consistently identified acute events or incident conditions with positive predictive values of greater than 70% across multiple studies and populations are described as low priority for future algorithm validation studies. Algorithms to identify cerebrovascular accidents, transient ischemic attacks, congestive heart failure, deep vein thrombosis, pulmonary embolism, angioedema, and total hip arthroplasty revision performed well across multiple studies and are considered low priority for future validation studies. Other outcomes were generally thought to require additional validation studies or algorithm refinement to be confident in algorithms. Few studies examined the validity of International Classification of Diseases, 10th Revision, codes. Users of these reviews need to consider the generalizability of findings to their study populations. For some outcomes with poorly performing codes, it may always be necessary to validate cases.
Article
Cognitive decline in a sample of 64 elderly people was assessed by a standardised informant interview dealing with changes in memory and intelligence which had taken place in the previous 10 years. Scores from the interview were found to correlate (r = 0.74) with the Mini-Mental State Examination. Moreover, the informant interview was found to be less affected by pre-morbid ability than the MMSE. Direct assessment of decline by informants may be a solution to the problem of contamination by pre-morbid ability which affects traditional cognitive screening instruments.
Article
Research and surveillance activities sometimes require that proxy respondents provide key exposure or outcome information, especially for studies of people with disability (PWD). In this study, we compared the health-related quality of life (HRQoL) responses of index PWD to proxies. Subjects were selected from nursing home, other assisted living residences, and from several clinic samples of PWD. Each index identified one or more proxy respondents. Computer-assisted interviews used a random order of measures. Proxy reliability was measured by intraclass correlation (ICC) and kappa statistics. HRQoL measures tested included the surveillance questions of the Behavioral Risk Factor Surveillance System (BRFSS), basic and instrumental activities of daily living (ADLs and IADLs), medical outcomes study short-form 36 and 12 (SF-36 and SF-12). A total of 131 index-proxy sets were completed. In general, agreement and reliability of proxy responses to the PWD tended to be best for relatives, with friends lower, and health care proxies lowest. For example, the ICC for the physical functioning scale of the SF-36 was 0.68 for relatives, 0.51 for friends, and 0.40 for healthcare proxies. There was a tendency for proxies to overestimate impairment and underestimate HRQoL. This pattern was reversed for measures of pain, which proxies consistently underestimated. The pattern among instruments, proxy types, and HRQoL domains was complex, and individual measures vary from these general results. We suggest caution when using proxy respondents for HRQoL, especially those measuring more subjective domains.