Meera Viswanathan’s research while affiliated with University of North Carolina at Chapel Hill and other places


Publications (154)


Figure 1: Outline of study design comparing two data extraction processes.
Figure 2: Concordance, accuracy, recall, and precision by data categories.
Performance metrics of the AI-assisted approach for each review and the SWAR overall
AI-Assisted Data Extraction with a Large Language Model: A Study Within Reviews
  • Preprint

March 2025

Shannon Kugley · Karen Crotty · [...]

Background. Data extraction is a critical but error-prone and labor-intensive task in evidence synthesis. Unlike other artificial intelligence (AI) technologies, large language models (LLMs) do not require labeled training data for data extraction. Objective. To compare an AI-assisted to a traditional data extraction process. Design. Study within reviews (SWAR) utilizing a prospective, parallel group comparison with blinded data adjudicators. Setting. Workflow validation within six ongoing systematic reviews of interventions under real-world conditions. Intervention. Initial data extraction using an LLM (Claude versions 2.1, 3.0 Opus, and 3.5 Sonnet) verified by a human reviewer. Measurements. Concordance, time on task, accuracy, recall, precision, and error analysis. Results. The six systematic reviews of the SWAR contributed 9,341 data elements, extracted from 63 studies. Concordance between the two methods was 77.2%. The accuracy of the AI-assisted approach compared with enhanced human data extraction was 91.0%, with a recall of 89.4% and a precision of 98.9%. The AI-assisted approach had fewer incorrect extractions (9.0% vs. 11.0%) and similar risks of major errors (2.5% vs. 2.7%) compared with the traditional human-only method, with a median time saving of 41 minutes per study. Missed data items were the most frequent errors in both approaches. Limitations. Assessing the concordance of data extractions and classifying errors required subjective judgment. Tracking time on task consistently was challenging. Conclusion. The use of an LLM can improve the accuracy of data extraction and save time in evidence synthesis. Results reinforce previous findings that human-only data extraction is prone to errors. Primary Funding Source. US Agency for Healthcare Research and Quality; RTI International. Registration: SWAR28 Gerald Gartlehner (2023 FEB 11 2102).pdf
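The recall and precision figures reported in the abstract are standard retrieval metrics applied to extracted data elements. As an illustration only (this is not the study's code, and the counts below are hypothetical, chosen merely to show how percentages in this range arise from raw counts):

```python
def extraction_metrics(true_pos, false_neg, false_pos):
    """Recall and precision from confusion counts of extracted data elements.

    true_pos:  elements extracted correctly
    false_neg: elements missed by the extractor
    false_pos: elements extracted incorrectly
    """
    recall = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return recall, precision

# Hypothetical counts (not the study's data), illustrating the formulas:
recall, precision = extraction_metrics(894, 106, 10)
print(f"recall={recall:.3f} precision={precision:.3f}")  # recall=0.894 precision=0.989
```

A high precision with lower recall, as reported here, indicates that what the LLM does extract is usually correct, while missed items (false negatives) dominate the errors, consistent with the abstract's error analysis.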


Screening for Osteoporosis to Prevent Fractures: A Systematic Evidence Review for the US Preventive Services Task Force

January 2025

JAMA The Journal of the American Medical Association

Importance Fragility fractures result in significant morbidity. Objective To review evidence on osteoporosis screening to inform the US Preventive Services Task Force. Data Sources PubMed, Embase, Cochrane Library, and trial registries through January 9, 2024; references, experts, and literature surveillance through July 31, 2024. Study Selection Randomized clinical trials (RCTs) and systematic reviews of screening; pharmacotherapy studies for primary osteoporosis; predictive and diagnostic accuracy studies. Data Extraction and Synthesis Two reviewers assessed titles/abstracts, full-text articles, study quality, and extracted data; when at least 2 similar studies were available, meta-analyses were conducted. Main Outcomes and Measures Hip, clinical vertebral, major osteoporotic, and total fractures; mortality; harms; accuracy. Results Three RCTs and 3 systematic reviews reported benefits of screening in older, higher-risk women. Two RCTs used 2-stage screening: Fracture Risk Assessment Tool estimate with bone mineral density (BMD) testing if risk threshold exceeded. One RCT used BMD plus additional tests. Screening was associated with reduced hip (pooled relative risk [RR], 0.83 [95% CI, 0.73-0.93]; 3 RCTs; 42 009 participants) and major osteoporotic fracture (pooled RR, 0.94 [95% CI, 0.88-0.99]; 3 RCTs; 42 009 participants) compared with usual care. Corresponding absolute risk differences were 5 to 6 fewer fractures per 1000 participants screened. The discriminative accuracy of risk assessment instruments to predict fracture or identify osteoporosis varied by instrument and fracture type; most had an area under the curve between 0.60 and 0.80 to predict major osteoporotic fracture, hip fracture, or both. Calibration outcomes were limited. 
Compared with placebo, bisphosphonates (pooled RR, 0.67 [95% CI, 0.45-1.00]; 6 RCTs; 12 055 participants) and denosumab (RR, 0.60 [95% CI, 0.37-0.97] from the largest RCT [7808 participants]) were associated with reduced hip fractures. Compared with placebo, no statistically significant associations were observed for adverse events. Conclusions and Relevance Screening in higher-risk women 65 years or older was associated with a small absolute risk reduction in hip and major fractures compared with usual care. No evidence evaluated screening with BMD alone or screening in men or younger women. Risk assessment instruments, BMD alone, or both have poor to modest discrimination for predicting fracture. Osteoporosis treatment with bisphosphonates or denosumab over several years was associated with fracture reductions and no meaningful increase in adverse events.
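The abstract's "5 to 6 fewer fractures per 1000 participants screened" follows from combining a pooled relative risk with the control-group (usual-care) event rate. As an illustration only, using a hypothetical baseline risk (the review's actual control-group rates are not shown on this page):

```python
def fewer_events_per_1000(baseline_risk, relative_risk):
    """Absolute risk reduction per 1000 people, given the control-group
    event risk and a pooled relative risk from a meta-analysis."""
    return baseline_risk * (1.0 - relative_risk) * 1000

# Hypothetical 3% hip-fracture risk under usual care; pooled RR 0.83 from the abstract:
print(round(fewer_events_per_1000(0.03, 0.83), 1))  # 5.1 fewer fractures per 1000 screened
```

This is why a modest relative reduction translates into a small absolute difference: the arithmetic scales the relative effect by the (low) baseline risk.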


Summary of Evidence for Graded Outcomes
Audio-Based Care for Managing Mental Health and Substance Use Disorders in Adults: A Systematic Review

Medical Care

Background Telehealth services can increase access to care by reducing barriers. Telephone-administered care, in particular, requires few resources and may be preferred by communities in areas that are systemically underserved. Understanding the effectiveness of audio-based care is important to combat the current mental health crisis and inform discussions related to reimbursement privileges. Objectives We compared the effectiveness of audio-based care to usual care for managing mental health and substance use disorders (MHSUD). Design We used systematic review methods to synthesize available evidence. Studies We searched for English-language articles reporting randomized controlled trials (RCTs) of adults diagnosed with MHSUD published since 2012. Outcomes We abstracted data on clinical outcomes, patient-reported health and quality of life, health care access and utilization, care quality and experience, and patient safety. Results We included 31 RCTs of participants diagnosed with depression, post-traumatic stress disorder (PTSD), other serious mental illness (SMI), anxiety, insomnia, or substance use disorder (SUD). Most of the evidence was for interventions targeting depression, PTSD, and SUD. The evidence demonstrates promise for: (1) replacing in-person care with audio care for depression, other SMI, and SUD (very low to moderate certainty of comparable effectiveness); and (2) adding audio care to monitor or treat depression, PTSD, anxiety, insomnia, and SUD (low to moderate certainty of evidence favoring audio care for clinical outcomes). Conclusions MHSUD can be managed with audio care in certain situations. However, more evidence is needed across conditions, and specifically for anxiety and other conditions for which no research was identified.



Response to Commentary by

November 2024

Journal of Consulting and Clinical Psychology

Replies to comments made by Mattos et al. (see record 2025-49982-003) on the original article (see record 2024-19816-001). Mattos et al. critiqued our assessments of the certainty of evidence as being overly permissive and not adhering to the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group's guidelines. GRADE has become an international standard to describe the level of confidence that investigators have in estimates of effects. Like the risk of bias evaluations, determining the certainty of evidence involves subjective judgment. The true value of GRADE is not in yielding a definitive evidence certainty rating but in its emphasis on transparency. While we acknowledge and respect the differing viewpoints of Mattos et al. regarding our ratings, we caution against the rigid and formulaic use of the GRADE methodology. (PsycInfo Database Record (c) 2024 APA, all rights reserved).







Citations (71)


... Compared with model validation studies that examine the performance and reliability of an LLM under specific, controlled conditions using already existing datasets, workflow validation studies integrate the use of the LLM into the workflow of an ongoing review, providing a detailed perspective on the model's practical effectiveness, efficiency, and utility. [21] The prospective design protects against data contamination, which can arise when the dataset used to evaluate performance also contributed to LLM training, potentially inflating its capabilities. [22] Given that developers of LLMs rarely disclose their training datasets, the prospective design of our study utilizing ongoing, unpublished reviews provides a rigorous safeguard against this risk. ...

Reference:

AI-Assisted Data Extraction with a Large Language Model: A Study Within Reviews
From promise to practice: challenges and pitfalls in the evaluation of large language models for data extraction in evidence synthesis

BMJ evidence-based medicine

... Data included survey results, claims, clinic notes, and semi-structured interviews. The significant heterogeneity in methodological and measurement approaches across these studies is consistent with findings from a previous review of interventions to address social determinants, which found that most social care interventions were tested via cross-sectional or quasi-experimental designs, and findings were often incomparable across studies [19]. This stems partly from the fact that many social care interventions address only one or a limited number of HRSN [20] although many individuals face a wide range of health and social needs [21]. ...

Evaluating Intensity, Complexity, and Potential for Causal Inference in Social Needs Interventions: A Review of a Scoping Review
  • Citing Article
  • June 2024

JAMA Network Open

... Initial evaluations of LLMs for data extraction demonstrated variable accuracy, ranging from 72% to 100%, compared with human reference standards. [7][8][9][10][11][12] However, their reliance on controlled experimental conditions and the use of pre-existing review datasets as benchmarks limit their generalizability to real-world applications. Furthermore, these studies assessed fully automated approaches without human involvement and evaluated the LLMs outside of the actual workflow of an evidence synthesis. ...

Performance of two large language models for data extraction in evidence synthesis
  • Citing Article
  • June 2024

Research Synthesis Methods

... Persaud's central message is clear: we must move beyond superficial "checkbox" equity and adopt systematic, equity-focused strategies throughout the guideline development process. 2 These commentaries echo the Cochrane Health Equity Thematic Group's call to action, which urges the integration of equity considerations across all stages of research and suggests approaches to do so. 3 To support this, Dewidar, Darzi, and colleagues offer a clear roadmap for developers, policymakers, and clinicians by outlining seven guiding principles for embedding health equity throughout the guideline enterprise. 4 These include defining equity in context, planning a priori for its integration at every stage, allocating adequate resources, and involving relevant interest-holders including individuals with lived experience. ...

Equity in evidence synthesis: You can't play on broken strings

... The MuSE Consortium has now adopted the term "interest-holders" in its publications on engagement in guidelines and in evidence syntheses in the health field. Other groups have also adopted the use of this term [24,25]. We hope that "interest-holders" will reduce confusion related to the multitude of terms used and convey the intended meaning without any negative connotations. ...

Centering Racial Health Equity in Systematic Reviews Paper 6: Engaging racially and ethnically diverse stakeholders in evidence syntheses

SSRN Electronic Journal

... The quality of studies was assessed by two authors (AAK and FK) independently, utilizing the Newcastle-Ottawa scale for cross-sectional and cohort studies (Supplementary Table 3) [30]. Additionally, the Risk of Bias in Non-Randomized Studies of Environmental Exposures (ROBINS-E) tool was used to assess the risk of bias and the quality of the included cohort studies (Supplementary Table 4) [31]. Disagreements were resolved with the assistance of FH. ...

A tool to assess risk of bias in non-randomized follow-up studies of exposure effects (ROBINS-E)

Environment International

... These prevention programs also emphasize emotional regulation, particularly through the promotion of coping strategies when dealing with stress or frustration in front of infants' persistent crying [16]. The effectiveness of these strategies has been evaluated in several controlled trials, and although some individual studies have reported reductions in AHT incidence [18][19][20], meta-analyses and the US Preventive Services Task Force have concluded no significant summed preventive effects [16,[21][22][23][24]. This discrepancy may be due to differences in the study design used to measure the impact of the intervention and the level of proof used to define effectiveness [24]. ...

Primary Care Interventions to Prevent Child Maltreatment: Evidence Report and Systematic Review for the US Preventive Services Task Force

JAMA The Journal of the American Medical Association

... For example, Reichenpfader et al. [47] show how LLMs are being used to extract clinical data from radiology reports, solving the challenge of dealing with free-text documentation. These models are also being used to pull information from clinical trial reports, significantly speeding up the data extraction phase of meta-analyses [48]. Another major advantage of LLMs is their ability to continuously update evidence as new studies are published [42]. ...

Data extraction for evidence synthesis using a large language model: A proof‐of‐concept study
  • Citing Article
  • March 2024

Research Synthesis Methods

... In contrast, Tang et al. found that general-purpose LLMs like ChatGPT (OpenAI, San Francisco, CA) can summarize text but can generate inconsistent and overly convincing summaries [12]. Ovelman et al., by contrast, reported that an LLM, Claude 2, can generate PLS with accuracy and with minor errors [13]. As the field of LLMs is expanding and there are several freely accessible chatbots, it is necessary to explore their comparative capability in generating PLS. ...

The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study

... However, some of the early iterations of PMI have been ineffective [6][7][8][9]. In response to the limitations of PMI and a growing interest in reducing health disparities, other groups such as researchers, health systems, and patient advocacy organizations have begun to call for and develop tools to fill these gaps [10][11][12]. ...

Advancing health equity through social care interventions

Health Services Research