Paul Watters’s research while affiliated with Simplot Australia Pty. Ltd. and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (269)


Number of search results on Google Scholar with different keywords by year (as collected in January 2025). (The legend entries correspond to the keywords used in the search query, which was constructed as “(AI OR artificial OR (machine learning) OR (neural network) OR computer OR software) AND ([specific keyword])”).
Annual number of preprints posted under the cs.AI category on arXiv.org.
Timeline of key developments in language model evolution.
Conceptual diagram of MoE innovation.
Conceptual diagram of the revised Q* capabilities.


From Google Gemini to OpenAI Q* (Q-Star): A Survey on Reshaping the Generative Artificial Intelligence (AI) Research Landscape
  • Article
  • Full-text available

January 2025 · 127 Reads · 66 Citations · Teo Susnjak · Tong Liu · [...]

This comprehensive survey explored the evolving landscape of generative Artificial Intelligence (AI), with a specific focus on recent technological breakthroughs and the gathering advancements toward possible Artificial General Intelligence (AGI). It critically examined the current state and future trajectory of generative AI, exploring how innovations in developing actionable and multimodal AI agents with the ability to scale their “thinking” in solving complex reasoning tasks are reshaping research priorities and applications across various domains; the survey also offers an impact analysis of the generative AI research taxonomy. This work assessed the computational challenges, scalability, and real-world implications of these technologies while highlighting their potential in driving significant progress in fields such as healthcare, finance, and education. Our study also addressed the emerging academic challenges posed by the proliferation of both AI-themed and AI-generated preprints, examining their impact on the peer-review process and scholarly communication. The study highlighted the importance of incorporating ethical and human-centric methods in AI development, ensuring alignment with societal norms and welfare, and outlined a strategy for future AI research that focuses on a balanced and conscientious use of generative AI as its capabilities continue to scale.


Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence

January 2025 · 2 Reads · 25 Citations

IEEE Transactions on Artificial Intelligence

The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their own LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of benchmark functionality and integrity. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs’ complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems’ integration into society.





Ransomware Reloaded: Re-examining Its Trend, Research and Mitigation in the Era of Data Exfiltration

August 2024 · 220 Reads · 68 Citations

ACM Computing Surveys

Ransomware has grown to be a dominant cybersecurity threat, exfiltrating, encrypting or destroying valuable user data and causing numerous disruptions to victims. The severity of the ransomware endemic has generated research interest from both academia and industry. However, many studies held stereotypical assumptions about ransomware, used unverified, outdated and limited self-collected ransomware samples, and did not consider government strategies, industry guidelines or cyber intelligence. We observed that ransomware no longer exists simply as an executable file, nor limits itself to encrypting files (data loss); data exfiltration (data breach) is the new norm, espionage is an emerging theme, and the industry is shifting focus from technical advancements to cyber governance and resilience. We created a ransomware innovation adoption curve, critically evaluated 212 academic studies published between 2020 and 2023, and cross-verified them against various government strategies, industry reports and cyber intelligence on ransomware. We concluded that many studies were becoming irrelevant to the contemporary ransomware reality, and called for the redirection of ransomware research to align with the continuous ransomware evolution in the industry. We proposed to address data exfiltration as a priority over data encryption, to consider ransomware in a business-practical manner, and recommended research collaboration with the industry.



The effect of therapeutic and deterrent messages on Internet users attempting to access 'barely legal' pornography

August 2024 · 38 Reads · 1 Citation

Child Abuse & Neglect

Online child sexual abuse material (CSAM) is a growing problem. Prevention charities, such as Stop It Now! UK, use online messaging to dissuade users from viewing CSAM and to encourage them to consider anonymous therapeutic interventions. This experiment used a honeypot website that purported to contain barely legal pornography, which we treated as a proxy for CSAM. We examined whether warnings would dissuade males (18-30 years) from visiting the website. Participants (n = 474) who attempted to access the site were randomly allocated to one of four conditions. The control group went straight to the landing page (control; n = 100). The experimental groups encountered different warning messages: deterrence-themed with an image (D3; n = 117); therapeutic-themed (T1; n = 120); and therapeutic-themed with an image (T3; n = 137). We measured the click-through to the site. Three quarters of the control group attempted to enter the pornography site, compared with 35% to 47% of the experimental groups. All messages were effective: D3 (odds ratio [OR] = 5.02), T1 (OR = 4.06) and T3 (OR = 3.05). Images did not enhance warning effectiveness. We argue that therapeutic and deterrent warnings are useful for CSAM prevention.
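The odds ratios reported above compare each warning group's entry rate against the control group. As a minimal sketch of the calculation (the group counts below are hypothetical, chosen only to illustrate the arithmetic, not the study's actual cell counts):

```python
def odds_ratio(warn_enter, warn_leave, ctrl_enter, ctrl_leave):
    """Odds ratio of entering the site: control odds over warning-group odds.
    A value > 1 means controls were more likely to click through."""
    odds_control = ctrl_enter / ctrl_leave
    odds_warning = warn_enter / warn_leave
    return odds_control / odds_warning

# Hypothetical counts: 75 of 100 controls entered; 40 of 117 in a warning group entered.
or_example = odds_ratio(40, 77, 75, 25)
```

With counts in this range, the ratio lands near the magnitudes reported in the abstract, which is why the warning conditions show ORs well above 1.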


Figure 1: Aggregate performances across all simulations, indicating the criticality point to which all simulations have been aligned and a predetermined criticality threshold.
Figure 2: Variance of all performances across all simulations, indicating the criticality point to which all simulations have been aligned and a predetermined criticality threshold.
Figure 3: Variance of all performances across all simulations, indicating the criticality point to which all simulations have been aligned and a predetermined criticality threshold.
Figure 4: Histograms depicting the distribution of time points at which criticality was detected on the test dataset with the shaded area indicating the beginning of the actual criticality and the end of the detection window period denoting correct classifications.
Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence

July 2024 · 74 Reads

In this study, we explored the progression trajectories of artificial intelligence (AI) systems through the lens of complexity theory. We challenged the conventional linear and exponential projections of AI advancement toward Artificial General Intelligence (AGI) underpinned by transformer-based architectures, and posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold. We employed agent-based modelling (ABM) to simulate hypothetical scenarios of AI systems' evolution under specific assumptions, using benchmark performance as a proxy for capability and complexity. Our simulations demonstrated how increasing the complexity of the AI system could exceed an upper criticality threshold, leading to unpredictable performance behaviours. Additionally, we developed a practical methodology for detecting these critical thresholds using simulation data and stochastic gradient descent to fine-tune detection thresholds. This research offers a novel perspective on AI advancement that has a particular relevance to Large Language Models (LLMs), emphasising the need for a tempered approach to extrapolating AI's growth potential and underscoring the importance of developing more robust and comprehensive AI performance benchmarks.
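The detection methodology described in this abstract can be illustrated with a small sketch: monitor a rolling variance of simulated benchmark scores and flag the first time it exceeds a tuned threshold. The window size, threshold value, and synthetic series below are illustrative assumptions, not the paper's actual parameters.

```python
import random

def first_criticality(scores, window=10, threshold=0.5):
    """Return the index at which the rolling variance of `scores`
    first exceeds `threshold`, or None if it never does."""
    for i in range(window, len(scores) + 1):
        w = scores[i - window:i]
        mean = sum(w) / window
        var = sum((x - mean) ** 2 for x in w) / window
        if var > threshold:
            return i - 1  # last index of the triggering window
    return None

# Synthetic series: stable performance, then instability after step 50,
# mimicking a system crossing an upper criticality threshold.
random.seed(0)
series = [1.0 + random.gauss(0, 0.05) for _ in range(50)]
series += [1.0 + random.gauss(0, 2.0) for _ in range(50)]
idx = first_criticality(series)
```

In the paper's framing, the threshold itself would be fine-tuned against simulation data (e.g., via stochastic gradient descent) rather than fixed by hand as it is here.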


From COBIT to ISO 42001: Evaluating cybersecurity frameworks for opportunities, risks, and regulatory compliance in commercializing large language models

June 2024 · 270 Reads · 42 Citations

Computers & Security

This study investigated the integration readiness of four predominant cybersecurity Governance, Risk and Compliance (GRC) frameworks – NIST CSF 2.0, COBIT 2019, ISO 27001:2022, and the latest ISO 42001:2023 – for the opportunities, risks, and regulatory compliance when adopting Large Language Models (LLMs), using qualitative content analysis and expert validation. Our analysis, with both LLMs and human experts in the loop, uncovered potential for LLM integration together with inadequacies in LLM risk oversight in those frameworks. Comparative gap analysis highlighted that the new ISO 42001:2023, specifically designed for Artificial Intelligence (AI) management systems, provided the most comprehensive facilitation for LLM opportunities, whereas COBIT 2019 aligned most closely with the European Union AI Act. Nonetheless, our findings suggested that all evaluated frameworks would benefit from enhancements to more effectively and comprehensively address the multifaceted risks associated with LLMs, indicating a critical and time-sensitive need for their continuous evolution. We propose integrating human-expert-in-the-loop validation processes as crucial for enhancing cybersecurity frameworks, and discuss implications for the continuous evolution of cybersecurity GRC frameworks to support the secure and compliant integration of LLMs.


Citations (70)


... Approaches to contextual conditioning have focused on augmenting the initial input prompt, constructing complex chain-of-thought templates, or leveraging structured metadata to shape the trajectory of LLM output [20]- [22]. Prompt engineering practices have demonstrated measurable influences on the coherence, accuracy, and specificity of LLM-generated responses through careful design of initial contextual cues [23]. Hierarchical prompt schemas have provided means to guide multi-stage reasoning in LLMs, allowing context-dependent task decomposition and solution synthesis [24], [25]. ...

Reference:

Stochastic Prompt Scaffolded Contextual Self-Regulation in Large Language Models: Technical Conceptions and Empirical Studies
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
  • Citing Article
  • January 2025

IEEE Transactions on Artificial Intelligence

... Autoregressive generation in LLMs has exhibited properties such as self-correction, paraphrasing, and semantic interpolation, with the capacity to construct novel and coherent sequences from incomplete or ambiguous prompts [44]. Scaling experiments have uncovered transitions in model behavior, where qualitative leaps in reasoning, synthesis, and abstraction arise as a function of model size, data diversity, and training duration [45], [46]. The capacity for few-shot learning, enabled through limited contextual exemplification, has allowed LLMs to infer novel tasks and instructions through exposure to minimal cues within the context window [47]. ...

From Google Gemini to OpenAI Q* (Q-Star): A Survey on Reshaping the Generative Artificial Intelligence (AI) Research Landscape

... The effectiveness of such hybrid arrangements is increasingly documented. The concept of "centaur" or human-AI teaming is being actively explored in high-stakes domains like cybersecurity (Sarker et al., 2023) and military operations (Konaev et al., 2021). A recent systematic review and meta-analysis confirmed that combinations of humans and AI can be significantly more effective than either working alone, particularly for tasks involving deliberation and insight (Vaccaro et al., 2024). ...

AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity
  • Citing Chapter
  • December 2024

... women = 22.22%). Sexual attraction to minors, observed among all of our participants, and more particularly to children, may contribute to blurring the boundaries for other sexual interests that are also taboo (e.g., zoophilia, necrophilia, uro/scatophilia) (Gane et al., 2024; Yakeley & Wood, 2014). It would be interesting to see if this relationship is bidirectional, for example, whether individuals interested in content depicting zoophilia or other taboo subjects are more likely to report fantasies about children. ...

Blurring the lines: the vague boundary between mainstream and deviant internet pornography tags for at-risk viewers
  • Citing Article
  • October 2024

Journal of Sexual Aggression

... Although chatbots have some limitations, they can significantly complement traditional mental health services by extending support to more individuals. It is essential that chatbot developers prioritize the privacy and security of user data [2]. ...

Adolescent-Centric Design of an Online Safety Chatbot
  • Citing Article
  • September 2024

Journal of Computer Information Systems

... The interaction between user-provided context, learned priors, and the statistical properties of the pretraining corpus has shaped the boundaries of effective context management in LLMs [36]. Prompt injection attacks and adversarial context manipulation have revealed vulnerabilities in context handling, motivating research into safeguards and robust prompt design [37], [38]. Theoretical explorations of context mixing, information bottlenecks, and latent space conditioning have provided insight into the underlying mechanisms driving contextual sensitivity in LLMs [39], [40]. ...

Ransomware Reloaded: Re-examining Its Trend, Research and Mitigation in the Era of Data Exfiltration

ACM Computing Surveys

... the study is, however, only interpretable to a limited extent, and it is unclear whether the observed effect is actually attributable to the warning messages. Prichard et al. (2024) take up this problem in their article "The effect of therapeutic and deterrent messages on Internet users attempting to access 'barely legal' pornography", recently published in Child Abuse & Neglect. The overarching goals of their study are to reduce the number of people who access such content and to promote therapeutic interventions. ...

The effect of therapeutic and deterrent messages on Internet users attempting to access 'barely legal' pornography
  • Citing Article
  • August 2024

Child Abuse & Neglect

... This digital platform provides tools and resources for conducting data analysis and communicating with clients in the field of Medical Technology. McIntosh, T. R., et al. [18] have discussed evaluating cybersecurity frameworks in commercializing large language models, which involves assessing the potential opportunities, risks, and regulatory compliance implications. ...

From COBIT to ISO 42001: Evaluating cybersecurity frameworks for opportunities, risks, and regulatory compliance in commercializing large language models

Computers & Security

... Studies of failure cases, such as hallucinations or incoherent completions, have informed the boundaries and failure modes of emergent properties in LLMs [50]. Self-consistency protocols and ensemble generation strategies have been examined to quantify and leverage the diversity of responses arising from stochastic sampling in LLMs [51], [52]. Unsupervised adaptation to novel domains or tasks has been observed, with LLMs exploiting latent structures in contextual cues to align their output distributions with new requirements [53], [54]. ...

A Reasoning and Value Alignment Test to Assess Advanced GPT Reasoning

ACM Transactions on Interactive Intelligent Systems

... Investigations of parameter sharing across layers and conditional computation have aimed to strike a balance between computational efficiency and the flexibility required to capture multifaceted linguistic phenomena [12]. Residual connections and layer normalization have played essential roles in maintaining gradient flow and enabling deep stacking of transformer blocks, supporting effective training of very large LLMs [13], [14]. Inference acceleration has been achieved through architectural optimizations that reduce latency and memory requirements, facilitating deployment in both research and production settings [15], [16]. ...

A Game-Theoretic Approach to Containing Artificial General Intelligence: Insights From Highly Autonomous Aggressive Malware

IEEE Transactions on Artificial Intelligence