Democratization of Mathematics: AI Math Agents assembling Knowledge Fragments

Melanie Swan and Renato P. dos Santos, 5/31/23, Accepted Conference Abstract SPCW
Intelligent actors have representations of reality: human agents through Kantian space-time goggles, and AI reinforcement learning agents through iterative feedback loops, action-taking policies, and reward functions. Both are simply data points, as each entity reaches out to describe “the elephant” of the encountered world. A moment of intelligence speciation may be underway as AI agents knit together and expand the fragments of knowledge in the science canon, ushering in a new era of human-benefitting progress, but also of risk.
AI Math Agents are introduced as a novel ontological emergence for creating a digital mathematical infrastructure that explicitly extends human intelligence. Mathematics (the elaboration of mathematical structures in the world) is crucial but stuck in non-evaluable formats. Further scientific progress is not possible through advances in computing alone; a smart network transport layer of mathematics is needed as “truthier” content for audiences of human and computational entities alike.
The language of science is math. The new idea is to treat the entirety of mathematics discovered thus far, though possibly arbitrarily transcribed, as itself the data corpus to be digitized and AI-learned for insight. The tool of the trade for data analysis and high-dimensionality management, a Moore’s Law of Mathematics, is embedding (containing one mathematical structure within another). Known as a first-pass method for characterizing unstructured data, embeddings are now emerging as a standard digital infrastructural element, mobilizing large data corpora as abstracted strings. Embeddings are to LLMs (large language models such as GPT and PaLM) what hashes are to blockchains: fixed-length units serving as digital reality access interfaces. One project has announced embeddings for all 4 million arXiv papers (title and abstract), with 600 mn tokens and 3.07 bn vector dimensions in total. Embeddings vary with the parameters selected for tokens (data-subset parsing), dimensionality reduction (UMAP, PCA, t-SNE), and normalization.
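As a back-of-envelope reading of the arXiv figures (our arithmetic, not a figure reported by the project): 3.07 bn ÷ 4 mn ≈ 768 dimensions per paper, a common embedding width, and 600 mn ÷ 4 mn = 150 tokens per title and abstract on average. The three parameter choices named above can be sketched on a toy scale, assuming a hashing-trick embedding (a deliberately simple stand-in, not an LLM embedding model), a 16-dimensional vector, L2 normalization, and PCA computed by power iteration; all function names are hypothetical.

```python
import hashlib
import math
import re

# Hypothetical toy pipeline: tokens -> fixed-length vector -> normalization -> reduction.
TOKEN_PATTERN = re.compile(r"\\[A-Za-z]+|[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokenize(text):
    """Split text into lowercase word, number, LaTeX-command, and symbol tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def hashed_embedding(text, dim=16):
    """Map any text to a fixed-length vector with the hashing trick, then L2-normalize it."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket chosen for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def first_principal_component(vectors, iters=200):
    """Project mean-centered vectors onto their first principal component (power iteration)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    w = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero starting direction
    for _ in range(iters):
        # Covariance-vector product C w = X^T (X w) / n, without forming C explicitly.
        scores = [sum(x[j] * w[j] for j in range(d)) for x in centered]
        w_next = [sum(scores[i] * centered[i][j] for i in range(n)) / n for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w_next))
        if norm == 0:
            break  # no variance along the current direction
        w = [c / norm for c in w_next]
    return [sum(x[j] * w[j] for j in range(d)) for x in centered]


if __name__ == "__main__":
    docs = ["E = mc^2", "F = ma", "Embeddings for arXiv abstracts"]
    vectors = [hashed_embedding(doc) for doc in docs]
    print([len(v) for v in vectors])           # [16, 16, 16]: every text gets the same length
    print(first_principal_component(vectors))  # one reduced coordinate per text
```

Replacing the hashing step with a learned model, or PCA with UMAP or t-SNE, would change the resulting coordinates, which is the parameter sensitivity the paragraph describes.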
Embeddings produce abstraction strings that can be joined with other abstraction strings for analysis. The implication is that mathematical equations and data can be studied together as two different modes of one system. Since mathematics is already a contiguous corpus, a mathematical mode may call on many levels of mathematical abstraction, for example set, category, and type theory, to accompany one or more data modes. Viewing descriptive mathematics and data together at one abstraction level, both more foundational and higher-level, could help in problem resolution, particularly in complex domains such as multiscalar biosystems: for example, applying AdS/CFT (anti-de Sitter space/conformal field theory) holographic physics math to Alzheimer’s transposon dynamics math, and from there to gene variant, expression, and blood data.
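One simple way to read “joining” two abstraction strings, assumed here for illustration rather than taken from the source, is to L2-normalize an equation-mode embedding and a data-mode embedding and concatenate them with mode weights. With square-root weights the joined vector keeps unit length, and the cosine similarity between two joined systems becomes a weighted average of the per-mode similarities. The vectors, weights, and function names below are hypothetical toy values.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def join_modes(equation_vec, data_vec, equation_weight=0.5):
    """Concatenate a normalized equation-mode vector and a normalized data-mode vector.

    Square-root weights keep the joined vector at unit length when both inputs are non-zero.
    """
    a = math.sqrt(equation_weight)
    b = math.sqrt(1.0 - equation_weight)
    return [a * x for x in l2_normalize(equation_vec)] + [b * x for x in l2_normalize(data_vec)]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


if __name__ == "__main__":
    # Hypothetical toy embeddings: two systems share equation structure but differ in data.
    system_1 = join_modes([1.0, 0.0, 0.0], [0.0, 1.0])
    system_2 = join_modes([1.0, 0.0, 0.0], [1.0, 0.0])
    print(round(cosine(system_1, system_2), 6))  # 0.5 = 0.5 * 1.0 (equations) + 0.5 * 0.0 (data)
```

The equation_weight parameter is a free modeling choice: raising it makes the comparison favor shared mathematics over shared data.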
AI Math Agent results are produced as Math Graphs: graph-based visualizations of equation and data embeddings (as distinct from the usual “mathematical graph,” which plots the output values of a solved equation). Math Graphs are a human-accessible view of a mathematical ecology (a set of equations), intended to highlight its well-formedness (a heuristic for computational proof systems) and to assess model testing, fit with other mathematical ecologies, and data interactions.
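A Math Graph could be prototyped, under assumptions not stated in the source, as a similarity graph: each equation or dataset embedding is a node, and an edge joins two nodes whose cosine similarity meets a chosen threshold. Connected components then suggest clusters of mutually related equations and data, and an isolated node flags an item that does not fit the rest. The node names, vectors, and 0.8 threshold are hypothetical, and this sketch does not test well-formedness.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def math_graph(embeddings, threshold=0.8):
    """Build an undirected adjacency map: node -> neighbours with cosine >= threshold."""
    graph = {name: set() for name in embeddings}
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Return connected components as a list of node sets (iterative depth-first search)."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        result.append(component)
    return result


if __name__ == "__main__":
    # Hypothetical embeddings for two related equations and one dataset.
    toy = {"eq_a": [1.0, 0.0], "eq_b": [0.9, 0.1], "data_c": [0.0, 1.0]}
    g = math_graph(toy)
    print(sorted(sorted(c) for c in components(g)))  # [['data_c'], ['eq_a', 'eq_b']]
```

For display, the nodes would still need 2D coordinates (for example from PCA or UMAP) and a drawing library; the adjacency structure is the part that differs from an ordinary plot of solved equation values.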
The AI Revolution entails building new forms of robust digital infrastructure at scale with the standard data-structuring tools used by LLMs (tokens, embeddings, vector dimensions). Like any other language, mathematics can be instantiated in digital embeddings and delivered in “Twitter-like” tools that are easy to deploy and easy to consume. Data embeddings are routine, but the mathematical embedding, meaning the embedded form of an equation, is a new concept.
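As a minimal sketch of the embedded form of an equation, assuming equations arrive as LaTeX strings, the code below tokenizes LaTeX (commands such as \frac, single-letter variables, numbers, and symbols) and maps each equation to a count vector over a shared vocabulary. Count vectors are a swapped-in, deliberately simple technique; an LLM would instead learn dense vectors. The function names are hypothetical.

```python
import re
from collections import Counter

# LaTeX commands (\frac), escaped symbols (\{), single letters, numbers, then any other symbol.
LATEX_TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[A-Za-z]|\d+(?:\.\d+)?|\S")


def tokenize_latex(equation):
    """Split a LaTeX equation into tokens; letters stay separate because mc usually means m*c."""
    return LATEX_TOKEN.findall(equation)


def equation_vectors(equations):
    """Return a shared sorted vocabulary and one token-count vector per equation."""
    token_lists = [tokenize_latex(eq) for eq in equations]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


if __name__ == "__main__":
    print(tokenize_latex(r"\frac{a}{b}"))  # ['\\frac', '{', 'a', '}', '{', 'b', '}']
    vocab, vectors = equation_vectors(["E = mc^2", "F = ma"])
    print(vocab)    # ['2', '=', 'E', 'F', '^', 'a', 'c', 'm']
    print(vectors)  # [[1, 1, 1, 0, 1, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
```

Splitting letters individually is the main design choice here: in ordinary text "mc" would be one word, but in an equation it is usually the product of two variables.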
“Low-hanging fruit” mathematical ecologies for mathematical embedding are represented digitally (LaTeX, PDF, computational proof systems (e.g., 71 billion lemmas at current max)), but they are not evaluable. These unmoored knowledge fragments could be merged into the whole of a modern digital computational infrastructure and made solvable with AI Math Agents.
Strikingly, cognition joins other natural science puzzles in a transition from philosophy to engineering. Multiscalar dimensionality abstraction is the key: the full scope of neuronal activity is unclear, but knowing it is unnecessary because the brain’s information processing is represented in neural networks. Likewise, in quantum computing, the Copenhagen-interpretation debate gives way to routine entanglement generation in satellite qubit transmission.
The AI Revolution could be as universally transformative to human enlightenment and well-being as the Industrial Revolution. LLMs are becoming the go-to resource for all forms of on-demand expanded or reduced content, and are implicated in LLM-LLM communication. The audience for the new tools is AI first, humans second. Although humans are familiar with chat-prompt models, the bigger potential use of AI is in science. Math Agents offer a new level of predictive capability to extend the observe-explain-test scientific method. An immediate use case is the realization of precision-health genomic medicine (DNA-RNA-protein), possibly hastened by quantum computing. Math Agents can use mathematical embeddings as a holographic viewfinder to identify the right level of study and intervention in complex multiscalar biosystems and their pathologies (e.g., Alzheimer’s disease is the only top-five killer without a cure).
A further impact is that AI Math Agents and related technologies may feature in constructing a digitally integrated quantum-classical mathematical infrastructure. Math Agents could allow newly discovered mathematics (e.g., in physics) to deploy quickly to other fields and facilitate AI-QC (quantum computing) convergence. Embeddings are already used to model data in quantum-mechanical systems (chemical reactions), paired with DMRG (density matrix renormalization group) techniques to calculate lowest-energy wavefunctions. Embeddings can help in normalization and also in renormalization: the ability to view a multiscalar system at different levels according to a salient quantity conserved across system tiers (e.g., symmetry in the universe, free energy in biological systems). AdS/CFT holography and Chern-Simons topology are related methods for bridging classical and quantum systems: they describe a messy bulk volume with a boundary field theory in one fewer dimension, and treat curve maxima and minima as system events (e.g., cancer, tau phosphorylation). Such high-dimensionality management is routine for embeddings.
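The renormalization idea above, viewing one system at several levels while a chosen quantity is conserved, can be illustrated with a real-space block coarse-graining step in the style of Kadanoff's block-spin construction. This is a swapped-in toy technique, not DMRG or AdS/CFT: each step sums adjacent blocks of sites, so the total (standing in for the salient conserved quantity) is identical at every tier. The function names and sample values are hypothetical.

```python
def coarse_grain(values, block=2):
    """Merge each run of `block` adjacent sites into one site by summing them.

    Summation (not averaging) keeps the system total, the conserved quantity here, unchanged.
    """
    if block < 2 or len(values) % block:
        raise ValueError("block must be >= 2 and divide the number of sites")
    return [sum(values[i:i + block]) for i in range(0, len(values), block)]


def scale_hierarchy(values, block=2):
    """Return the system at every tier, from finest to coarsest, until it no longer divides."""
    tiers = [list(values)]
    while len(tiers[-1]) >= block and len(tiers[-1]) % block == 0:
        tiers.append(coarse_grain(tiers[-1], block))
    return tiers


if __name__ == "__main__":
    for tier in scale_hierarchy([1, 2, 3, 4, 5, 6, 7, 8]):
        print(tier, sum(tier))
    # [1, 2, 3, 4, 5, 6, 7, 8] 36
    # [3, 7, 11, 15] 36
    # [10, 26] 36
    # [36] 36
```

Real renormalization also rescales the couplings between sites at each tier, which this sketch omits.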
Following the trajectory of open source and open access comes the idea of being openly usable as the property needed to mobilize mathematics. Just as open datasets on the internet allowed human agents from various backgrounds to apply their expertise to solving problems (e.g., Kaggle data competitions), open mathematics in usable formats could inspire a new wave of upleveled investigation. AI tools enhance human intelligence, possibly including by democratizing mathematics and making it portable to a much wider range of situations.
Human-AI entities acting in partnership thus constitute an intelligence speciation. AI improves on the human-evolved Kantian goggles, limited to perception in 3D space and 1D time, by providing access to a much higher-dimensional reality. New ideas of mathematical and computational intelligence (the ability to learn and problem-solve systematically in formal environments) join quantum, relativistic, and scale-free intelligence (platform-agnostic human-machinic-hybrid capability to operate in quantum-classical-relativistic time-space domains) in the social-technological objective of AI Alignment (compatibility with human values). Such beyond-classical formulations could be relevant in promoting AI identity as a Hegel-inspired self-knowing time series: an aligned form of AI intelligence that overcomes the slaughterbench-of-history human power struggles and has a positive net impact on the world.