# Christian Szegedy's research while affiliated with Mountain View College and other places

**What is this page?**

This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.


## Publications (37)

Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we s...

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the inter...

Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, w...

In recent years, deep learning has found successful applications in mathematical reasoning. Today, we can predict fine-grained proof steps, relevant premises, and even useful conjectures using neural networks. This extended abstract summarizes recent developments of machine learning in mathematical reasoning and the vision of the N2Formal grou...

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks. Here, we replace architecture engineering by encoding inductive bias in the form of datasets. Inspired by Peirce's view that deduction, induction, and abduc...

An autoformalization system is an AI that learns to read natural language content and to turn it into an abstract, machine verifiable formalization, ideally by bootstrapping from unlabeled training data with minimum human interaction. This is a difficult task in general, one that would require strong automated reasoning and automated natural langua...

Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain que...
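The retrieval scheme this abstract describes (embed documents into a semantic vector space, then answer queries by nearest-neighbor lookup) can be sketched as follows. The `embed` function below is a hypothetical hashing stand-in for a trained neural text encoder, not the model from the paper:

```python
import zlib

import numpy as np


def embed(text, dim=64):
    """Toy stand-in for a neural encoder: map each token to a fixed random
    vector (seeded by a stable hash) and sum, then L2-normalize."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(token.encode()))
        vec += rng.standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def retrieve(query, documents, k=1):
    """Nearest-neighbor document lookup by cosine similarity (all embeddings
    are unit-norm, so a dot product suffices)."""
    doc_matrix = np.stack([embed(d) for d in documents])
    sims = doc_matrix @ embed(query)
    top = np.argsort(-sims)[:k]
    return [documents[i] for i in top]


docs = [
    "the cat sat on the mat",
    "neural networks learn representations",
    "formal proofs are machine verifiable",
]
print(retrieve("machine verifiable formal proof", docs, k=1))
```

With a real encoder, retrieval works the same way; only `embed` changes, which is why nearest-neighbor lookup is a convenient interface for studying retrieval-specialized models.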

We examine whether language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical statements, such as type inference, suggesting missing assumptions and completing equalities. To train language models for formal math...

This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning...

We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. The set of rewrites (i.e. transformations) that can be successfully performed on a statement represents essential semantic features of the statement. We can compress this information by e...

Automated theorem proving in large theories can be learned via reinforcement learning over an indefinitely growing action space. In order to select actions, one performs nearest neighbor lookups in the knowledge base to find premises to be applied. Here we address the exploration for reinforcement learning in this space. Approaches (like epsilon-gr...
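The truncated sentence appears to refer to epsilon-greedy-style exploration; a minimal sketch of such a baseline over nearest-neighbor premise lookup (all names hypothetical, embeddings stand in for the knowledge base) might look like:

```python
import numpy as np


def epsilon_greedy_premise(goal_emb, premise_embs, eps, rng):
    """With probability eps, explore a uniformly random premise from the
    (growing) knowledge base; otherwise exploit the nearest neighbor of the
    current goal embedding (here: highest dot-product similarity)."""
    if rng.random() < eps:
        return int(rng.integers(len(premise_embs)))
    return int(np.argmax(premise_embs @ goal_emb))


rng = np.random.default_rng(0)
premises = np.eye(4)                    # four orthogonal premise embeddings
goal = np.array([0.1, 0.9, 0.2, 0.0])   # closest to premise index 1
choice = epsilon_greedy_premise(goal, premises, eps=0.0, rng=rng)
print(choice)  # 1: with eps=0 the nearest premise is always chosen
```

The abstract's point is that this action space grows indefinitely, which is exactly what makes naive exploration schemes of this kind interesting to study.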

We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light th...

Large computer-understandable proofs consist of millions of intermediate logical steps. The vast majority of such steps originate from manually selected and manually guided heuristics applied to intermediate goals. So far, machine learning has generally not been used to filter or generate these steps. In this paper, we introduce a new dataset based...

Deep learning techniques lie at the heart of several significant AI advances in recent years including object recognition and detection, image captioning, machine translation, speech recognition and synthesis, and playing the game of Go. Automated first-order theorem provers can aid in the formalization and verification of mathematical theorems and...

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in eac...
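The default-box grid that SSD tiles over each feature map can be sketched as below; this shows only the box layout (one box per aspect ratio at every cell center), not the detection network itself:

```python
import numpy as np


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style default boxes as (cx, cy, w, h), normalized to
    [0, 1], for one square feature map of size fmap_size x fmap_size."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size   # box centered on the cell
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * np.sqrt(ar)  # wider boxes for ar > 1,
                h = scale / np.sqrt(ar)  # taller boxes for ar < 1
                boxes.append((cx, cy, w, h))
    return np.array(boxes)


# A 4x4 feature map with 3 aspect ratios yields 4 * 4 * 3 = 48 default boxes.
boxes = default_boxes(fmap_size=4, scale=0.3)
print(boxes.shape)  # (48, 4)
```

At prediction time the network scores each such box per object category, which is how a single forward pass covers multiple scales and aspect ratios.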

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most t...

We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing st...

Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional ar...

Search with local intent is becoming increasingly useful due to the popularity of the mobile device. The creation and maintenance of accurate listings of local businesses worldwide is time consuming and expensive. In this paper, we propose an approach to automatically discover businesses that are visible on street level imagery. Precise business st...

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of bounding box priors over different aspect ratios and scales per feature map location. At prediction time, the network generates confidences that each prior corresponds to objec...

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most...

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearit...
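The normalization step this abstract motivates (Batch Normalization) reduces to a few lines per mini-batch; a minimal sketch for fully connected activations, with learned scale and shift parameters:

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch to zero mean and unit
    variance, then apply a learned scale (gamma) and shift (beta).
    x: (batch, features); gamma, beta: (features,)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 8))  # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature
```

Because every layer now sees inputs with a stable distribution, higher learning rates and less careful initialization become viable, which is the training speedup the abstract refers to.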

Several machine learning models, including neural networks, consistently misclassify adversarial examples—inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining th...
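The one-step attack associated with this line of work, the fast gradient sign method, perturbs each input coordinate by ε in the sign of the loss gradient. A sketch on a plain logistic-regression classifier (a toy stand-in, not the paper's deep networks; all values below are illustrative):

```python
import numpy as np


def fgsm(x, w, b, y, eps):
    """Fast gradient sign attack on a logistic-regression model.
    For logistic loss, the gradient w.r.t. the input x is
    (sigmoid(w @ x + b) - y) * w; we step eps in its sign."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


rng = np.random.default_rng(0)
w = rng.choice([-1.0, 1.0], size=100)  # hypothetical 100-dim linear classifier
b = 0.0
x = 0.05 * w                           # clean input: logit = w @ x = 5.0
x_adv = fgsm(x, w, b, y=1.0, eps=0.1)  # each coordinate moves by only 0.1
logit_before, logit_after = w @ x + b, w @ x_adv + b
print(logit_before, logit_after)
```

Note the effect: a per-coordinate change of just 0.1 shifts the logit by eps times the L1 norm of w (here 0.1 × 100 = 10), flipping the prediction — the "linear explanation" of adversarial examples that this paper proposed.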

Current state-of-the-art deep learning systems for visual object recognition and detection use purely supervised training with regularization such as dropout to avoid overfitting. The performance depends critically on the amount of labeled examples, and in current practice the labels are assumed to be unambiguous and accurate. However, this assumpt...

Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this...

Most high quality object detection approaches use the same scheme: salience-based object proposal methods followed by post-classification using deep convolutional features. In this work, we demonstrate that fully learnt, data-driven proposal generation methods can effectively match the accuracy of their hand engineered counterparts, while allowing...

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources insi...
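The resource-utilization idea behind the Inception module (parallel branches with cheap 1×1 "reduce" layers, concatenated over channels) can be sketched as follows. For brevity the 3×3 and 5×5 spatial convolutions are replaced by pointwise ones, so this shows only the branch structure and channel arithmetic, not the real architecture:

```python
import numpy as np


def conv1x1(x, w):
    """Pointwise (1x1) convolution: a per-pixel linear map over channels.
    x: (H, W, C_in), w: (C_in, C_out)."""
    return x @ w


def maxpool3x3_same(x):
    """3x3 max pooling with stride 1 and 'same' padding."""
    h, w_, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w_):
            out[i, j] = padded[i:i + 3, j:j + 3].max(axis=(0, 1))
    return out


def inception_module(x, w1, w3r, w3, w5r, w5, wp):
    """Four parallel branches see the same input; outputs are concatenated
    along the channel axis. The 1x1 'reduce' weights (w3r, w5r) shrink the
    channel count before the expensive branches -- the key efficiency trick."""
    b1 = conv1x1(x, w1)
    b3 = conv1x1(conv1x1(x, w3r), w3)    # stand-in for the 3x3 branch
    b5 = conv1x1(conv1x1(x, w5r), w5)    # stand-in for the 5x5 branch
    bp = conv1x1(maxpool3x3_same(x), wp)
    return np.concatenate([b1, b3, b5, bp], axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
out = inception_module(
    x,
    w1=rng.standard_normal((16, 8)),
    w3r=rng.standard_normal((16, 4)), w3=rng.standard_normal((4, 8)),
    w5r=rng.standard_normal((16, 4)), w5=rng.standard_normal((4, 8)),
    wp=rng.standard_normal((16, 8)),
)
print(out.shape)  # (8, 8, 32): 8 + 8 + 8 + 8 output channels
```

Stacking such modules lets depth and width grow while the reduce layers keep the compute budget fixed, which is the "improved utilization" the abstract highlights.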

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. Fi...

We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fashion and has...

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object cat...

Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object det...


## Citations

... Unlike the equally popular ConvNets, most Transformer-based models are flat, having a constant number of activations and channels throughout. Hierarchical variations have been recently proposed for NLP (Dai et al., 2020;Nawrot et al., 2021) and for vision, building upon ViT (Fan et al., 2021;Liu et al., 2021;Ranftl et al., 2021). Here we propose a hierarchical version of Perceiver that reuses its technique of cross-attending to learned latents, but now also for adapting the internal resolution and number of channels between layers. ...

Reference: Hierarchical Perceiver
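The cross-attention-to-learned-latents technique mentioned in this excerpt can be sketched as follows (single head, with the usual query/key/value projection matrices omitted for brevity):

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attend(latents, inputs):
    """Perceiver-style cross-attention: a small set of learned latent vectors
    queries a large input array. Attention cost is num_latents * num_inputs
    rather than num_inputs**2, which is what lets such models handle long
    inputs. latents: (L, D), inputs: (N, D) -> (L, D)."""
    scores = latents @ inputs.T / np.sqrt(inputs.shape[1])
    return softmax(scores) @ inputs


rng = np.random.default_rng(0)
latents = rng.standard_normal((8, 32))    # 8 learned latents
inputs = rng.standard_normal((1024, 32))  # 1024 input tokens
out = cross_attend(latents, inputs)
print(out.shape)  # (8, 32): internal resolution is set by the latents
```

The hierarchical variant discussed in the excerpt reuses this same operation between layers to change the number of latents and channels as depth increases.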

... Datasets for mathematical reasoning. Other works have studied datasets derived from automated theorem provers [19,146,94], interactive theorem provers [115,121,13,112,224,168,221,140,137,205,172,159] (see [173] for a survey), symbolic mathematics [134], and mathematical problems in natural language [178,181]. Close work to this thesis are the applications of Transformers to directly solve differential equations [134] and directly predict missing assumptions and types of formal mathematical statements [172]. ...

... Autoformalization refers to the task of automatically translating from natural language mathematics to a formal language [48,44]. The implications of a successful autoformalization tool are huge in both practical and philosophical terms. ...

Reference: Autoformalization with Large Language Models

... The recall@5 result is calculated for SQuAD in order to compare with the document retrieval component of the multi-layer recurrent neural network (DrQA) [13] and the convolutional residual retrieval network (ConvRR) [14], [44]. The comparisons are shown in Table IV. ...

... Graph neural networks (GNNs) [24,5] and transformers [27] are two commonly used networks to extract features from mathematical theorems. The hierarchical structure of mathematical expressions can be viewed as trees, so GNNs are employed in many works [28,18,30,2]. Besides, mathematical expressions can also be regarded as sequences of tokens. Thanks to the strong expressive ability of transformers in natural language processing, many works use transformers to extract features from formulas [21,11,23]. ...

Reference: Learning to Prove Trigonometric Identities

... In most cases, robustness issues arise because of the distributional shift problem, i.e. when the training distribution the model was trained on differs from the deployment distribution. The most notorious examples of such phenomena are adversarial attacks, where carefully crafted perturbations can deceive a ML model (Goodfellow et al. 2014). Formally, given a sound input x, an adversarial example is defined as a crafted input x′ such that ‖x − x′‖ ≤ ε and f(x) ≠ f(x′), where f(x) is the ML model prediction for a given input and ε is small enough to ensure that x′ is indistinguishable from x from a human perspective. ...

... With the rapid development of neural networks, many researchers have applied them to computer vision and achieved many successes, including image classification [6], object detection [2], semantic analysis [19], etc. Due to its ability to extract advanced and multi-scale features [4,16], the Convolutional Neural Network (CNN) [7] eliminated the dependence on manual feature extraction and has been widely used in various computer vision tasks in recent years. CNN-based salient object detection methods have better performance than traditional methods and have become the mainstream research direction. ...

... [Alemi et al. 2016], [Bridge et al. 2014] and [Kaliszyk et al. 2017] show approaches to applying machine learning methods in the field of automated theorem proving. [Alemi et al. 2016] introduced neural sequence models for premise selection in automated theorem proving, while [Bridge et al. 2014] applied two state-of-the-art machine learning techniques to the problem of selecting a good heuristic in a first-order theorem prover. ...

... Early applications of machine learning in theorem proving include the works by Schulz [42] and Urban [45], and later, directly guiding interactive proof assistants using machine learning techniques [14]. The revolution of deep learning then kicked off a new wave of interest in the topic starting with DeepMath [1,33]. ...

Reference: Autoformalization with Large Language Models

... State-of-the-art object detection methods can be broadly classified into two categories, namely, one-stage and two-stage methods. The representative one-stage detectors include YOLO (an acronym for You Only Look Once) [4, 8-11], single-shot detector (SSD) [12], RetinaNet [13], and CenterNet [5,6]; these methods can perform nearly real-time detection, do not need a proposal generation procedure, and directly conduct object detection in images. The YOLO families achieve state-of-the-art performance by integrating bounding boxes and subsequent feature resampling in a single stage. ...