Christian Szegedy's research while affiliated with Google (Mountain View) and other places

Publications (37)

Preprint
Full-text available
Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we s...
Preprint
Full-text available
Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the inter...
Preprint
Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, w...
Chapter
Full-text available
Over recent years, deep learning has found successful applications in mathematical reasoning. Today, we can predict fine-grained proof steps, relevant premises, and even useful conjectures using neural networks. This extended abstract summarizes recent developments of machine learning in mathematical reasoning and the vision of the N2Formal grou...
Preprint
Full-text available
While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks. Here, we replace architecture engineering by encoding inductive bias in the form of datasets. Inspired by Peirce's view that deduction, induction, and abduc...
Chapter
An autoformalization system is an AI that learns to read natural language content and to turn it into an abstract, machine verifiable formalization, ideally by bootstrapping from unlabeled training data with minimum human interaction. This is a difficult task in general, one that would require strong automated reasoning and automated natural langua...
Chapter
Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain que...
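The retrieval scheme this abstract describes — embed documents into a semantic vector space, then answer queries by nearest-neighbor lookup — can be sketched as follows. This is a minimal illustration with toy 2-d vectors standing in for real neural embeddings; the function names are illustrative, not from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, doc_vecs, k=1):
    """Indices of the k documents whose embeddings are nearest to the query."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: -cosine(query_vec, doc_vecs[i]))
    return ranked[:k]

# toy "document embeddings"; a real system would produce these with a neural encoder
doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
top2 = retrieve([0.9, 0.1], doc_vecs, k=2)
```

In practice the brute-force scan over `doc_vecs` is replaced by an approximate nearest-neighbor index, but the ranking principle is the same.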
Preprint
We examine whether language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical statements, such as type inference, suggesting missing assumptions and completing equalities. To train language models for formal math...
Article
This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning...
Preprint
We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. The set of rewrites (i.e. transformations) that can be successfully performed on a statement represents essential semantic features of the statement. We can compress this information by e...
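The idea that the set of applicable rewrites captures semantic features of a statement can be illustrated concretely. Below is a hypothetical sketch (the integer "statements" and toy rewrites are stand-ins, not the paper's actual setup): each rewrite either returns a transformed statement or fails, and the success pattern acts as a coarse semantic fingerprint.

```python
def rewrite_signature(statement, rewrites):
    """Bit-vector recording which rewrites succeed on the statement.

    Each rewrite returns a transformed statement, or None on failure;
    the pattern of successes is a coarse semantic fingerprint."""
    return tuple(1 if rw(statement) is not None else 0 for rw in rewrites)

# toy rewrites over integer "statements"
double_if_even = lambda n: n * 2 if n % 2 == 0 else None
negate_if_positive = lambda n: -n if n > 0 else None

sig = rewrite_signature(4, [double_if_even, negate_if_positive])
```

The paper's contribution is learning to predict such information from a fixed-dimensional embedding of the statement rather than computing it symbolically.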
Preprint
Automated theorem proving in large theories can be learned via reinforcement learning over an indefinitely growing action space. In order to select actions, one performs nearest neighbor lookups in the knowledge base to find premises to be applied. Here we address the exploration for reinforcement learning in this space. Approaches (like epsilon-gr...
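The epsilon-greedy family of approaches the abstract alludes to can be sketched in a few lines. This is a generic illustration, assuming `scores` are premise-relevance scores from a nearest-neighbor lookup; it is not the paper's proposed method, which addresses the shortcomings of this baseline:

```python
import random

def epsilon_greedy(scores, eps, rng=random):
    """With probability eps explore (pick a uniformly random action);
    otherwise exploit the highest-scoring action."""
    if rng.random() < eps:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=lambda i: scores[i])

choice = epsilon_greedy([0.2, 0.9, 0.4], eps=0.0)  # eps=0: pure exploitation
```

Uniform exploration like this scales poorly when the action space grows indefinitely, which is the motivation for smarter exploration strategies.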
Preprint
This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning...
Preprint
We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light th...
Preprint
Full-text available
Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain que...
Article
Large computer-understandable proofs consist of millions of intermediate logical steps. The vast majority of such steps originate from manually selected and manually guided heuristics applied to intermediate goals. So far, machine learning has generally not been used to filter or generate these steps. In this paper, we introduce a new dataset based...
Article
Deep learning techniques lie at the heart of several significant AI advances in recent years including object recognition and detection, image captioning, machine translation, speech recognition and synthesis, and playing the game of Go. Automated first-order theorem provers can aid in the formalization and verification of mathematical theorems and...
Conference Paper
Full-text available
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in eac...
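The default-box tiling that SSD discretizes the output space into can be sketched as follows. This is a simplified, single-feature-map illustration (real SSD tiles several feature maps with a scale schedule and an extra box per cell); boxes are (cx, cy, w, h) in normalized image coordinates:

```python
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Tile one square feature map with default boxes, one per aspect
    ratio per cell, all in [0, 1] image coordinates."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # wider aspect ratios stretch width and shrink height
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes

boxes = default_boxes(fmap_size=2, scale=0.5, aspect_ratios=[1.0, 2.0])
```

At training time each ground-truth object is matched to the best-overlapping default boxes, and the network regresses offsets from those boxes rather than predicting coordinates from scratch.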
Conference Paper
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most t...
Article
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing st...
Article
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional ar...
Article
Full-text available
Search with local intent is becoming increasingly useful due to the popularity of the mobile device. The creation and maintenance of accurate listings of local businesses worldwide is time consuming and expensive. In this paper, we propose an approach to automatically discover businesses that are visible on street level imagery. Precise business st...
Article
Full-text available
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of bounding box priors over different aspect ratios and scales per feature map location. At prediction time, the network generates confidences that each prior corresponds to objec...
Article
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most...
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearit...
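The normalization this abstract motivates — batch normalization — reduces to a short computation per activation. A minimal sketch of the training-time forward pass for one feature across a batch (`gamma` and `beta` are the learned scale and shift; real implementations also track running statistics for inference):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation across the batch to zero mean and unit
    variance, then apply the learned scale (gamma) and shift (beta)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

out = batch_norm([1.0, 2.0, 3.0])
```

Because each layer now sees inputs with a stable distribution regardless of how earlier layers' parameters move, much higher learning rates become usable.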
Conference Paper
Several machine learning models, including neural networks, consistently misclassify adversarial examples—inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining th...
Article
Full-text available
Current state-of-the-art deep learning systems for visual object recognition and detection use purely supervised training with regularization such as dropout to avoid overfitting. The performance depends critically on the amount of labeled examples, and in current practice the labels are assumed to be unambiguous and accurate. However, this assumpt...
Article
Full-text available
Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this...
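The published version of this work introduces the fast gradient sign method for constructing such perturbations: move each input coordinate by a small eps in the direction that increases the loss. A minimal sketch on a plain Python list (a real attack obtains the gradient by backpropagation through the model; the toy gradient here is made up):

```python
def fgsm_perturb(x, grad, eps):
    """Fast-gradient-sign-style perturbation: shift each coordinate by eps
    in the direction of the loss gradient's sign."""
    sign = lambda g: (g > 0) - (g < 0)  # -1, 0, or +1
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# toy 3-d input and a made-up loss gradient
adv = fgsm_perturb([0.5, -0.2, 0.1], grad=[0.3, -0.7, 0.0], eps=0.1)
```

The sign function bounds the perturbation's max-norm by eps, which is what keeps the adversarial input visually indistinguishable from the original.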
Article
Full-text available
Most high quality object detection approaches use the same scheme: salience-based object proposal methods followed by post-classification using deep convolutional features. In this work, we demonstrate that fully learnt, data-driven proposal generation methods can effectively match the accuracy of their hand engineered counterparts, while allowing...
Article
Full-text available
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources insi...
Article
Full-text available
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. Fi...
Article
We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fashion and has...
Article
Full-text available
Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object cat...
Article
Full-text available
Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object det...
Conference Paper
Full-text available
Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object det...

Citations

... Unlike the equally popular ConvNets, most Transformer-based models are flat, having a constant number of activations and channels throughout. Hierarchical variations have been recently proposed for NLP (Dai et al., 2020; Nawrot et al., 2021) and for vision, building upon ViT (Fan et al., 2021; Liu et al., 2021; Ranftl et al., 2021). Here we propose a hierarchical version of Perceiver that reuses its technique of cross-attending to learned latents, but now also for adapting the internal resolution and number of channels between layers. ...
... Datasets for mathematical reasoning. Other works have studied datasets derived from automated theorem provers [19,146,94], interactive theorem provers [115,121,13,112,224,168,221,140,137,205,172,159] (see [173] for a survey), symbolic mathematics [134], and mathematical problems in natural language [178,181]. Close work to this thesis are the applications of Transformers to directly solve differential equations [134] and directly predict missing assumptions and types of formal mathematical statements [172]. ...
... Autoformalization refers to the task of automatically translating from natural language mathematics to a formal language [48,44]. The implication of a successful autoformalization tool is huge in both practical and philosophical terms. ...
... The recall@5 result is calculated for SQuAD in order to compare with the document retrieval component of the multi-layer recurrent neural network [13] (DrQA) and the convolutional residual retrieval network (ConvRR) [14] [44]. The comparisons are shown in Table IV. ...
... Graph neural networks (GNNs) [24,5] and transformers [27] are two commonly used networks to extract features from mathematical theorems. The hierarchical structure of mathematical expressions can be viewed as trees, so GNNs are employed in many works [28,18,30,2]. Besides, mathematical expressions can also be regarded as sequences of tokens. Thanks to the strong expressive ability of transformers in natural language processing, many works use transformers to extract features from formulas [21,11,23]. ...
... In most cases, robustness issues arise because of the distributional shift problem, i.e. when the training distribution the model was trained on differs from the deployment distribution. The most notorious examples of such phenomena are adversarial attacks, where carefully crafted perturbations can deceive a ML model (Goodfellow et al. 2014). Formally, given a sound input x, an adversarial example is defined as a crafted input x′ such that ‖x − x′‖ ≤ ε and f(x) ≠ f(x′), where f(x) is the ML model prediction for a given input and ε is small enough to ensure that x′ is indistinguishable from x from a human perspective. ...
... With the rapid development of neural networks, many researchers have applied them to computer vision and achieved many successes, including image classification [6], object detection [2], semantic analysis [19], etc. Owing to its ability to extract high-level and multi-scale features [4,16], the Convolutional Neural Network (CNN) [7] eliminated the dependence on manual feature extraction and has been widely used in various computer vision tasks in recent years. CNN-based salient object detection methods outperform traditional methods and have become the mainstream research direction. ...
... [Alemi et al. 2016], [Bridge et al. 2014] and [Kaliszyk et al. 2017] present approaches to applying machine learning in the field of automated theorem proving. [Alemi et al. 2016] introduced neural sequence models for premise selection in automated theorem proving, while [Bridge et al. 2014] applied two state-of-the-art machine learning techniques to the problem of selecting a good heuristic in a first-order theorem prover. ...
... Early applications of machine learning in theorem proving include the works by Schulz [42] and Urban [45], and later, directly guiding interactive proof assistants using machine learning techniques [14]. The revolution of deep learning then kicked off a new wave of interest in the topic starting with DeepMath [1,33]. ...
... State-of-the-art object detection methods can be broadly classified into two categories, namely, one-stage and two-stage methods. The representative one-stage detectors include YOLO (an acronym for You Only Look Once) [4,[8][9][10][11], the single-shot detector (SSD) [12], RetinaNet [13], and CenterNet [5,6]. These methods can perform nearly real-time detection, need no proposal-generation procedure, and conduct object detection directly on images. The YOLO families achieve state-of-the-art performance by integrating bounding boxes and subsequent feature resampling in a single stage. ...