# Christian Szegedy's research while affiliated with Mountain View College and other places

**What is this page?**

This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.


## Publications (37)

Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs. A successful autoformalization system could advance the fields of formal verification, program synthesis, and artificial intelligence. While the long-term goal of autoformalization seemed elusive for a long time, we s...

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the inter...

Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, w...

In recent years, deep learning has found successful applications in mathematical reasoning. Today, we can predict fine-grained proof steps, relevant premises, and even useful conjectures using neural networks. This extended abstract summarizes recent developments of machine learning in mathematical reasoning and the vision of the N2Formal grou...

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks. Here, we replace architecture engineering by encoding inductive bias in the form of datasets. Inspired by Peirce's view that deduction, induction, and abduc...

An autoformalization system is an AI that learns to read natural language content and to turn it into an abstract, machine verifiable formalization, ideally by bootstrapping from unlabeled training data with minimum human interaction. This is a difficult task in general, one that would require strong automated reasoning and automated natural langua...

Text embedding representing natural language documents in a semantic vector space can be used for document retrieval using nearest neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain que...
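The retrieval scheme this abstract describes (embed documents into a semantic vector space, then answer queries by nearest-neighbor lookup) can be sketched as follows. The `embed` function below is a hypothetical hashing stand-in for a trained neural text encoder, not the model from the paper:

```python
import zlib

import numpy as np


def embed(text, dim=64):
    """Toy stand-in for a neural encoder: map each token to a fixed random
    vector (seeded by a stable hash) and sum, then L2-normalize."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(token.encode()))
        vec += rng.standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def retrieve(query, documents, k=1):
    """Nearest-neighbor document lookup by cosine similarity (all embeddings
    are unit-norm, so a dot product suffices)."""
    doc_matrix = np.stack([embed(d) for d in documents])
    sims = doc_matrix @ embed(query)
    top = np.argsort(-sims)[:k]
    return [documents[i] for i in top]


docs = [
    "the cat sat on the mat",
    "neural networks learn representations",
    "formal proofs are machine verifiable",
]
print(retrieve("machine verifiable formal proof", docs, k=1))
```

With a real encoder, retrieval works the same way; only `embed` changes, which is why nearest-neighbor lookup is a convenient interface for studying retrieval-specialized models.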

We examine whether language modeling applied to mathematical formulas enables logical reasoning. We suggest several logical reasoning tasks that can be used to evaluate language models trained on formal mathematical statements, such as type inference, suggesting missing assumptions and completing equalities. To train language models for formal math...

This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher-order theorem provers allow for the formalization of most mathematical theories and have been shown to pose a significant challenge for deep learning...

We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. The set of rewrites (i.e. transformations) that can be successfully performed on a statement represents essential semantic features of the statement. We can compress this information by e...

Automated theorem proving in large theories can be learned via reinforcement learning over an indefinitely growing action space. In order to select actions, one performs nearest neighbor lookups in the knowledge base to find premises to be applied. Here we address the exploration for reinforcement learning in this space. Approaches (like epsilon-gr...
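The truncated sentence appears to refer to epsilon-greedy-style exploration; a minimal sketch of such a baseline over nearest-neighbor premise lookup (all names hypothetical, embeddings stand in for the knowledge base) might look like:

```python
import numpy as np


def epsilon_greedy_premise(goal_emb, premise_embs, eps, rng):
    """With probability eps, explore a uniformly random premise from the
    (growing) knowledge base; otherwise exploit the nearest neighbor of the
    current goal embedding (here: highest dot-product similarity)."""
    if rng.random() < eps:
        return int(rng.integers(len(premise_embs)))
    return int(np.argmax(premise_embs @ goal_emb))


rng = np.random.default_rng(0)
premises = np.eye(4)                    # four orthogonal premise embeddings
goal = np.array([0.1, 0.9, 0.2, 0.0])   # closest to premise index 1
choice = epsilon_greedy_premise(goal, premises, eps=0.0, rng=rng)
print(choice)  # 1: with eps=0 the nearest premise is always chosen
```

The abstract's point is that this action space grows indefinitely, which is exactly what makes naive exploration schemes of this kind interesting to study.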

We present an environment, benchmark, and deep learning driven automated theorem prover for higher-order logic. Higher-order interactive theorem provers enable the formalization of arbitrary mathematical theories and thereby present an interesting, open-ended challenge for deep learning. We provide an open-source framework based on the HOL Light th...

Large computer-understandable proofs consist of millions of intermediate logical steps. The vast majority of such steps originate from manually selected and manually guided heuristics applied to intermediate goals. So far, machine learning has generally not been used to filter or generate these steps. In this paper, we introduce a new dataset based...

Deep learning techniques lie at the heart of several significant AI advances in recent years including object recognition and detection, image captioning, machine translation, speech recognition and synthesis, and playing the game of Go. Automated first-order theorem provers can aid in the formalization and verification of mathematical theorems and...

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in eac...
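The default-box grid that SSD tiles over each feature map can be sketched as below; this shows only the box layout (one box per aspect ratio at every cell center), not the detection network itself:

```python
import numpy as np


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style default boxes as (cx, cy, w, h), normalized to
    [0, 1], for one square feature map of size fmap_size x fmap_size."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size   # box centered on the cell
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * np.sqrt(ar)  # wider boxes for ar > 1,
                h = scale / np.sqrt(ar)  # taller boxes for ar < 1
                boxes.append((cx, cy, w, h))
    return np.array(boxes)


# A 4x4 feature map with 3 aspect ratios yields 4 * 4 * 3 = 48 default boxes.
boxes = default_boxes(fmap_size=4, scale=0.3)
print(boxes.shape)  # (48, 4)
```

At prediction time the network scores each such box per object category, which is how a single forward pass covers multiple scales and aspect ratios.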

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most t...

We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing st...

Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional ar...

Search with local intent is becoming increasingly useful due to the popularity of the mobile device. The creation and maintenance of accurate listings of local businesses worldwide is time consuming and expensive. In this paper, we propose an approach to automatically discover businesses that are visible on street level imagery. Precise business st...

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of bounding box priors over different aspect ratios and scales per feature map location. At prediction time, the network generates confidences that each prior corresponds to objec...

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most...

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearit...
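The normalization step this abstract motivates (Batch Normalization) reduces to a few lines per mini-batch; a minimal sketch for fully connected activations, with learned scale and shift parameters:

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch to zero mean and unit
    variance, then apply a learned scale (gamma) and shift (beta).
    x: (batch, features); gamma, beta: (features,)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 8))  # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature
```

Because every layer now sees inputs with a stable distribution, higher learning rates and less careful initialization become viable, which is the training speedup the abstract refers to.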

Several machine learning models, including neural networks, consistently misclassify adversarial examples—inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining th...
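The one-step attack associated with this line of work, the fast gradient sign method, perturbs each input coordinate by ε in the sign of the loss gradient. A sketch on a plain logistic-regression classifier (a toy stand-in, not the paper's deep networks; all values below are illustrative):

```python
import numpy as np


def fgsm(x, w, b, y, eps):
    """Fast gradient sign attack on a logistic-regression model.
    For logistic loss, the gradient w.r.t. the input x is
    (sigmoid(w @ x + b) - y) * w; we step eps in its sign."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


rng = np.random.default_rng(0)
w = rng.choice([-1.0, 1.0], size=100)  # hypothetical 100-dim linear classifier
b = 0.0
x = 0.05 * w                           # clean input: logit = w @ x = 5.0
x_adv = fgsm(x, w, b, y=1.0, eps=0.1)  # each coordinate moves by only 0.1
logit_before, logit_after = w @ x + b, w @ x_adv + b
print(logit_before, logit_after)
```

Note the effect: a per-coordinate change of just 0.1 shifts the logit by eps times the L1 norm of w (here 0.1 × 100 = 10), flipping the prediction — the "linear explanation" of adversarial examples that this paper proposed.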

Current state-of-the-art deep learning systems for visual object recognition and detection use purely supervised training with regularization such as dropout to avoid overfitting. The performance depends critically on the amount of labeled examples, and in current practice the labels are assumed to be unambiguous and accurate. However, this assumpt...

Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this...

Most high quality object detection approaches use the same scheme: salience-based object proposal methods followed by post-classification using deep convolutional features. In this work, we demonstrate that fully learnt, data-driven proposal generation methods can effectively match the accuracy of their hand engineered counterparts, while allowing...

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources insi...
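The resource-utilization idea behind the Inception module (parallel branches with cheap 1×1 "reduce" layers, concatenated over channels) can be sketched as follows. For brevity the 3×3 and 5×5 spatial convolutions are replaced by pointwise ones, so this shows only the branch structure and channel arithmetic, not the real architecture:

```python
import numpy as np


def conv1x1(x, w):
    """Pointwise (1x1) convolution: a per-pixel linear map over channels.
    x: (H, W, C_in), w: (C_in, C_out)."""
    return x @ w


def maxpool3x3_same(x):
    """3x3 max pooling with stride 1 and 'same' padding."""
    h, w_, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w_):
            out[i, j] = padded[i:i + 3, j:j + 3].max(axis=(0, 1))
    return out


def inception_module(x, w1, w3r, w3, w5r, w5, wp):
    """Four parallel branches see the same input; outputs are concatenated
    along the channel axis. The 1x1 'reduce' weights (w3r, w5r) shrink the
    channel count before the expensive branches -- the key efficiency trick."""
    b1 = conv1x1(x, w1)
    b3 = conv1x1(conv1x1(x, w3r), w3)    # stand-in for the 3x3 branch
    b5 = conv1x1(conv1x1(x, w5r), w5)    # stand-in for the 5x5 branch
    bp = conv1x1(maxpool3x3_same(x), wp)
    return np.concatenate([b1, b3, b5, bp], axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
out = inception_module(
    x,
    w1=rng.standard_normal((16, 8)),
    w3r=rng.standard_normal((16, 4)), w3=rng.standard_normal((4, 8)),
    w5r=rng.standard_normal((16, 4)), w5=rng.standard_normal((4, 8)),
    wp=rng.standard_normal((16, 8)),
)
print(out.shape)  # (8, 8, 32): 8 + 8 + 8 + 8 output channels
```

Stacking such modules lets depth and width grow while the reduce layers keep the compute budget fixed, which is the "improved utilization" the abstract highlights.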

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. Fi...

We propose a method for human pose estimation based on Deep Neural Networks (DNNs). The pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors which results in high precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fashion and has...

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object cat...

Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object det...


## Citations

... Unlike the equally popular ConvNets, most Transformer-based models are flat, having a constant number of activations and channels throughout. Hierarchical variations have been recently proposed for NLP (Dai et al., 2020;Nawrot et al., 2021) and for vision, building upon ViT (Fan et al., 2021;Liu et al., 2021;Ranftl et al., 2021). Here we propose a hierarchical version of Perceiver that reuses its technique of cross-attending to learned latents, but now also for adapting the internal resolution and number of channels between layers. ...

Reference: Hierarchical Perceiver
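The cross-attention-to-learned-latents technique mentioned in this excerpt can be sketched as follows (single head, with the usual query/key/value projection matrices omitted for brevity):

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attend(latents, inputs):
    """Perceiver-style cross-attention: a small set of learned latent vectors
    queries a large input array. Attention cost is num_latents * num_inputs
    rather than num_inputs**2, which is what lets such models handle long
    inputs. latents: (L, D), inputs: (N, D) -> (L, D)."""
    scores = latents @ inputs.T / np.sqrt(inputs.shape[1])
    return softmax(scores) @ inputs


rng = np.random.default_rng(0)
latents = rng.standard_normal((8, 32))    # 8 learned latents
inputs = rng.standard_normal((1024, 32))  # 1024 input tokens
out = cross_attend(latents, inputs)
print(out.shape)  # (8, 32): internal resolution is set by the latents
```

The hierarchical variant discussed in the excerpt reuses this same operation between layers to change the number of latents and channels as depth increases.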

... Datasets for mathematical reasoning. Other works have studied datasets derived from automated theorem provers [19,146,94], interactive theorem provers [115,121,13,112,224,168,221,140,137,205,172,159] (see [173] for a survey), symbolic mathematics [134], and mathematical problems in natural language [178,181]. Close work to this thesis are the applications of Transformers to directly solve differential equations [134] and directly predict missing assumptions and types of formal mathematical statements [172]. ...

... Autoformalization refers to the task of automatically translating from natural language mathematics to a formal language [48,44]. The implications of a successful autoformalization tool are huge in both practical and philosophical terms. ...

Reference: Autoformalization with Large Language Models

... The recall@5 result is calculated for SQuAD in order to compare with the document retrieval component of the multi-layer recurrent neural network (DrQA) [13] and the convolutional residual retrieval network (ConvRR) [14], [44]. The comparisons are shown in Table IV. ...

... Graph neural networks (GNNs) [24,5] and transformers [27] are two commonly used networks to extract features from mathematical theorems. The hierarchical structure of mathematical expressions can be viewed as trees, so GNNs are employed in many works [28,18,30,2]. Besides, mathematical expressions can also be regarded as sequences of tokens. Thanks to the strong expressive ability of transformers in natural language processing, many works use transformers to extract features from formulas [21,11,23]. ...

Reference: Learning to Prove Trigonometric Identities

... In most cases, robustness issues arise because of the distributional shift problem, i.e. when the training distribution the model was trained on differs from the deployment distribution. The most notorious examples of such phenomena are adversarial attacks, where carefully crafted perturbations can deceive a ML model (Goodfellow et al. 2014). Formally, given a sound input x, an adversarial example is defined as a crafted input x′ such that ‖x − x′‖ ≤ ε and f(x) ≠ f(x′), where f(x) is the ML model prediction for a given input and ε is small enough to ensure that x′ is indistinguishable from x from a human perspective. ...

... With the rapid development of neural networks, many researchers have applied them to computer vision and achieved many successes, including image classification [6], object detection [2], semantic analysis [19], etc. Due to its ability to extract advanced and multi-scale features [4,16], the Convolutional Neural Network (CNN) [7] eliminated the dependence on manual feature extraction and has been widely used in various computer vision tasks in recent years. CNN-based salient object detection methods have better performance than traditional methods and have become the mainstream research direction. ...

... [Alemi et al. 2016], [Bridge et al. 2014] and [Kaliszyk et al. 2017] show approaches to applying machine learning methods in the field of automated theorem proving. [Alemi et al. 2016] introduced neural sequence models for premise selection in automated theorem proving, while [Bridge et al. 2014] applied two state-of-the-art machine learning techniques to the problem of selecting a good heuristic in a first-order theorem prover. ...

... Early applications of machine learning in theorem proving include the works by Schulz [42] and Urban [45], and later, directly guiding interactive proof assistants using machine learning techniques [14]. The revolution of deep learning then kicked off a new wave of interest in the topic starting with DeepMath [1,33]. ...

Reference: Autoformalization with Large Language Models

... State-of-the-art object detection methods can be broadly classified into two categories, namely, one-stage and two-stage methods. The representative one-stage detectors include YOLO (an acronym for You Only Look Once) [4, 8-11], single-shot detector (SSD) [12], RetinaNet [13], and CenterNet [5,6]; these methods can perform nearly real-time detection, do not need a proposal generation procedure, and directly conduct object detection in images. The YOLO families achieve state-of-the-art performance by integrating bounding boxes and subsequent feature resampling in a single stage. ...