Jie S. Li’s scientific contributions


Publications (3)


LLM-Generated Passphrases That Are Secure and Easy to Remember
  • Conference Paper

January 2025

Jie S. Li · [...] · Tom Goldstein

Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion
  • Preprint

July 2023 · 30 Reads

Figure 2: Top row: "angora" in "angora city", with the gold image and the SD Sampling predicted candidate image. Bottom row: examples from SD Sampling showing various views of a city; a view of a side street more closely matches an incorrect candidate image of a natural wall.
Figure 3: Top row: "router" in "internet router", with the gold image and the SD Sampling predicted candidate image. Bottom row: examples from SD Sampling showing various views of a router; a broad view of a router does not match the close-up view in the gold image.
Tables: Augment-CLIP system performance; SD Sampling vs. Base-CLIP confusion matrix on the test set (counts of test instances); SD Sampling system performance.

This paper describes our zero-shot approaches for the Visual Word Sense Disambiguation (VWSD) Task in English. Our preliminary study shows that the simple approach of matching candidate images with the phrase using CLIP suffers from the many-to-many nature of image-text pairs. We find that the CLIP text encoder may have limited abilities in capturing the compositionality in natural language. Moreover, the descriptive focus of the phrase varies from instance to instance. We address these issues in our two systems, Augment-CLIP and Stable Diffusion Sampling (SD Sampling). Augment-CLIP augments the text prompt by generating sentences that contain the context phrase with the help of large language models (LLMs). We further explore CLIP models in other languages, as an ambiguous word may be translated into an unambiguous one in the other language. SD Sampling uses text-to-image Stable Diffusion to generate multiple images from the given phrase, increasing the likelihood that a subset of the generated images matches the one paired with the text.
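To make the Augment-CLIP idea concrete, here is a minimal sketch of prompt-augmented CLIP scoring, assuming the Hugging Face transformers CLIP API; the checkpoint name, the rank_candidates helper, and the example prompts are illustrative assumptions, not the authors' released code.

```python
# Minimal Augment-CLIP-style sketch (illustrative; not the authors' code).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")  # assumed checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_candidates(phrase, llm_sentences, image_paths):
    """Pick the candidate image that best matches the phrase plus its LLM augmentations."""
    prompts = [phrase] + llm_sentences  # e.g. LLM-written sentences containing "angora city"
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=prompts, images=images, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (num_images, num_prompts)
    scores = logits.mean(dim=1)  # average image-text similarity across all prompts
    return scores.argmax().item()  # index of the best-matching candidate image
```

SD Sampling takes the complementary route: generate several images from the phrase with text-to-image Stable Diffusion, then choose the candidate closest to the generated set in CLIP image-embedding space.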


Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

August 2022 · 51 Reads · 10 Citations

Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models
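The generalized update rule the abstract alludes to (Algorithm 2 in the paper) is compact enough to sketch. Below is a minimal sketch assuming a deterministic blur degradation D(x, t) and a placeholder restoration network R(x_t, t); the blur schedule and restore_model are illustrative assumptions, not the released implementation.

```python
# Sketch of cold diffusion's generalized sampling loop (illustrative assumptions throughout).
import torch
import torchvision.transforms.functional as TF

def degrade(x, t):
    """Stand-in deterministic degradation D(x, t): Gaussian blur that grows with t."""
    if t == 0:
        return x  # D(x, 0) leaves the image untouched
    sigma = 0.5 * t  # assumed blur schedule, not from the paper
    kernel = 2 * int(3 * sigma) + 1  # odd kernel size covering ~3 sigma
    return TF.gaussian_blur(x, kernel_size=kernel, sigma=sigma)

@torch.no_grad()
def cold_sample(restore_model, x_T, T):
    """Paper's update: x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t-1), with x0_hat = R(x_t, t)."""
    x = x_T  # fully degraded input, e.g. a heavily blurred image
    for t in range(T, 0, -1):
        x0_hat = restore_model(x, t)  # network's estimate of the clean image
        x = x - degrade(x0_hat, t) + degrade(x0_hat, t - 1)
    return x
```

The key claim is that this two-line update, not the Gaussian noise, carries the generative behavior: the same loop works for blur, masking, and other fully deterministic transforms.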

Citations (1)


... While there are other approaches [4,11], adding Gaussian noise has some advantages, as it allows us to directly generate a noisy image at an arbitrary timestep. Given some noise schedule, with β_t being the noise added at step t, ...
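For background on the property the excerpt invokes (standard DDPM algebra, stated here for context rather than quoted from the citing paper): with α_t = 1 − β_t and ᾱ_t the running product of the α's, a noisy image at any timestep t can be sampled in one shot:

```latex
% Standard DDPM forward-process identity (background; notation assumed):
\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s,
\qquad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\quad \epsilon \sim \mathcal{N}(0, \mathbf{I})
```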

Reference:

The excerpt above is from "Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model", citing:
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise (Preprint, August 2022)