A Computational Approach to Deciphering Unknown Scripts

12/2002
Source: CiteSeer

ABSTRACT We propose and evaluate computational techniques for deciphering unknown scripts. We focus on the case in which an unfamiliar script encodes a known language. The decipherment of a brief document or inscription is driven by data about the spoken language. We consider which scripts are easy or hard to decipher, how much data is required, and whether the techniques are robust against language change over time.

  • Source
    ABSTRACT: We present a simple objective function that when optimized yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.
    Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, 27-31 July 2011, John McIntyre Conference Centre, Edinburgh, UK, A meeting of SIGDAT, a Special Interest Group of the ACL; 01/2011
  • Source
    ABSTRACT: Steganography, or information hiding, is to conceal the existence of messages so as to protect their confidentiality. We consider deciphering a stegoscript, a text with secret messages embedded within a covertext, and identifying the vocabularies used in the messages, with no knowledge of the vocabularies and grammar in which the script was written. Our research was motivated by the problem of identifying conserved non-coding functional elements (motifs) in regulatory regions of genome sequences, which we view as stegoscripts constructed by nature with a statistical model consisting of a dictionary and a grammar. We develop an iterative learning algorithm, WordSpy, to learn such a model from a stegoscript. The model then can be applied to identify the embedded secret messages, i.e., the functional motifs. Our algorithm can successfully recover the most possible text of the first ten chapters of a novel embedded in a stegoscript and identify the transcription factor binding motifs in the upstream regions of ~800 yeast genes.
    05/2005
  • Source
    ABSTRACT: This paper attacks a Japanese syllable-substitution cipher. We use a probabilistic, noisy-channel framework, exploiting various Japanese language models to drive the decipherment. We describe several innovations, including a new objective function for searching for the highest-scoring decipherment. We include empirical studies of the relevant phenomena, and we give improved decipherment accuracy rates.
    Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy, 22nd International Conference, ICCPOL 2009, Hong Kong, March 26-27, 2009. Proceedings; 01/2009
