Conference Paper

Using CAPTCHAs to Index Cultural Artifacts

Authors:

Abstract

Rock art, human-made markings on stone, is an important cultural artifact and the earliest expression of abstract thinking. While there are tens of millions of photographs of rock art in existence, there have been no large-scale attempts to organize, classify or cluster them. This omission is not due to a lack of interest, but reflects the extraordinary difficulty of extracting useful data from an incredibly heterogeneous and noisy dataset. As we shall show, rock art is likely to resist efforts of automatic extraction from images for a long time. In this work we show that we can use CAPTCHAs, puzzles designed to tell humans and computers apart, to segment and index rock art. Unlike other CAPTCHAs which operate on inherently discrete data and expect discrete responses, our method considers inherently real-valued data and expects real-valued responses. This creates a challenge which we have overcome by using a recently introduced distance measure. We demonstrate our system is capable of acting as a secure CAPTCHA, while producing data that allows for indexing the rock art.
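The idea of accepting a real-valued response can be illustrated with a small sketch. The abstract does not name the distance measure it relies on, so the symmetric Hausdorff distance below is only a stand-in, and the function names, point lists, and threshold are hypothetical values chosen for illustration. A traced outline is accepted when it lies within the threshold of a reference outline, rather than when it matches a string exactly.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (a stand-in measure)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def accept_response(response, reference, threshold):
    """Accept a real-valued tracing if it is close enough to the reference.

    `threshold` is a hypothetical tolerance; a deployed system would need to
    tune it so that humans pass and automated guesses fail.
    """
    return hausdorff(response, reference) <= threshold

# Hypothetical reference outline and two candidate tracings.
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
close_trace = [(0.1, 0.0), (1.0, 0.1), (0.9, 1.0)]
far_trace = [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0)]

print(accept_response(close_trace, reference, threshold=0.25))
print(accept_response(far_trace, reference, threshold=0.25))
```

Unlike a text CAPTCHA, where a response is either equal to the answer or not, a thresholded distance has to tolerate the natural variation between two humans tracing the same figure.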


... It is true that certain degree of subjectivity seems unavoidable in rock art classification. In order to minimize it, some attempts have been made with automatic classification (i.e., Zhu and Keogh 2010). But here, visual image retrieval problems arise. ...
... But here, visual image retrieval problems arise. As Zhu and Keogh (2010) stated, it is difficult to extract automatic data from rock art images since the resulting dataset is both heterogeneous and noisy. CAPTCHAs (puzzles designed to tell humans and computers apart) have been used to segment and index rock art. ...
... CAPTCHAs (puzzles designed to tell humans and computers apart) have been used to segment and index rock art. By experimenting with human drawing based on actual photographs of petroglyphs and considering the whole variation obtained, they have arrived at good results (Zhu and Keogh 2010). The main benefit of that type of study is that the criteria employed in the coding and classification process must be made explicit. ...
Article
Full-text available
In spite of its importance for rock art studies, rock art motifs coding and classification process is not always made explicit. In this discussion of the process of rock art classification, we consider an exploratory research employing two different criteria for the coding of a northwestern Patagonian rock art motifs database. One coding makes use of a ‘lumping’ criterion, and the other uses a ‘splitting’ criterion. Each of these criteria will be evaluated using cladistic analysis, recording how each coding criterion affects results. As a conclusion, and given our results, the use of more than one coding criterion is suggested when classifying rock art.
... Recent developments have shown that labeled datasets are not only important to enable the objective comparison of ML approaches but are further necessary to successfully train today's complex classifiers, such as deep neural networks (DNNs) [LBH15]. To reduce the effort of labeling large datasets (of potentially millions of instances) different strategies have been developed (e.g., captcha-based information collection [ZK10], web-based annotation systems [RTMF08], paid micro tasks on platforms like mechanical turk [BKG11], and game-based approaches [VAD04,ROR11]). ...
... In addition, knowledge generation by users is not part of the model to enable the objective comparison of ML approaches but are further necessary to successfully train today's complex classifiers, such as deep neural networks (DNNs) [46]. To reduce the effort of labeling large datasets (of potentially millions of instances) different strategies have been developed (e.g., captcha-based information collection [106], web-based annotation systems [62], paid micro-tasks on platforms like mechanical turk [9], and game-based approaches [61,89]). ...
Article
Full-text available
The assignment of labels to data instances is a fundamental prerequisite for many machine learning tasks. Moreover, labeling is a frequently applied process in visual interactive analysis approaches and visual analytics. However, the strategies for creating labels usually differ between these two fields. This raises the question whether synergies between the different approaches can be attained. In this paper, we study the process of labeling data instances with the user in the loop, from both the machine learning and visual interactive perspective. Based on a review of differences and commonalities, we propose the “visual interactive labeling” (VIAL) process that unifies both approaches. We describe the six major steps of the process and discuss their specific challenges. Additionally, we present two heterogeneous usage scenarios from the novel VIAL perspective, one on metric distance learning and one on object detection in videos. Finally, we discuss general challenges to VIAL and point out necessary work for the realization of future VIAL approaches.
Conference Paper
Initiatives such as the Google Print Library Project and the Million Book Project have already archived more than ten million books in digital format, and within the next decade the majority of world's books will be online. Although most of the data will naturally be text, there will also be tens of millions of pages of images, many in color. While there is an active research community pursuing data mining of text from historical manuscripts, there has been very little work that exploits the rich color information which is often present. In this work we introduce a simple color measure which both addresses and exploits typical features of historical manuscripts. To enable the efficient mining of massive archives, we propose a tight lower bound to the measure. Beyond the fast similarity search, we show how this lower bound allows us to build several higher-level data mining tools, including motif discovery and link analyses. We demonstrate our ideas in several data mining tasks on manuscripts dating back to the fifteenth century.
Article
Initiatives such as the Google Print Library Project and the Million Book Project have already archived more than twelve million books in digital format, and within the next decade, the majority of world’s books will be online. Although most of the data will naturally be text, there will also be tens of millions of pages of images, many in color. While there is an active research community pursuing data mining of text from historical manuscripts, there has been very little work that exploits the rich color information which is often present. In this work, we introduce a simple color measure which both addresses and exploits typical features of historical manuscripts. To enable the efficient mining of massive archives, we propose a tight lower bound to the measure. Beyond the fast similarity search, we show how this lower bound allows us to build several higher-level data mining tools, including motif discovery and link analyses. We demonstrate our ideas in several data mining tasks on manuscripts dating back to the fifteenth century.
Conference Paper
Full-text available
HIPs, or Human Interactive Proofs, are challenges meant to be easily solved by humans, while remaining too hard to be economically solved by computers. HIPs are increasingly used to protect services against automatic script attacks. To be effective, a HIP must be difficult enough to discourage script attacks by raising the computation and/or development cost of breaking the HIP to an unprofitable level. At the same time, the HIP must be easy enough to solve in order to not discourage humans from using the service. Early HIP designs have successfully met these criteria [1]. However, the growing sophistication of attackers and correspondingly increasing profit incentives have rendered most of the currently deployed HIPs vulnerable to attack [2,7,12]. Yet, most companies have been reluctant to increase the difficulty of their HIPs for fear of making them too complex or unappealing to humans. The purpose of this study is to find the visual distortions that are most effective at foiling computer attacks without hindering humans. The contribution of this research is that we discovered that 1) automatically generating HIPs by varying particular distortion parameters renders HIPs that are too easy for computer hackers to break, yet humans still have difficulty recognizing them, and 2) it is possible to build segmentation-based HIPs that are extremely difficult and expensive for computers to solve, while remaining relatively easy for humans.
Article
Full-text available
Sophisticated examples of European palaeolithic parietal art can be seen in the caves of Altamira, Lascaux and Niaux near the Pyrenees, which date to the Magdalenian period (12,000-17,000 years ago), but paintings of comparable skill and complexity were created much earlier, some possibly more than 30,000 years ago. We have derived new radiocarbon dates for the drawings that decorate the Chauvet cave in Vallon-Pont-d'Arc, Ardèche, France, which confirm that even 30,000 years ago Aurignacian artists, already known as accomplished carvers, could create masterpieces comparable to the best Magdalenian art. Prehistorians, who have traditionally interpreted the evolution of prehistoric art as a steady progression from simple to more complex representations, may have to reconsider existing theories of the origins of art.
Article
Full-text available
In the Eurasian Upper Paleolithic after about 35,000 years ago, abstract or depictional images provide evidence for cognitive abilities considered integral to modern human behavior. Here we report on two abstract representations engraved on pieces of red ochre recovered from the Middle Stone Age layers at Blombos Cave in South Africa. A mean date of 77,000 years was obtained for the layers containing the engraved ochres by thermoluminescence dating of burnt lithics, and the stratigraphic integrity was confirmed by an optically stimulated luminescence age of 70,000 years on an overlying dune. These engravings support the emergence of modern human behavior in Africa at least 35,000 years before the start of the Upper Paleolithic.
Article
A representation of a horseman incised on a fossilized ostrich eggshell fragment found among eolian deposits in the Gobi Desert, Mongolia, is analyzed. The representation is paralleled by petroglyphs of the Turkic period in Mongolia, the Baikal area, and in the Altai, and evidently dates back to the same period (no later than the 6th cent. AD). It was probably included among ritual items related to shamanism, and, given its small size and fragility, apparently an apotropaic.
Article
The Hough transform is a method for detecting curves by exploiting the duality between points on a curve and parameters of that curve. The initial work showed how to detect both analytic curves (1,2) and non-analytic curves (3), but these methods were restricted to binary edge images. This work was generalized to the detection of some analytic curves in grey level images, specifically lines (4), circles (5) and parabolas (6). The line detection case is the best known of these and has been ingeniously exploited in several applications (7,8,9). We show how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space. Such a mapping can be exploited to detect instances of that particular shape in an image. Furthermore, variations in the shape such as rotations, scale changes or figure ground reversals correspond to straightforward transformations of this mapping. However, the most remarkable property is that such mappings can be composed to build mappings for complex shapes from the mappings of simpler component shapes. This makes the generalized Hough transform a kind of universal transform which can be used to find arbitrarily complex shapes.
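The mapping from shape boundary to transform space can be sketched in a few lines. This is a deliberately simplified, translation-only version: the usual formulation indexes the offset table by edge gradient direction, which is omitted here, so every edge point votes with every stored offset. The function names and the toy point sets are hypothetical.

```python
from collections import Counter

def build_r_table(template_points, reference):
    """Store the offset from each template boundary point to the reference point."""
    rx, ry = reference
    return [(rx - x, ry - y) for x, y in template_points]

def vote(edge_points, r_table):
    """Let each edge point vote for every reference location it could imply."""
    accumulator = Counter()
    for x, y in edge_points:
        for dx, dy in r_table:
            accumulator[(x + dx, y + dy)] += 1
    return accumulator

def locate(edge_points, template_points, reference=(0, 0)):
    """Return the most-voted reference location and its vote count."""
    accumulator = vote(edge_points, build_r_table(template_points, reference))
    return accumulator.most_common(1)[0]

# A small L-shaped template, and an "image" holding the same shape shifted
# by (5, 3) plus one unrelated noise point.
template = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
image = [(x + 5, y + 3) for x, y in template] + [(9, 9)]

print(locate(image, template))
```

The accumulator peak marks where the template's reference point would sit in the image. Handling rotation or scale would mean transforming the stored offsets before voting, which this sketch does not attempt.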
Conference Paper
Rock art is an archaeological term for human-made markings on stone. It is believed that there are millions of petroglyphs in North America alone, and the study of this valued cultural resource has implications even beyond anthropology and history. Surprisingly, although image processing, information retrieval and data mining have had large impacts on many human endeavors, they have had essentially zero impact on the study of rock art. In this work we identify the reasons for this, and introduce a novel distance measure and algorithms which allow efficient and effective data mining of large collections of rock art.
Conference Paper
We introduce captcha, an automated test that humans can pass, but current computer programs can't pass: any program that has high success over a captcha can be used to solve an unsolved Artificial Intelligence (AI) problem. We provide several novel constructions of captchas. Since captchas have many applications in practical security, our approach introduces a new class of hard problems that can be exploited for security purposes. Much like research in cryptography has had a positive impact on algorithms for factoring and discrete log, we hope that the use of hard AI problems for security purposes allows us to advance the field of Artificial Intelligence. We introduce two families of AI problems that can be used to construct captchas and we show that solutions to such problems can be used for steganographic communication. captchas based on these AI problem families, then, imply a win-win situation: either the problems remain unsolved and there is a way to differentiate humans from computers, or the problems are solved and there is a way to communicate covertly on some channels.
Article
The digitization of antiquities is facilitating a renaissance for scholars who have unprecedented access to rich representations of objects. Cultural Heritage digitization is a central challenge, and its subtleties are intertwined with object properties and the constraints of physical access and handling. In this paper, we present the design and analysis of a system built for the digitization of Puerto Rican petroglyphic iconography. The petroglyphs exhibit unique properties (shape, size, surface) that determine system design choices. The 3D models obtained with the system support new scholarly and educational activities, including interactive surface lighting, feature highlighting and annotation through mark-up, and immersive viewing using large-scale displays.
Article
CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are widespread security measures on the World Wide Web that prevent automated programs from abusing online services. They do so by asking humans to perform a task that computers cannot yet perform, such as deciphering distorted characters. Our research explored whether such human effort can be channeled into a useful purpose: helping to digitize old printed material by asking users to decipher scanned words from books that computerized optical character recognition failed to recognize. We showed that this method can transcribe text with a word accuracy exceeding 99%, matching the guarantee of professional human transcribers. Our apparatus is deployed in more than 40,000 Web sites and has transcribed over 440 million words.
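The abstract does not spell out how answers for unrecognized words are trusted. One plausible arrangement, assumed here purely for illustration, pairs each unknown word with a control word whose answer is already known: a guess for the unknown word counts as a vote only if the control word was typed correctly. The class name, method names, and the `votes_needed` value are hypothetical.

```python
from collections import Counter

class UnknownWord:
    """Collects human guesses for one word that OCR could not read."""

    def __init__(self, votes_needed=3):
        self.votes = Counter()
        self.votes_needed = votes_needed  # hypothetical agreement threshold

    def submit(self, control_answer, control_truth, unknown_answer):
        """Return True if the user passes; record their guess only on a pass."""
        passed = control_answer.strip().lower() == control_truth.strip().lower()
        if passed:
            self.votes[unknown_answer.strip().lower()] += 1
        return passed

    def transcription(self):
        """Return the agreed word once enough matching votes exist, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None

word = UnknownWord(votes_needed=2)
word.submit("morning", "morning", "upon")   # passes, one vote for "upon"
word.submit("wrong", "morning", "xxxx")     # fails, guess is ignored
word.submit("Morning", "morning", "Upon")   # passes, second vote for "upon"
print(word.transcription())
```

Requiring agreement between several independent users is what lets the test double as a transcription service: a single careless or malicious answer cannot settle a word on its own.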
Article
In Hough [1], Duda and Hart [2], and Griffith [3], procedures were proposed for detecting lines in pictures, and in [2] Duda and Hart extended their method for more general algebraic curve fitting. This correspondence shows how this method can be used to detect any given curve in a specific orientation. The procedure presented here can be easily implemented and can be efficiently implemented in a parallel machine.
Article
Through online games, people can collectively solve large-scale computational problems. Such games constitute a general mechanism for using brain power to solve open problems. In fact, designing such a game is much like designing an algorithm: it must be proven correct, its efficiency can be analyzed, a more efficient version can supersede a less efficient one, and so on. "Games with a purpose" have a vast range of applications in areas as diverse as security, computer vision, Internet accessibility, adult content filtering, and Internet search. Any game designed to address these and other problems must ensure that game play results in a correct solution and, at the same time, is enjoyable. People will play such games to be entertained, not to solve a problem, no matter how laudable the objective.