Article

Reply to Ostarek et al.: Language, but not co-occurrence statistics, is useful for learning animal appearance

... particular object properties that are learnable via word co-occurrences (for example, refs. [104][105][106][107][108]). It is possible that some aspects of semantic knowledge may be captured to different degrees in the embedding space, depending on a variety of factors. ...
Article
Full-text available
The words of a language reflect the structure of the human mind, allowing us to transmit thoughts between individuals. However, language can represent only a subset of our rich and detailed cognitive architecture. Here, we ask what kinds of common knowledge (semantic memory) are captured by word meanings (lexical semantics). We examine a prominent computational model that represents words as vectors in a multidimensional space, such that proximity between word-vectors approximates semantic relatedness. Because related words appear in similar contexts, such spaces - called "word embeddings" - can be learned from patterns of lexical co-occurrences in natural language. Despite their popularity, a fundamental concern about word embeddings is that they appear to be semantically "rigid": inter-word proximity captures only overall similarity, yet human judgments about object similarities are highly context-dependent and involve multiple, distinct semantic features. For example, dolphins and alligators appear similar in size, but differ in intelligence and aggressiveness. Could such context-dependent relationships be recovered from word embeddings? To address this issue, we introduce a powerful, domain-general solution: "semantic projection" of word-vectors onto lines that represent various object features, like size (the line extending from the word "small" to "big"), intelligence (from "dumb" to "smart"), or danger (from "safe" to "dangerous"). This method, which is intuitively analogous to placing objects "on a mental scale" between two extremes, recovers human judgments across a range of object categories and properties. We thus show that word embeddings inherit a wealth of common knowledge from word co-occurrence statistics and can be flexibly manipulated to express context-dependent meanings.
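For concreteness, a minimal sketch of the projection step described above, in Python with NumPy. The toy embeddings dictionary, its three-dimensional vectors, and the function name semantic_projection are illustrative stand-ins, not the authors' implementation; a real experiment would load pre-trained vectors (e.g., GloVe or word2vec) and, as in the published method, define each feature line from several antonym pairs rather than the single pair used here for brevity.

```python
import numpy as np

# Hypothetical toy embeddings; real word vectors have ~300 dimensions.
embeddings = {
    "small":     np.array([0.1, 0.9, 0.2]),
    "big":       np.array([0.9, 0.1, 0.3]),
    "dolphin":   np.array([0.6, 0.4, 0.5]),
    "alligator": np.array([0.7, 0.3, 0.4]),
    "mouse":     np.array([0.2, 0.8, 0.1]),
}

def semantic_projection(word, negative, positive):
    """Scalar position of `word` on the line from `negative` to `positive`."""
    axis = embeddings[positive] - embeddings[negative]  # e.g., "big" - "small"
    axis /= np.linalg.norm(axis)                        # unit-length feature line
    return float(np.dot(embeddings[word], axis))        # signed projection

# Rank animals by projected size: higher scores lie closer to the "big" pole.
animals = ["dolphin", "alligator", "mouse"]
for animal in sorted(animals, key=lambda w: semantic_projection(w, "small", "big")):
    print(animal, round(semantic_projection(animal, "small", "big"), 3))
```

With these toy vectors the ranking comes out mouse < dolphin < alligator; swapping in an intelligence axis ("dumb" to "smart") or a danger axis ("safe" to "dangerous") reorders the same animal vectors, which is the context-dependence the abstract describes.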
Article
Significance: We learn in a variety of ways: through direct sensory experience, by talking with others, and by thinking. Disentangling how these sources contribute to what we know is challenging. A wedge into this puzzle was suggested by empiricist philosophers, who hypothesized that people born blind would lack deep knowledge of “visual” phenomena such as color. We find that, contrary to this prediction, congenitally blind and sighted individuals share in-depth understanding of object color. Blind and sighted people share similar intuitions about which objects will have consistent colors, make similar predictions for novel objects, and give similar explanations. Living among people who talk about color is sufficient for color understanding, highlighting the efficiency of linguistic communication as a source of knowledge.
Article
Full-text available
How does first-person sensory experience contribute to knowledge? Contrary to the suppositions of early empiricist philosophers, people who are born blind know about phenomena that cannot be perceived directly, such as color and light. Exactly what is learned and how remains an open question. We compared knowledge of animal appearance across congenitally blind (n = 20) and sighted individuals (two groups, n = 20 and n = 35) using a battery of tasks, including ordering (size and height), sorting (shape, skin texture, and color), odd-one-out (shape), and feature choice (texture). On all tested dimensions apart from color, sighted and blind individuals showed substantial albeit imperfect agreement, suggesting that linguistic communication and visual perception convey partially redundant appearance information. To test the hypothesis that blind individuals learn about appearance primarily by remembering sighted people’s descriptions of what they see (e.g., “elephants are gray”), we measured verbalizability of animal shape, texture, and color in the sighted. Contrary to the learn-from-description hypothesis, blind and sighted groups disagreed most about the appearance dimension that was easiest for sighted people to verbalize: color. Analysis of disagreement patterns across all tasks suggests that blind individuals infer physical features from non-appearance properties of animals such as folk taxonomy and habitat (e.g., bats are textured like mammals but shaped like birds). These findings suggest that in the absence of sensory access, structured appearance knowledge is acquired through inference from ontological kind.
Article
Full-text available
Sentences that refer to categories - generic sentences (e.g., "Dogs are friendly") - are frequent in speech addressed to young children and constitute an important means of knowledge transmission. However, detecting generic meaning may be challenging for young children, since it requires attention to a multitude of morphosyntactic, semantic, and pragmatic cues. The first three experiments tested whether 3- and 4-year-olds use (a) the immediate linguistic context, (b) their previous knowledge, and (c) the social context to determine whether an utterance with ambiguous scope (e.g., "They are afraid of mice", spoken while pointing to 2 birds) is generic. Four-year-olds were able to take advantage of all the cues provided, but 3-year-olds were sensitive only to the first two. In Experiment 4, we tested the relative strength of linguistic-context cues and previous-knowledge cues by putting them in conflict; in this task, 4-year-olds, but not 3-year-olds, preferred to base their interpretations on the explicit noun phrase cues from the linguistic context. These studies indicate that, from early on, children can use contextual and semantic information to construe sentences as generic, thus taking advantage of the category knowledge conveyed in these sentences.
Article
I make three related proposals concerning the development of receptive communication in human infants. First, I propose that the presence of communicative intentions can be recognized in others' behaviour before the content of these intentions is accessed or inferred. Second, I claim that such recognition can be achieved by decoding specialized ostensive signals. Third, I argue on empirical bases that, by decoding ostensive signals, human infants are capable of recognizing communicative intentions addressed to them. Thus, learning about actual modes of communication benefits from, and is guided by, infants' preparedness to detect infant-directed ostensive communication.
Article
Many adult beliefs are based on the testimony provided by other people rather than on firsthand observation. Children also learn from other people's testimony. For example, they learn that mental processes depend on the brain, that the earth is spherical, and that hidden bodily organs constrain life and death. Such learning might indicate that other people's testimony simply amplifies children's access to empirical data. However, children's understanding of God's special powers and the afterlife shows that their acceptance of others' testimony extends beyond the empirical domain. Thus, children appear to conceptualize unobservable scientific and religious entities similarly. Nevertheless, some children distinguish between the two domains, arguably because a different pattern of discourse surrounds scientific as compared to religious entities.
P. Lison, J. Tiedemann, "OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles" in Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), N. Calzolari et al., Eds. (European Language Resources Association, Paris, 2016), pp. 923-929.