Article

Abstract

Whether a word was bawled, whispered, or yelped, captions will typically represent it in the same way. If they are your only way to access what is being said, subjective nuances expressed in the voice will be lost. Since so much of communication is carried by these nuances, we posit that if captions are to be used as an accurate representation of speech, embedding visual representations of paralinguistic qualities into captions could help readers use them to better understand speech beyond its mere textual content. This paper presents a model for processing vocal prosody (its loudness, pitch, and duration) and mapping it into visual dimensions of typography (respectively, font-weight, baseline shift, and letter-spacing), creating a visual representation of these lost vocal subtleties that can be embedded directly into the typographical form of text. An evaluation was carried out where participants were exposed to this speech-modulated typography and asked to match it to its originating audio, presented between similar alternatives. Participants (n=117) were able to correctly identify the original audios with an average accuracy of 65%, with no significant difference when showing them modulations as animated or static text. Additionally, participants’ comments showed their mental models of speech-modulated typography varied widely.
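As a rough illustration of the mapping the abstract describes (loudness to font-weight, pitch to baseline shift, duration to letter-spacing), the following Python sketch normalizes per-word acoustic measurements against the rest of the utterance and scales them into typographic ranges. The feature fields, the z-score normalization, and the output ranges are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: map per-word prosodic measurements to typographic values.
# The normalization scheme and the output ranges are illustrative assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    loudness_db: float   # mean intensity of the word (e.g., from an acoustic tool)
    pitch_hz: float      # mean fundamental frequency of the word
    duration_s: float    # word duration from forced alignment

def _normalize(values: List[float]) -> List[float]:
    """Z-score against the utterance, clipped to [-2, 2] and rescaled to [0, 1]."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0
    return [min(max((v - mean) / std, -2.0), 2.0) / 4.0 + 0.5 for v in values]

def modulate(words: List[Word]) -> List[dict]:
    """Return one typographic style dict per word."""
    loud = _normalize([w.loudness_db for w in words])
    pitch = _normalize([w.pitch_hz for w in words])
    dur = _normalize([w.duration_s for w in words])
    styles = []
    for w, l, p, d in zip(words, loud, pitch, dur):
        styles.append({
            "text": w.text,
            "font_weight": round(300 + 500 * l),         # loudness -> weight 300..800
            "baseline_shift_px": round(-6 + 12 * p, 1),  # pitch -> shift -6..+6 px
            "letter_spacing_em": round(0.25 * d, 3),     # duration -> spacing 0..0.25 em
        })
    return styles
```

In a setup like this, the per-word values would come from a forced aligner plus an acoustic analysis tool, and the clipping keeps outlier words from dominating the visual range.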


... Previous research has highlighted the adverse impact that this absence of paralinguistic cues has on the viewing experience of captioned content among DHH individuals [24,32,39]. Addressing this, different researchers have explored approaches to convey paralinguistic information through stylistic modulations in typography [9,22,23,62,79]. Much of this prior work has focused on conveying aspects of speech such as pitch, rhythm, or loudness, i.e., prosody. ...
... Yet, little is known about how to actually design affective captions. This stems from there being a gap in systematic explorations of their design space, which is different from that of prosodic captions, e.g., [22,79]. What little research there is (e.g., [39]) does not focus primarily on DHH individuals' perspectives on what caption styles are preferred and perform better at conveying emotions. ...
... This gave us a timestamp for when each word in the transcript starts and ends. With these timestamps, we isolated the audio excerpts for each word, which were processed through a transformer-based neural network [70] to obtain values of valence and arousal, emotional components as defined by the circumplex model of emotion [60]. These two values, which are the final output of this process, as illustrated in Figure 2, were then normalized [22] and annotated in a caption file [21]. ...
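The per-word valence/arousal pipeline described in that excerpt (alignment timestamps, per-word excerpts, an emotion model, normalization, caption annotation) can be sketched as below. The `word_spans` input and the `predict_valence_arousal` callable are hypothetical placeholders for the forced aligner and the transformer model the excerpt mentions; JSON stands in for whatever caption file format is used.

```python
# Sketch of the per-word valence/arousal annotation pipeline described above.
# `word_spans` stands in for forced-alignment output and `predict_valence_arousal`
# for the transformer-based emotion model; both are hypothetical placeholders.
import json

def annotate_captions(word_spans, predict_valence_arousal):
    """word_spans: [(word, start_s, end_s, audio_excerpt), ...];
    predict_valence_arousal: callable mapping an excerpt to (valence, arousal)."""
    entries = []
    for word, start, end, excerpt in word_spans:
        valence, arousal = predict_valence_arousal(excerpt)
        entries.append({"word": word, "start": start, "end": end,
                        "valence": valence, "arousal": arousal})
    # Normalize both dimensions to [0, 1] across the utterance, then serialize
    # the annotations (JSON here stands in for a caption file format).
    for key in ("valence", "arousal"):
        lo = min(e[key] for e in entries)
        hi = max(e[key] for e in entries)
        span = (hi - lo) or 1.0
        for e in entries:
            e[key] = round((e[key] - lo) / span, 3)
    return json.dumps(entries, indent=2)
```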
Conference Paper
Full-text available
Affective captions employ visual typographic modulations to convey a speaker's emotions, improving speech accessibility for Deaf and Hard-of-Hearing (DHH) individuals. However, the most effective visual modulations for expressing emotions remain uncertain. Bridging this gap, we ran three studies with 39 DHH participants, exploring the design space of affective captions, which include parameters like text color, boldness, size, and so on. Study 1 assessed preferences for nine of these styles, each conveying either valence or arousal separately. Study 2 combined Study 1's top-performing styles and measured preferences for captions depicting both valence and arousal simultaneously. Participants outlined readability, minimal distraction, intuitiveness, and emotional clarity as key factors behind their choices. In Study 3, these factors and an emotion-recognition task were used to compare how Study 2's winning styles performed versus a non-styled baseline. Based on our findings, we present the two best-performing styles as design recommendations for applications employing affective captions.
... In human-computer interaction (HCI), typography and typefaces have played crucial roles in elevating visual interest in graphic design. Recently, sound visualization (e.g., voice and speech) through typography modulation has garnered significant attention [1,3-7]. This approach can provide a more profound interpretation of sound, potentially serving as a tool for hard-of-hearing individuals to experience auditory content [1,8]. ...
... Research exploring the connections between sound and typography and bridging the gap between auditory and visual perception is expanding. Prior studies have primarily focused on establishing connections between speech and typography, such as transforming acoustic cues in speech (or voice) into visual modulations of typography [3,4,7,9,10]. To modulate typography, typeface features (i.e., character size and character distance) are adjusted based on speech features (i.e., loudness and pitch). ...
... Thus, speech features and character features may possess a natural mapping relationship. Indeed, several previous studies have demonstrated how mapping speech audio features to typography can evoke people's perception of auditory and visual senses, e.g., mapping loudness, pitch, and duration (speech audio feature) to font weight, baseline shift, and letter spacing, respectively [4]; mapping loudness, pitch, and speed (voice audio feature) to vertical stroke weight, horizontal stroke weight, and character width, respectively [7]. The mapping connections from sound features to typography features are readily identifiable, and the relationship between auditory and visual perception is evident. ...
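For illustration, the first of the mappings cited above (loudness to font weight, pitch to baseline shift, duration to letter spacing) can be rendered with ordinary CSS on per-word caption spans. The sketch below is a hypothetical rendering step, not code from either cited system; it assumes feature values already normalized to [0, 1], and the numeric ranges are arbitrary.

```python
# Hypothetical rendering step: turn normalized per-word feature values into
# HTML caption spans styled with inline CSS. Ranges are illustrative assumptions.
def word_to_span(text, loudness, pitch, duration):
    """loudness, pitch, duration are assumed to be pre-normalized to [0, 1]."""
    weight = round(300 + 500 * loudness)        # loudness -> font-weight
    shift_em = round(0.3 * (pitch - 0.5), 2)    # pitch -> vertical offset
    spacing_em = round(0.2 * duration, 2)       # duration -> letter-spacing
    style = (f"font-weight:{weight};"
             f"position:relative;top:{-shift_em}em;"
             f"letter-spacing:{spacing_em}em")
    return f'<span style="{style}">{text}</span>'

caption = " ".join([
    word_to_span("don't", 0.9, 0.7, 0.4),
    word_to_span("go", 0.3, 0.2, 0.9),
])
```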
Article
Full-text available
In human–computer interaction (HCI), typography was initially used for visual communication, enhancing visual interest in graphic design. How visual elements (e.g., typography) can be modulated to visualize sound (e.g., voice) has received substantial attention. Lyrics typography is a commonly used form of visual communication for music, yet the diversity of possible mappings from musical features to lyrics typography features has rarely been studied, and lyrics typography modulated from musical features by any single model may not strongly arouse people's perception of a connection between the auditory and the visual. In this paper, we first propose several models for modulating typography from musical features. We then investigate which model modulates lyrics typography best to visualize music. The experimental results show that lyrics typography based on the mapping of musical features (loudness, pitch, and duration) to typography features (character size, baseline shift, and character width) best arouses people's perception of an auditory-visual connection, and that lyrics typography modulated by a moderate mapping parameter evokes high visual aesthetic preference. We hope our work offers a novel perspective on music visualization, helping hard-of-hearing people experience musical content.
Conference Paper
Speech is expressive in ways that caption text does not capture, with emotion or emphasis information not conveyed. We interviewed eight Deaf and Hard-of-Hearing (DHH) individuals to understand if and how captions’ inexpressiveness impacts them in online meetings with hearing peers. Automatically captioned speech, we found, lacks affective depth, lending it a hard-to-parse ambiguity and general dullness. Interviewees regularly feel excluded, which some understand is an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depicted, beyond words, features from prosody, emotions, and a mix of both. In an empirical study, 16 DHH participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.
Article
The papers in this special section focus on affective speech and language synthesis, generation, and conversion. As an inseparable and crucial part of spoken language, emotions play a substantial role in human-human and human-technology conversation. They convey information about a person's needs, how one feels about the objectives of a conversation, the trustworthiness of one's verbal communication, and more. Accordingly, substantial efforts have been made to generate affective text and speech for conversational AI, artificial storytelling, and machine translation. Similarly, there is a push to convert the affect in text and speech, ideally in real time and with intelligibility fully preserved, e.g., to hide one's emotion, for creative applications and entertainment, or even to augment training data for affect-analyzing AI.
Conference Paper
Full-text available
Diversifying the fonts in video captions based on voice characteristics, namely loudness, speed, and pauses, can affect how viewers receive the content. This study evaluates a new method, WaveFont, which visualizes voice characteristics in captions in an intuitive way. The study was specifically designed to test captions intended to add a new experience for Arabic viewers. The results indicate that our visualization is comprehensible and acceptable and provides significant added value for both hearing-impaired and non-hearing-impaired participants: significantly more participants stated that WaveFont, rather than standard captions, improves their watching experience.
Article
Full-text available
The ability to write, hence to preserve and share arbitrary words and thoughts, was one of the most important breakthroughs in the history of mankind. It laid the technological basis for what we perceive today as culture, science and, in good part, economy. Nonetheless, writing can encompass much more than just words, and this is an integral, but often overlooked, part of it. Until very recently, writing was necessarily bound to the physical medium on which it was written or into which it was inscribed. The physicality of the medium interacted with and often enhanced the purely textual message. These features, which go beyond the encoding of words, are the secondary characteristics of writing systems. They include, but are not limited to, typography, and often serve, consciously or not, the transmission of additional messages beyond the purely textual content. If the study of writing itself is still largely in its infancy, this is even more true for the study of secondary characteristics, which is an integral part of grammatology. Beginning with a taxonomy of these secondary characteristics, this article looks in more detail at two non-typographical characteristics, namely ordering and punctuation. This short sketch of a cultural history of ordering and punctuation begins with the role of ordering in the initial invention of writing and follows its use across the millennia. It ends with the contemporary use of special punctuation marks to encode emotions.
Conference Paper
Full-text available
Typography is considered by many authors to be the visual representation of language, and through writing the human being found a way to register and share information. Throughout the history of typography, many authors have explored this connection between words and their sounds, trying to narrow the gap between what we say, how we hear it, and how it should be read. We introduce "Máquina de Ouver", a system that analyses speech recordings and creates a visual representation of their expressiveness using typographic variables and composition. Our system takes advantage of the scripting capabilities of Praat, Adobe InDesign, and Adobe After Effects to retrieve sound features from speech recordings and dynamically creates a typographic composition that results in a static artefact, such as a poster, or a dynamic one, such as a video. Most of our experimentation uses poetry performances as the system input, since these can be among the most dynamic and expressively rich forms of speech.
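The Praat-scripting stage of a pipeline like this can also be driven from Python through the parselmouth bindings to Praat. The sketch below extracts the kind of frame-level pitch and intensity contours such a system consumes; the file name is hypothetical and this is not the authors' actual script.

```python
# Sketch: extract frame-level sound features (pitch, intensity) of the kind
# "Máquina de Ouver" retrieves via Praat scripting, here using the parselmouth
# bindings to Praat. The input file and downstream use are illustrative only.
import parselmouth

snd = parselmouth.Sound("poetry_reading.wav")   # hypothetical input file
pitch = snd.to_pitch()
intensity = snd.to_intensity()

f0_values = pitch.selected_array["frequency"]   # Hz, 0 where unvoiced
f0_times = pitch.xs()
loudness = intensity.values[0]                  # dB
loudness_times = intensity.xs()

# These frame-level contours would then drive the typographic composition,
# e.g., aggregated per word or line before export to InDesign or After Effects.
```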
Article
Full-text available
Type is not expressive enough. Even the youngest speakers are able to express a full range of emotions with their voice, while young readers read aloud monotonically, as if to convey robotic boredom. We augmented type to convey expression similarly to our voices. Specifically, we wanted to convey in text words that are spoken louder, words that are drawn out and spoken longer, and words that are spoken at a higher pitch. We then asked children to read sentences with these new kinds of type to see if children would read them with greater expression. We found that children would ignore the augmentation if they weren't explicitly told about it. But when children were told about the augmentation, they were able to read aloud with greater vocal inflection. This innovation holds great promise for helping both children and adults to read aloud with greater expression and fluency.
Article
Full-text available
Human emotions unfold over time, and more affective computing research has to prioritize capturing this crucial component of real-world affect. Modeling dynamic emotional stimuli requires solving the twin challenges of time-series modeling and of collecting high-quality time-series datasets. We begin by assessing the state of the art in time-series emotion recognition, and we review contemporary time-series approaches in affective computing, including discriminative and generative models. We then introduce the first version of the Stanford Emotional Narratives Dataset (SENDv1): a set of rich, multimodal videos of self-paced, unscripted emotional narratives, annotated for emotional valence over time. The complex narratives and naturalistic expressions in this dataset provide a challenging test for contemporary time-series emotion recognition models. We demonstrate several baseline and state-of-the-art modeling approaches on the SEND, including a Long Short-Term Memory model and a multimodal Variational Recurrent Neural Network, which perform comparably to the human benchmark. We end by discussing the implications for future research in time-series affective computing.
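A minimal PyTorch sketch of the kind of time-series baseline described, an LSTM regressing valence over time from frame-level features, is shown below. The dimensions and hyperparameters are arbitrary assumptions, and this is not the SEND authors' implementation.

```python
# Minimal sketch of a time-series valence regressor of the LSTM-baseline kind
# described above (not the SEND authors' implementation; sizes are arbitrary).
import torch
import torch.nn as nn

class ValenceLSTM(nn.Module):
    def __init__(self, feature_dim=128, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):                  # x: (batch, time, feature_dim)
        out, _ = self.lstm(x)              # (batch, time, hidden_dim)
        return self.head(out).squeeze(-1)  # valence per time step, (batch, time)

model = ValenceLSTM()
features = torch.randn(4, 100, 128)        # e.g., multimodal features per window
predicted_valence = model(features)        # compare against time-aligned ratings
```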
Conference Paper
Full-text available
This paper aims to strengthen the link between acoustic and perceptual representations of intonation, a link that has been weakened by the over-reliance on the F0 trajectory, which can only be interpreted in relation to landmarks in the segmental string, placed manually or semi-automatically at a separate stage in the analysis. Only then can F0 events be identified as linguistically relevant (e.g. early, medial or late peaks, accentual tones or edge tones etc.). We provide an analysis and visualization of two acoustic dimensions contributing towards the perceived pitch contour, F0 over time and, crucially, periodic energy. Periodic energy reflects the degree to which pitch is intelligible, a higher value representing a stronger F0 signal that is consequently more easily perceived. A representation of F0 that includes periodic energy is thus able to flag portions of the speech signal that are relevant for the analysis of intonation, without the need for a separate segmentation of the signal into phones and syllables. Index Terms: intonation, pitch perception, periodic energy, tonal alignment, segmentation, data visualization, sonority
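One way to approximate the visualization this abstract argues for is to draw the F0 trajectory with its visual weight scaled by a periodicity measure, so that weakly periodic stretches recede. The sketch below uses parselmouth's per-frame pitch strength as a stand-in for periodic energy; that proxy and the plotting choices are assumptions, not the authors' exact measure.

```python
# Sketch: plot F0 over time, scaling point size by a periodicity proxy so that
# weakly periodic (hard-to-perceive) stretches recede. Pitch "strength" from
# parselmouth is used here as a stand-in for the paper's periodic-energy measure.
import matplotlib.pyplot as plt
import parselmouth

snd = parselmouth.Sound("utterance.wav")     # hypothetical input file
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
strength = pitch.selected_array["strength"]  # autocorrelation-based periodicity
times = pitch.xs()

voiced = f0 > 0
plt.scatter(times[voiced], f0[voiced], s=40 * strength[voiced], c="black")
plt.xlabel("Time (s)")
plt.ylabel("F0 (Hz)")
plt.show()
```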
Conference Paper
Full-text available
This paper presents an open-source tool that has been developed to visualize a speech corpus with its transcript and prosodic features aligned at word level. In particular, the tool is aimed at providing a simple and clear way to visualize prosodic patterns along large segments of speech corpora, and can be applied in any research that involves prosody analysis.
Article
Full-text available
This study was conducted to investigate whether the listeners' culture and mother language influence the perception of emotions through speech and which acoustic cues listeners use in this process. Swedish and Brazilian listeners were presented with authentic emotional speech samples of Brazilian Portuguese and Swedish. They judged on 5-point Likert scales the expression of basic emotions as described by eight adjectives in the utterances in Brazilian Portuguese and the expression of five emotional dimensions in the utterances in Swedish. The PCA technique revealed that two components explain more than 94% of the variance of the judges' responses in both experiments. These components were predicted through multiple linear regressions from twelve acoustic parameters automatically computed from the utterances. The results point to a similar perception of the emotions between both cultures.
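The analysis described, PCA over listeners' Likert judgments followed by multiple linear regression of the components on automatically computed acoustic parameters, maps onto standard scikit-learn calls. The sketch below uses assumed data shapes and random placeholder data; it is not the authors' analysis script.

```python
# Sketch of the reported analysis: PCA over listeners' judgments, then multiple
# linear regression predicting each component from acoustic parameters.
# Shapes and data are placeholder assumptions, not the authors' materials.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
judgments = rng.uniform(1, 5, size=(40, 8))   # utterances x adjective ratings
acoustics = rng.normal(size=(40, 12))         # utterances x acoustic parameters

pca = PCA(n_components=2)
components = pca.fit_transform(judgments)     # two components (the study reports ~94% variance)
print(pca.explained_variance_ratio_)

for i in range(2):
    reg = LinearRegression().fit(acoustics, components[:, i])
    print(f"component {i + 1}: R^2 = {reg.score(acoustics, components[:, i]):.2f}")
```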
Article
Full-text available
Video captions, also known as same-language subtitles, benefit everyone who watches videos (children, adolescents, college students, and adults). More than 100 empirical studies document that captioning a video improves comprehension of, attention to, and memory for the video. Captions are particularly beneficial for persons watching videos in their non-native language, for children and adults learning to read, and for persons who are D/deaf or hard of hearing. However, despite U.S. laws, which require captioning in most workplace and educational contexts, many video audiences and video creators are naïve about the legal mandate to caption, much less the empirical benefit of captions.
Article
Full-text available
Systems and services developed for elderly or disabled people often find useful applications for their able-bodied counterparts—a few examples are mobile amplification control, which was originally developed for people with hearing problems but is helpful in noisy environments, audio cassette versions of books originally developed for blind people, standards of subtitling in television for deaf users and so on. In this study, we evaluate how prediction from a user model developed for physically impaired users can work in situational impairment.
Conference Paper
Full-text available
With voice-driven type design (VDTD), we introduce a novel concept for presenting written information in the digital age. While the shape of a single typographical character has until now been treated as an unchangeable property, we present an innovative method to adjust the shape of each individual character according to particular acoustic features in the spoken reference. This preserves some of the speaker's individuality and adds value to written text, which opens up different applications: providing meta-information in subtitles and chats, supporting deaf and hearing-impaired people, illustrating intonation and accentuation in books for language learners, giving hints on how to sing, and even artistic expression. In a user study we demonstrated that, using our proposed approach, loudness, pitch, and speed can be represented visually by changing the shape of each character. By complementing homogeneous type design with these parameters, the original intention and characteristics of the speaker (personal expression and intonation) are better supported.
Article
Full-text available
Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.
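The parameter set is distributed with the openSMILE toolkit, which also ships Python bindings. The call below extracts eGeMAPS functionals for one recording; the file name is hypothetical, and the exact feature-set enum may differ slightly between openSMILE releases, so treat this as a sketch.

```python
# Sketch: extract eGeMAPS functionals for one recording with the openSMILE
# Python package. The file name is hypothetical; the feature-set enum may vary
# between openSMILE releases.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("speech_sample.wav")  # pandas DataFrame, one row
print(features.shape)  # one row of eGeMAPS functionals (88 for eGeMAPSv02)
```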
Article
Full-text available
This work aims to present theoretical and methodological aspects of the field of speech prosody, serving as a starting point for the beginning researcher. Starting from the Platonic meaning of the term "prosody", it develops the senses the term has taken on in contemporary scientific research, presenting notions such as prominence, prosodic boundary, phrasal stress, stress group, focus, emphasis, intonation, and rhythm. Methodological aspects of the field, such as the construction of laboratory speech corpora and their relation to spontaneous speech, the normalization of syllable duration, and the distinction between prosody in production and prosody in speech perception, are discussed in order to present some questions of relevance to this research area.
Article
Full-text available
This paper introduces a new vowel notation system aimed at aiding the teaching of English pronunciation. This notation system, designed as an enhancement to orthographic text, was designed to use concepts borrowed from the representation of musical notes and is also linked to the acoustic characteristics of vowel sounds. Vowel timbre is represented in terms of the height of the symbol and vowel duration in terms of the length of the symbol. The Speechant system was evaluated in EFL adult education classes in Portugal. A formal assessment that measured the impact of a term's tuition by looking at changes in accent ratings of the learners over that period showed that the group taught using the Speechant system showed greater improvements in pronunciation than the control group. Speechant may be an especially useful aid to pronunciation teaching in situations in which foreign languages are taught without the benefit of technological support.
Article
Full-text available
Thematic analysis is a poorly demarcated, rarely acknowledged, yet widely used qualitative analytic method within psychology. In this paper, we argue that it offers an accessible and theoretically flexible approach to analysing qualitative data. We outline what thematic analysis is, locating it in relation to other qualitative analytic methods that search for themes or patterns, and in relation to different epistemological and ontological positions. We then provide clear guidelines to those wanting to start thematic analysis, or conduct it in a more deliberate and rigorous way, and consider potential pitfalls in conducting thematic analysis. Finally, we outline the disadvantages and advantages of thematic analysis. We conclude by advocating thematic analysis as a useful and flexible method for qualitative research in and beyond psychology.
Article
Full-text available
Prosodic elements such as stress and intonation are generally seen as providing both ‘natural’ and properly linguistic input to utterance comprehension. They contribute not only to overt communication but to more covert or accidental forms of information transmission. They typically create impressions, convey information about emotions or attitudes, or alter the salience of linguistically-possible interpretations rather than conveying distinct propositions or concepts in their own right. These aspects of communication present a challenge to pragmatic theory: how should they be described and explained? This paper is an attempt to explore how the wealth of insights provided by the literature on the interpretation of prosody might be integrated into the relevance-theoretic framework (Sperber and Wilson, 1986/1995; Blakemore, 2002; Carston, 2002). We will focus on four main issues. First, how should the communication of emotions, attitudes and impressions be analysed? Second, how might prosodic elements function as ‘natural’ communicative devices? Third, what (if anything) do prosodic elements encode? Fourth, what light can the study of prosody shed on the place of pragmatics in the architecture of the mind? In each case, we hope to show that the study of prosody and the study of pragmatics can interact in ways that benefit both disciplines.
Conference Paper
Full-text available
While affective computing explicitly challenges the primacy of rationality in cognitivist accounts of human activity, at a deeper level it relies on and reproduces the same information-processing model of cognition. In affective computing, affect is often seen as another kind of information - discrete units or states internal to an individual that can be transmitted in a loss-free manner from people to computational systems and back. Drawing on cultural, social, and interactional critiques of cognition which have arisen in HCI, we introduce and explore an alternative model of emotion as interaction: dynamic, culturally mediated, and socially constructed and experienced. This model leads to new goals for the design and evaluation of affective systems - instead of sensing and transmitting emotion, systems should support human users in understanding, interpreting, and experiencing emotion in its full complexity and ambiguity.
Article
Full-text available
The proliferation of speech recognition as input to Computer Mediated Communication (CMC) systems opens up new possibilities for the design of typographic forms. Designers can use the musical expressiveness of the speaking voice to shape letterforms in real time. Letters formed by speech are more representative of the emotional, contextualized person speaking than current fonts are. Prosodic Font is an object-oriented font that assumes a dynamic, temporal form. It emulates the tonal and rhythmic motion in the speaking voice. Preliminary user testing results show that people are able to identify Prosodic Fonts as representative of particular prosodic variations.
Article
Full-text available
We present a straightforward and robust algorithm for periodicity detection, working in the lag (autocorrelation) domain. When it is tested for periodic signals and for signals with additive noise or jitter, it proves to be several orders of magnitude more accurate than the methods commonly used for speech analysis. This makes our method capable of measuring harmonics-to-noise ratios in the lag domain with an accuracy and reliability much greater than that of any of the usual frequency-domain methods. By definition, the best candidate for the acoustic pitch period of a sound can be found from the position of the maximum of the autocorrelation function of the sound, while the degree of periodicity (the harmonics-to-noise ratio) of the sound can be found from the relative height of this maximum. However, sampling and windowing cause problems in accurately determining the position and height of the maximum. These problems have led to inaccurate time-domain and cepstral methods for p...
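A naive numpy rendering of the core idea is given below: the pitch period is taken from the lag of the autocorrelation maximum, and the harmonics-to-noise ratio from its relative height. It deliberately omits the window-related corrections that are the paper's actual contribution, so treat it as an illustration only.

```python
# Naive sketch of autocorrelation-based pitch and HNR estimation for one frame.
# It omits the window correction that is the paper's contribution.
import numpy as np

def pitch_and_hnr(frame, fs, f0_min=75.0, f0_max=500.0):
    frame = frame - frame.mean()
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r = r / r[0]                                   # normalized autocorrelation
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max]))
    r_max = r[lag]                                 # degree of periodicity
    f0 = fs / lag
    hnr_db = 10 * np.log10(r_max / (1 - r_max)) if 0 < r_max < 1 else float("inf")
    return f0, hnr_db

fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 150 * t) + 0.05 * np.random.randn(t.size)
print(pitch_and_hnr(frame, fs))   # roughly 150 Hz and a high HNR
```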
Article
Real-time captioning is a critical accessibility tool for many d/Deaf and hard of hearing (DHH) people. While the vast majority of captioning work has focused on formal settings and technical innovations, in contrast, we investigate captioning for informal, interactive small-group conversations, which have a high degree of spontaneity and foster dynamic social interactions. This paper reports on semi-structured interviews and design probe activities we conducted with 15 DHH participants to understand their use of existing real-time captioning services and future design preferences for both in-person and remote small-group communication. We found that our participants' experiences of captioned small-group conversations are shaped by social, environmental, and technical considerations (e.g., interlocutors' pre-established relationships, the type of captioning displays available, and how far captions lag behind speech). When considering future captioning tools, participants were interested in greater feedback on non-speech elements of conversation (e.g., speaker identity, speech rate, volume) both for their personal use and to guide hearing interlocutors toward more accessible communication. We contribute a qualitative account of DHH people's real-time captioning experiences during small-group conversation and future design considerations to better support the groups being captioned, both in person and online.
Article
The acoustic analysis helps to discriminate emotions according to non-verbal information, while linguistics aims to capture verbal information from written sources. Acoustic and linguistic analyses can be addressed for different applications, where information related to emotions, mood, or affect are involved. The Arousal-Valence plane is commonly used to model emotional states in a multidimensional space. This study proposes a methodology focused on modeling the user's state based on the Arousal-Valence plane in different scenarios. Acoustic and linguistic information are used as input to feed different deep learning architectures mainly based on convolutional and recurrent neural networks, which are trained to model the Arousal-Valence plane. The proposed approach is used for the evaluation of customer satisfaction in call-centers and for health-care applications in the assessment of depression in Parkinson's disease and the discrimination of Alzheimer's disease. F-scores of up to 0.89 are obtained for customer satisfaction, of up to 0.82 for depression in Parkinson's patients, and of up to 0.80 for Alzheimer's patients. The proposed approach confirms that there is information embedded in the Arousal-Valence plane that can be used for different purposes.
Chapter
Developmental dyslexia is a specific learning disability that is characterized by severe difficulties in learning to read. Amongst various supporting technologies, there are typefaces specially designed for readers with dyslexia. Although recent research shows the effectiveness of these typefaces, the visual characteristics of these typefaces that are good for readers with dyslexia are yet to be revealed.
Conference Paper
Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group meetings with hearing peers; we examine generation of captions on mobile devices by automatic speech recognition (ASR). While ASR output displays errors, we study whether such tools benefit users and influence conversational behaviors. An experiment was conducted where DHH and hearing individuals collaborated in discussions in three conditions (without an ASR-based application, with the application, and with a version indicating words for which the ASR has low confidence). An analysis of audio recordings, from each participant across conditions, revealed significant differences in speech features. When using the ASR-based automatic captioning application, hearing individuals spoke more loudly, with improved voice quality (harmonics-to-noise ratio), with a non-standard articulation (changes in F1 and F2 formants), and at a faster rate. Identifying non-standard speech in this setting has implications on the composition of data used for ASR training/testing, which should be representative of its usage context. Understanding these behavioral influences may also enable designers of ASR captioning systems to leverage these effects, to promote communication success.
Conference Paper
Automatic Speech Recognition (ASR) and the wide use of smartphones and their apps have enabled huge inroads in preparing deaf and hard-of-hearing (D/HH) students to be effective and productive in the hearing workplace. This paper presents both a hearing instructor's experiences and a deaf researcher's observations when preparing deaf and hard-of-hearing students as computer technicians for the hearing workplace.
Chapter
Intonational Grammar in Ibero-Romance: Approaches across linguistic subfields is a volume of empirical research papers incorporating recent theoretical, methodological, and interdisciplinary advances in the field of intonation, as they relate to the Ibero-Romance languages. The volume brings together leading experts in Catalan, Portuguese, and Spanish, as well as in the intonation of Spanish in contact situations. The common thread is that each paper examines a specific topic related to the intonation of at least one Ibero-Romance language, framing the analysis in an experimental setting. The novel findings of each chapter hinge on critical connections that are made between the study of intonation and its related fields of linguistic inquiry, including syntax, pragmatics, sociophonetics, language acquisition and special populations. In this sense, the volume expands the traditional scope of Ibero-Romance intonation, including in it work on signed languages (LSC), individuals with autism spectrum disorder and individuals with Williams Syndrome. This volume establishes the precedent for researchers and advanced students who wish to explore the complexities of Ibero-Romance intonation. It also serves as a showcase of the most up-to-date methodologies in intonational research.
Chapter
The first part of this chapter (Hearers and Readers) gives a précis of the prevalent reading habits and how they determined the way Greek literature reached its audience. Questions include: silent reading vs. reading aloud, the relevance of delivery, the phonetic quality of literature, the difficulties of reconstructing ancient reader(ship)s, reading as entertainment and instruction. The second part (Scholars and Interpreters) focuses on a particular group of readers and summarises the main factors that enabled scholarship. The chapter is rounded off by a selection of twenty tenets which intends to give a brief overview of major principles of ancient literary criticism.
Article
Communication in the twenty-first century is increasingly devoid of the verbal and visual cues that have been proven to be critical in conveying and interpreting meaning. Communicating with clarity is the challenge imposed with text-based forms of dialogues, and this becomes more and more important as a greater number of our personal and professional exchanges are transacted via text-based methods of delivery. Use of email, text messaging (texting), instant messaging (IM), discussion forums, and social media platforms such as Facebook and Twitter are increasingly routine and rapidly becoming the norm for many day-to-day interactions. This article proposes that a new set of typographic elements needs to be developed that extends our written vocabulary, complementing and improving the communication opportunities that technology – and its increasingly text-based forms of interactions – offers. Voice-less and face-less interactions create enormous challenges when expressing and interpreting meaning and intent. A viable system of new punctuation that supports brevity and clarity can be developed utilizing existing typographic glyphs, making implementation of new marks convenient and immediate. Unlike emoticons, and texting acronyms/abbreviations, new punctuation marks might have the advantage of being appropriate for personal and professional dialogues. This argument introduces evidence across a broad temporal canvas. It examines our current communication environment, follows with a historical review of punctuation; continues through to a 2013 survey of high school students; and concludes with a recommendation around the number and type of new punctuations necessary to advance the clarity of our text-based dialogues.
Article
Book historians have long insisted that silent reading was a rare or nonexistent practice in classical antiquity. This belief runs counter to much research in the field of Classics. In this paper, I offer a critical review of the evidence and the scholarship concerning reading in antiquity before offering an explanation for these conflicting views on the subject. I argue that this debate reveals much about the epistemology of book history as a discipline; in addition, I demonstrate how a more nuanced understanding of ancient reading culture can have surprising implications for the future of the book.
Conference Paper
In this paper the dynamics of prosodic parameters are explored for recognizing emotions from speech. The dynamics of prosodic parameters refer to local or fine variations in prosodic parameters with respect to time. The proposed dynamic features of prosody are represented by: (1) the sequence of durations of syllables in the utterance (duration contour), (2) the sequence of fundamental frequency values (pitch contour), and (3) the sequence of frame energy values (energy contour). The Indian Institute of Technology Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC) is used for analyzing the proposed prosodic features for recognizing emotions [1]. The emotions considered in this work are anger, disgust, fear, happiness, neutral, and sadness. Support vector machines (SVMs) are explored to discriminate the emotions using the proposed prosodic features. Emotion recognition performance is analyzed separately using duration patterns of the sequence of syllables, pitch contours, and energy contours, and their recognition performance is observed to be 64%, 67%, and 53%, respectively. Fusion techniques are explored at feature and score levels. The performance of the fusion-based emotion recognition systems is observed to be 69% and 74% for feature- and score-level fusions, respectively.
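A compact sketch of the classification setup described, separate SVMs on fixed-length duration, pitch, and energy contour features fused at the score level by averaging class probabilities, is given below. The feature shapes and random data are placeholders, not the IITKGP-SESC pipeline.

```python
# Sketch of contour-based emotion classification with score-level fusion:
# one SVM per prosodic contour type, class probabilities averaged across SVMs.
# Shapes and the random data are placeholders, not the IITKGP-SESC setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, classes = 120, 6                        # utterances x 6 emotions
y = rng.integers(0, classes, size=n)
contours = {
    "duration": rng.normal(size=(n, 20)),  # syllable-duration sequence features
    "pitch": rng.normal(size=(n, 50)),     # F0 contour features
    "energy": rng.normal(size=(n, 50)),    # frame-energy contour features
}

models = {k: SVC(probability=True).fit(x, y) for k, x in contours.items()}

# Score-level fusion: average the per-class probabilities of the three SVMs.
probs = np.mean([models[k].predict_proba(contours[k]) for k in contours], axis=0)
fused_prediction = probs.argmax(axis=1)
print((fused_prediction == y).mean())
```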
Article
Emotion recognition from speech has emerged as an important research area in the recent past. In this regard, a review of existing work on emotional speech processing is useful for carrying out further research. In this paper, the recent literature on speech emotion recognition is presented, considering the issues related to emotional speech corpora, the different types of speech features, and the models used for recognition of emotions from speech. Thirty-two representative speech databases are reviewed from the point of view of their language, number of speakers, number of emotions, and purpose of collection. The issues related to emotional speech databases used in emotional speech recognition are also briefly discussed. Literature on the different features used in the task of emotion recognition from speech is presented. The importance of choosing different classification models is discussed along with the review. The important issues to be considered for further emotion recognition research, in general and specific to the Indian context, are highlighted wherever necessary.
Article
An assessment was made of the impact of captions on hearing-impaired students' affective reactions to a version of a popular children's television program, Shazam. Forty-two hearing-impaired children ranging in age from 8 to 12 years old were randomly divided into 2 groups; 1 group viewed the program without captions and the other viewed it with captions at a very simple language level. During and following the presentation the children rated their perception of the characters' emotions and personality traits. They also indicated their liking of various program scenes, and gave their predictions concerning how characters would behave in new but similar situations. Results indicated that captions seemed to enhance hearing-impaired children's abilities to perceive the emotional complexity of presented information. The implications are discussed in the following review.
Article
by Tara Michelle Graber Rosenberger.
Article
Email has become a central communication channel for private and professional exchange. Its format remains equally neutral regardless of the relation to the recipient. While writing remains an excellent vehicle to communicate tone and emotion, this can sometimes be a painstaking and tedious process, and requires considerable skill.

EmoteMail is an email client that is augmented to convey aspects of the writing context to the recipient. The client captures facial expressions and typing speed and introduces them as design elements. These contextual cues provide extra information that can help the recipient decode the tone of the mail. Moreover, the contextual information is gathered and automatically embedded as the sender composes the email, allowing an additional channel of expression.

Hugo Liu's EmpathyBuddy [Liu et al. 2002] and the chillies in the email client Eudora also attempt to give email contextual tone by analysing the textual content of the message. While these approaches succeed in affectively annotating the email message without any extra effort, we feel that they fail to consider the personality of the sender and have no comprehension of the writing context. For example, sarcasm could be misinterpreted by both of these approaches as negative affect.

In contrast, our approach relies on the relationship between the recipient and sender. The recipient can examine the sender's facial expressions as well as the paragraphs the sender spent the most time writing. We think that this extra information helps convey tone especially when the recipient is familiar with the sender's non-verbal expressions.

We use subtle graphic design elements to relate facial expression and typing speed, leaving the textual message as the main focus. We adopt Edward Tufte's approach in information design, suggesting that when reading data-rich content, one needs prior understanding of its language. In this case, we rely on a commonly shared understanding of non-verbal language. One drawback of making the contextual features more subtle is that they will increasingly assume greater familiarity between communicators. It is interesting to consider how the choice of subtle cues compares to more attention-grabbing techniques.

Dynamic typography is one technique that is also used for attaching emotions and personality to written language. Text that scales, moves, changes colour, typeface and proportions has been successfully demonstrated in a poetic context [Forlizzi et al. 2003; Bodine and Pignol 2003]. We think the drawback of this approach is that the written content of the text recedes to the background and becomes more difficult to access. We feel that in email the text should be at the forefront, with the contextual cues playing a supporting role.

Instead of dynamically modifying individual characters, EmoteMail relates contextual cues to each paragraph. The prototype uses a camera and a timer as sensor inputs to capture additional information relating to the paragraph. Reminiscent of the commonly used smileys, the EmoteMail client annotates every paragraph with a small black-and-white thresholded image of the face of the writer. Each paragraph also includes a background colour representing how much time the paragraph took to compose. If the sender changes their facial expression (perhaps as part of an emotionally meaningful communication), then the small camera grab beside each paragraph reflects this.

By capturing a snapshot of the face of the writer with every paragraph, the system attempts to display the fluctuation of the emotions throughout the message, rather than attempting to summarize the whole message as a certain mood. Likewise, the time spent on writing a paragraph is measured in relation to all the other paragraphs written. Typing speed may denote more thought having been put into crafting a paragraph. The typing speed can show what parts of the message have been copied and pasted, and it might also suggest the attention level of the person writing the message. Ultimately, the interpretation is left to the recipient.

Informal testing of EmoteMail has been informative. While there are some issues to be resolved, the initial response has been largely positive. The ease of usage and the automatic capturing of the context is appreciated. However, the client's live video preview depicting the sender's face can be distracting. This raises some good design questions: should senders be given the possibility to edit the contextual features? Or should senders see only the plain text? We are also considering a large range of other contextual inputs for future versions, ranging from pressure and skin-conductivity sensors to audio analysis and reading of the state of the computer at the time of writing. We are also considering rewriting EmoteMail as a plug-in to a popular email client to facilitate more real-life user studies. The primary benefit of EmoteMail is that it will allow us to conduct further research to understand if contextual information in email improves the quality of communication in a measurable way.
Article
In September 2004 Punchcut worked with QUALCOMM to develop a typographic strategy with respect to QUALCOMM's custom user interfaces within its mobile operating system and applications. The strategy's first tangible expression was the design of a custom family of sans serif fonts to be used in current and future QUALCOMM mobile user interfaces. The four-month project entailed assessing the impact of mobile devices on digital typography, identifying key requirements that would guide the design and application of QUALCOMM's custom interface font family, designing to meet business and customer needs within tight technical constraints, and testing to validate design decisions. The demand for custom fonts will increase apace with the desire to deliver more content and increasingly customized experiences on mobile devices. Significant user research on font usage and the impact of typography in the mobile user experience will be required to guide informed, user-centered font design.
Article
Recently, increasing attention has been directed to the study of the emotional content of speech signals, and hence, many systems have been proposed to identify the emotional content of a spoken utterance. This paper is a survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system. The first one is the choice of suitable features for speech representation. The second issue is the design of an appropriate classification scheme and the third issue is the proper preparation of an emotional speech database for evaluating system performance. Conclusions about the performance and limitations of current speech emotion recognition systems are discussed in the last section of this survey. This section also suggests possible ways of improving speech emotion recognition systems.
Conference Paper
Kinetic (dynamic) typography has demonstrated the ability to add significant emotive content and appeal to expressive text, allowing some of the qualities normally found in film and the spoken word to be added to static text. Kinetic typography has been widely and successfully used in film title sequences as well as television and computer-based advertising. However, its communicative abilities have not been widely studied, and its potential has rarely been exploited outside these areas. This is partly due to the difficulty in creating kinetic typography with current tools, often requiring hours of work to animate a single sentence. In this paper, we present the Kinedit system, a basic authoring tool that takes initial steps toward remedying this situation and hence promoting exploration of the communicative potential of kinetic typography for personal communication. Kinedit is informed by systematic study and characterization of a corpus of examples, and iterative involvement and validation by designers throughout the development process. We describe the tool and its underlying technology, usage experiences, lessons learned, and next steps.
Conference Paper
Kinetic Typography, text whose appearance changes over time, is emerging as a new form of expression due to its ability to add emotional content to text. We explored the potential for kinetic typography to improve the way people communicate over the Internet using Instant Messaging (IM). Our Kinetic Instant Messenger (KIM) builds upon applications for rendering and editing kinetic typography effects and addresses several design issues that spring from integrating kinetic typography and IM.
Article
Closed captioning has suffered from a lack of innovation since its inception in the early 1970s. However, television and film technologies and user preferences have changed dramatically. Sound from music, sound effects, and speech prosody are essentially missing from current closed captions. We used animated text to represent emotions contained in music and speech as well as sound effects. Twenty-five hard of hearing and hearing participants watched two short television clips with three different types of captions—conventional, enhanced, and extreme. Hard of hearing and hearing participants preferred enhanced, animated text captions as they provide improved access to the emotive information contained in the content. Text-based animated sound effects confused participants and animated symbols were recommended as a replacement.
Article
How we design and evaluate for emotions depends crucially on what we take emotions to be. In affective computing, affect is often taken to be another kind of information - discrete units or states internal to an individual that can be transmitted in a loss-free manner from people to computational systems and back. While affective computing explicitly challenges the primacy of rationality in cognitivist accounts of human activity, at a deeper level it often relies on and reproduces the same information-processing model of cognition. Drawing on cultural, social, and interactional critiques of cognition which have arisen in HCI, as well as anthropological and historical accounts of emotion, we explore an alternative perspective on emotion as interaction: dynamic, culturally mediated, and socially constructed and experienced. We demonstrate how this model leads to new goals for affective systems - instead of sensing and transmitting emotion, systems should support human users in understanding, interpreting, and experiencing emotion in its full complexity and ambiguity. In developing from emotion as objective, externally measurable unit to emotion as experience, evaluation, too, alters focus from externally tracking the circulation of emotional information to co-interpreting emotions as they are made in interaction.
Article
Crowding, the adverse spatial interaction due to proximity of adjacent letters, has been suggested as an explanation for slow reading in peripheral vision. The purpose of this study was to examine whether reading speed can be improved in normal peripheral vision by increasing the letter spacing. Also tested was whether letter spacing imposes a different limit on reading speed of small versus large print. Six normal observers read aloud single, short sentences presented on a computer monitor, one word at a time, by rapid serial visual presentation (RSVP). Reading speeds were calculated based on the RSVP exposure durations yielding 80% correctly read words. Letters were rendered in Courier, a fixed-width font. Testing was conducted at the fovea, 5 degrees and 10 degrees in the inferior visual field. The critical print size (CPS) was first determined for each observer by measuring reading speeds for four print sizes, using the standard letter spacing (center-to-center separation of adjacent letters; standard Courier spacing: 1.16 times the width of the lowercase x). Text was then presented at 0.8 x or 1.5x CPS, and reading speed was measured for five letter spacings, ranging from 0.5 times to 2 times the standard spacing. As expected, reading speed was highest at the fovea, decreased with eccentricity, and was faster for the larger print size. At all eccentricities and for both print sizes, reading speed increased with letter spacing, up to a critical letter spacing, and then either remained constant at the same reading speed or decreased slightly for larger letter spacings. The value of the critical letter spacing was very close to the standard letter spacing and did not depend on eccentricity or print size. Increased letter spacing beyond the standard size, which presumably decreases the adverse effect of crowding, does not lead to an increase in reading speed in central or peripheral vision.
A. Deveria, "Can I use: Variable fonts?," 2021. Accessed: Dec. 20, 2021. [Online]. Available: https://caniuse.com/#feat=variable-fonts
M. Jacobs and P. Constable, "OpenType specification version 1.8," 2018. Accessed: Jan. 5, 2021. [Online]. Available: https://docs.microsoft.com/en-us/typography/opentype/otspec180/
R. Andersson and S. Nixon, "Recursive and Inter typefaces," Accessed: Jan. 5, 2022. [Online]. Available: https://fonts.google.com/share?selection.family=Inter%7CRecursive