FIGURE 3 - available via license: Creative Commons Attribution 4.0 International
Content may be subject to copyright.
The samples used for the user studies: business cards (top-left), document IDs (top-center), lottery cards (top-right), papers (bottom). Although the texts differ across comparable samples, they contain the same number of characters and special characters for each piece of data to be transcribed.
Source publication
We present CameraKeyboard, a text entry technique that uses smartphone cameras to extract and digitalise text from physical sources such as business cards and identification documents. After taking a picture of the text of interest, the smartphone recognises the text through OCR technologies (specifically, Google Cloud Vision API) and organises it...
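After OCR, the recognised text has to be organised into usable fields. As a minimal sketch of that post-OCR step (not the authors' implementation; the field names and regular expressions are illustrative assumptions), a hypothetical `organise_fields` helper could split a business card's raw text into e-mail addresses, phone numbers, and free text:

```python
import re

# Hypothetical post-OCR organiser: the field names and patterns are
# illustrative assumptions, not CameraKeyboard's actual logic.
def organise_fields(ocr_text: str) -> dict:
    """Group raw OCR output from a business card into named fields."""
    emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", ocr_text)
    phones = re.findall(r"\+?\d[\d\s-]{7,}\d", ocr_text)
    # Everything that is neither an e-mail nor a phone number is kept
    # as free text (e.g. name, company, address lines).
    other = [line for line in ocr_text.splitlines()
             if line and not any(e in line for e in emails)
             and not any(p in line for p in phones)]
    return {"emails": emails, "phones": phones, "other": other}

card = "Jane Doe\nAcme Corp\njane.doe@acme.example\n+39 06 1234 5678"
fields = organise_fields(card)
```

In practice the user would still review and correct the grouped fields, since OCR output and heuristic patterns are both fallible.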
Contexts in source publication
Context 1
... designed three different business cards, document IDs, lottery cards, and papers, so that participants transcribed a different text in each of the conditions (Fig. 3). This prevented participants from remembering the text in the subsequent conditions. Although the sample documents were all different, they had the same number of characters and special characters so that they could be ...
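The balancing constraint described above (different texts, identical character counts) can be checked mechanically; a small sketch, with invented sample strings, of the kind of verification the design implies:

```python
# Illustrative check (the sample strings are invented, not the study's
# materials): two comparable samples must contain the same total number
# of characters and the same number of special (non-alphanumeric,
# non-space) characters.
def char_counts(text: str) -> tuple[int, int]:
    total = len(text)
    special = sum(1 for c in text if not c.isalnum() and not c.isspace())
    return total, special

sample_a = "Anna Rossi, Via Roma 1: card #001"
sample_b = "Luca Verdi, Via Pisa 9: card #002"
```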
Context 2
... surpasses the physical keyboard only when more than 72 characters of natural language (i.e., the "s" of "semiotica" in Fig. 13) have to be transcribed. Moreover, CameraKeyboard surpasses Google Keyboard only when more than 69 characters of natural language (i.e., the "o" of "interpretacion" in Fig. 13) have to be transcribed. Otherwise, the other keyboards (at least from the perspective of task completion time) are more convenient than CameraKeyboard (see Fig. ...
Context 5
... of all, we must point out that CameraKeyboard makes sense only when the text to transcribe is not too short. CameraKeyboard is quite fast for text entry tasks (see Figures 7, 9, 11 and 13), but a significant amount of time is required to take the picture. Therefore, writing a short text manually usually takes less time than (1) enabling the camera, (2) taking a picture and (3) waiting for text extraction; this (long) process gives the other keyboards a small time advantage. ...
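The trade-off just described amounts to a break-even model: camera-based entry pays a fixed overhead but then transcribes each character faster. A sketch with invented timing constants (not the study's measurements):

```python
# Break-even character count: CameraKeyboard pays a fixed overhead
# (enable camera + take picture + wait for OCR) but afterwards handles
# each character much faster than manual typing.  All constants below
# are illustrative assumptions, not measured values from the study.
def break_even_chars(overhead_s: float,
                     camera_s_per_char: float,
                     typing_s_per_char: float) -> float:
    """Characters beyond which camera entry is faster than typing."""
    return overhead_s / (typing_s_per_char - camera_s_per_char)

# e.g. 10 s overhead, 0.05 s/char after OCR vs 0.20 s/char typed
n = break_even_chars(10.0, 0.05, 0.20)
```

With these made-up constants the break-even point lands in the same tens-of-characters regime the study reports (69–72 characters), which is why short texts favour conventional keyboards.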
Similar publications
Programming screencasts have become a pervasive resource on the Internet that helps developers learn new programming technologies or skills. The source code in programming screencasts is important and valuable information for developers, but the streaming nature of programming screencasts (i.e., a sequence of screen-captured images) limits the...
In this paper, we propose an approach named psc2code to denoise the process of extracting source code from programming screencasts. First, psc2code leverages the Convolutional Neural Network based image classification to remove non-code and noisy-code frames. Then, psc2code performs edge detection and clustering-based image segmentation to detect s...
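The denoising idea described above can be illustrated with a toy pipeline; the classifier below is a string-matching stub standing in for psc2code's CNN, and the frames are strings rather than images:

```python
# Toy frame-denoising pipeline in the spirit of the approach above.
# Frames are stand-ins (strings); is_code_frame is a stub for the CNN
# image classifier, and exact equality stands in for image similarity.
def is_code_frame(frame: str) -> bool:
    return "def " in frame or "class " in frame   # stub heuristic

def denoise(frames: list[str]) -> list[str]:
    kept: list[str] = []
    for f in frames:
        if not is_code_frame(f):          # drop non-code/noisy frames
            continue
        if kept and kept[-1] == f:        # collapse consecutive duplicates
            continue
        kept.append(f)
    return kept

stream = ["desktop wallpaper", "def foo(): pass",
          "def foo(): pass", "class Bar: ...", "browser window"]
clean = denoise(stream)
```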
Citations
... Unfortunately, this scheme is highly susceptible to environmental variations such as human motion in surrounding areas and changes in the orientation and distance of the transceivers. Camera-based keyboards often use image-processing techniques to track finger motion and fingertip location for precise identification [11]-[13]. However, changes in the tilt of the device placement affect the camera's viewing angle, which impairs the accuracy of keystroke recognition. ...
... 2) Performance Analysis: To describe our experimental results in detail, we introduce precision, recall, the F1-score, and the confusion matrix. The precision and recall derived from the confusion matrix are given by formulae (10) and (11). ...
... Recall(m) = TP / (TP + FN) (11) where m denotes the keystroke class; the keystroke "A" is used as an example to describe the four counts (TP, TN, FP, and FN) in the formula above. TP (true positive) is the number of keystrokes correctly classified as "A". ...
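Formulae (10) and (11) compute per-class precision and recall directly from the confusion-matrix counts; a short sketch (the counts are invented for illustration):

```python
# Precision and recall for one keystroke class m, from the confusion-
# matrix counts (formulae (10) and (11) in the cited work).
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

# Invented example counts for keystroke "A":
tp, fp, fn = 90, 10, 5
p = precision(tp, fp)        # 90 / 100
r = recall(tp, fn)           # 90 / 95
f1 = 2 * p * r / (p + r)     # harmonic mean of precision and recall
```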
In this paper, we present an improved ring-type virtual keyboard scheme that achieves impressive performance with only one smart ring on a finger of each hand. The smart ring integrates a 6-DoF Inertial Measurement Unit (IMU) and a 3-DoF magnetometer for collecting motion data during typing. First, a new keyboard layout is designed: changing the previous rectangular layout to an arc structure increases the difference in attitude angle between adjacent keys, which greatly improves keystroke recognition accuracy. Second, in addition to the attitude-angle feature, we adopt acceleration, gyroscope, and magnetometer data to describe the subtle differences between keystroke motions. Then, feature-importance evaluation and feature-correlation analysis are used to select features with a high contribution rate and low similarity. Finally, nine effective features are selected from the attitude-angle and magnetometer data for the final keystroke recognition. By balancing the number of selected features against the recognition speed and accuracy of the training models, keystroke recognition becomes nearly 4 times faster while maintaining 98.53% recognition accuracy. This new ring-type virtual keyboard input scheme has advantages in portability, volume, and cost over many existing human-computer interface methods.
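The feature-selection step described above (keep high-contribution, low-similarity features) can be sketched as a greedy filter; the importance scores and correlation values below are invented, not the paper's data:

```python
# Greedy feature filter in the spirit of the scheme above: rank
# features by an importance score, then keep a feature only if its
# correlation with every already-kept feature is below a threshold.
# Scores and correlations here are invented for illustration.
def select_features(importance: dict[str, float],
                    corr: dict[tuple[str, str], float],
                    max_corr: float) -> list[str]:
    ranked = sorted(importance, key=importance.get, reverse=True)
    kept: list[str] = []
    for f in ranked:
        if all(corr.get((f, k), corr.get((k, f), 0.0)) < max_corr
               for k in kept):
            kept.append(f)
    return kept

importance = {"pitch": 0.9, "roll": 0.8, "acc_x": 0.5, "mag_z": 0.4}
corr = {("pitch", "roll"): 0.95, ("pitch", "acc_x"): 0.2,
        ("roll", "acc_x"): 0.3, ("pitch", "mag_z"): 0.1,
        ("roll", "mag_z"): 0.1, ("acc_x", "mag_z"): 0.2}
selected = select_features(importance, corr, max_corr=0.8)
```

Here "roll" is discarded because it is nearly redundant with the higher-ranked "pitch", mirroring the paper's goal of keeping features with a high contribution rate and low mutual similarity.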
This research provides an overview of the different technologies that can benefit people with visual disabilities. The association "Sentir con los ojos del corazón", located in Tehuacán, Puebla, México, serves people with visual disabilities who do not have the technological tools to understand their surroundings (restaurant menus, signs on doors, books, or any setting that contains text), which makes life difficult in a world where most texts are oriented towards sighted people. Few applications exist that allow people with visual disabilities to improve their lives in the different areas in which they operate. We therefore propose a mobile application that interacts with a virtual assistant to convert images into text and then into speech through optical character recognition (OCR), allowing users to function in educational, work, social, and other environments. This project promotes the inclusion of people with visual disabilities, improving their quality of life through mobile applications and helping them be self-sufficient in daily life; the application may later translate into different languages, with different voice intensities and tones, across different platforms.