Yosi Shrem's research while affiliated with Bar Ilan University and other places

Publications (5)

Preprint
Full-text available
Background - Social anxiety (SA) is a common and debilitating condition that negatively affects quality of life even at sub-diagnostic thresholds. We sought to characterize SA's acoustic signature using hypothesis-testing and machine learning (ML) approaches. Methods - Participants formed spontaneous utterances in response to instructions to refuse or co...
Preprint
Full-text available
Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that these frequencies can be accurately estimated using deep learning techniques. However, when presented with speech from a different d...
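To make the "spectral maxima from vocal tract resonances" idea concrete, here is a classical linear-prediction (LPC) baseline for formant estimation, not the deep-learning estimator the abstract describes: the angles of the complex LPC poles are candidate resonance frequencies. All parameter values (pre-emphasis coefficient, model order, frequency cutoffs) are conventional choices, not taken from the paper.

```python
import numpy as np

def lpc_formants(signal, sr, order=12):
    """Rough formant estimates via linear prediction (LPC).

    Formants appear as peaks of the LPC spectral envelope, so the angles
    of the complex poles of the LPC polynomial give candidate formant
    frequencies in Hz. A textbook baseline, not the paper's method.
    """
    # Pre-emphasis flattens the spectral tilt of voiced speech.
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    x = x * np.hamming(len(x))

    # Autocorrelation method: solve the normal equations R a = r.
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1 : len(x) + order]              # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-6 * r[0] * np.eye(order)                # diagonal loading for stability
    a = np.linalg.solve(R, r[1 : order + 1])

    # Poles of A(z) = 1 - a1*z^-1 - ... - ap*z^-p.
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]                  # keep one of each conjugate pair
    freqs = np.angle(poles) * sr / (2 * np.pi)
    # Discard near-DC and near-Nyquist poles that only model spectral tilt.
    return sorted(f for f in freqs if 90.0 < f < sr / 2 - 50.0)

# Demo on a synthetic "vowel": two damped resonances at 500 Hz and 1500 Hz.
sr = 8000
t = np.arange(0, 0.05, 1 / sr)
vowel_like = (np.exp(-60 * t) * np.sin(2 * np.pi * 500 * t)
              + 0.5 * np.exp(-80 * t) * np.sin(2 * np.pi * 1500 * t))
estimates = lpc_formants(vowel_like, sr, order=8)      # candidates near 500 and 1500 Hz
```

The candidate list may include spurious poles beyond the two true resonances; practical systems pick formants by bandwidth and continuity constraints, which is exactly where the domain-transfer problem noted in the abstract becomes hard.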
Article
Full-text available
Speakers learning a second language show systematic differences from native speakers in the retrieval, planning, and articulation of speech. A key challenge in examining the interrelationship between these differences at various stages of production is the need for manual annotation of fine-grained properties of speech. We introduce a new method fo...
Preprint
Full-text available
Voice Onset Time (VOT), a key measurement of speech for basic research and applied medical studies, is the time between the onset of a stop burst and the onset of voicing. When the voicing onset precedes burst onset the VOT is negative; if voicing onset follows the burst, it is positive. In this work, we present a deep-learning model for accurate a...
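The sign convention in the abstract reduces to a single subtraction; a minimal sketch, with hypothetical annotation timestamps rather than output of the paper's model:

```python
def voice_onset_time(burst_onset_s: float, voicing_onset_s: float) -> float:
    """VOT in seconds: onset of voicing minus onset of the stop burst.

    Positive when voicing follows the burst; negative (prevoicing)
    when voicing begins before the burst.
    """
    return voicing_onset_s - burst_onset_s

# Voicing lags the burst by 70 ms -> positive VOT
vot_pos = voice_onset_time(burst_onset_s=0.100, voicing_onset_s=0.170)
# Voicing leads the burst by 60 ms -> negative VOT (prevoiced stop)
vot_neg = voice_onset_time(burst_onset_s=0.120, voicing_onset_s=0.060)
```

The hard part the preprint addresses is not this arithmetic but locating the two onsets automatically in the waveform.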

Citations

... [table of per-phone-class formant-estimation errors (Praat vs. learned models; semivowels, nasals) omitted] As the results show, carrying performance over from the training samples to a new domain is a critical obstacle for machine-learning-based methods [20,21]. For both DeepFormants and our model, there is a performance gap between optimizing with samples from the Clopper and Hillenbrand train sets included and training only with the VTR. ...
... The classifier is a fully-connected network that takes the representation as input and outputs a creaky-voice detection score along with predictions for two auxiliary tasks: voicing and pitch. These auxiliary predictions steer the overall network toward a better solution for detecting creaky voice [21]. ...
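The shared-trunk-plus-auxiliary-heads shape described in that snippet can be sketched as follows. Only the structure is reproduced: the layer sizes, single hidden layer, and head types are assumptions for illustration, not the cited paper's actual architecture.

```python
import numpy as np

class CreakHead:
    """Illustrative multi-task fully-connected classifier.

    A shared hidden layer over a frame representation feeds three
    linear heads: the main creaky-voice detection score plus two
    auxiliary outputs, voicing (classification) and pitch (regression).
    Sizes and depth are assumptions, not the paper's design.
    """
    def __init__(self, in_dim=256, hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.02, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w_creak = rng.normal(0.0, 0.02, (hidden,))   # main task
        self.w_voice = rng.normal(0.0, 0.02, (hidden,))   # auxiliary: voicing
        self.w_pitch = rng.normal(0.0, 0.02, (hidden,))   # auxiliary: pitch

    def forward(self, x):
        """x: (frames, in_dim) representation -> per-frame predictions."""
        h = np.maximum(0.0, x @ self.W1 + self.b1)        # shared ReLU trunk
        sig = lambda z: 1.0 / (1.0 + np.exp(-z))
        return {
            "creak": sig(h @ self.w_creak),    # detection score in (0, 1)
            "voicing": sig(h @ self.w_voice),  # auxiliary classification score
            "pitch": h @ self.w_pitch,         # auxiliary regression (e.g. Hz)
        }

head = CreakHead()
out = head.forward(np.random.default_rng(1).normal(size=(10, 256)))
```

During training, the three task losses would be summed (typically with weights); because all heads share the trunk, the auxiliary voicing and pitch losses act as regularizers that push the shared representation toward features useful for creak detection, which is the effect the citation describes.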