Figure 3 - uploaded by Sachin Sharma
Source publication
Speech recognition systems play an essential role in every human being's life. Speech recognition is software that allows users to interact with their mobile phones through speech. The software splits the audio of speech into various sound waveforms, analyzes each sound form, and uses various algorithms to find the most appropriate word...
Similar publications
Knowing the language of an input text/audio is a necessary first step for using almost every natural language processing (NLP) tool such as taggers, parsers, or translation systems. Language identification is a well-studied problem, sometimes even considered solved; in reality, most of the world's 7000 languages are not supported by current systems...
Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. Among these studies, attention has proven to be a useful source of information for gaining insight into word alignment, also when the input text...
Citations
... In terms of technological development, speech recognition has a long history in many technological applications. Recently, it has developed a lot when deep learning was used as well as when using big data, and it has been significantly improved not only in terms of academic papers and research published in this regard but also in its use in the global industry and information technology, such as in global companies such as Google, Facebook, and Microsoft [29]. ...
... Therefore, scientists have been keen to change the policy and method of processing that deals with the nature of language [31], as shown in Figure 2. In terms of technological development, speech recognition has a long history in many technological applications. Recently, it has developed a lot when deep learning was used, as well as when using big data, and it has been significantly improved not only in terms of academic papers and research published in this regard but also in its use in the global industry and information technology, such as in global companies such as Google, Facebook, and Microsoft [29]. ...
This study investigates the enhancement of automated driving and command control through speech recognition using a Deep Neural Network (DNN). The method depends on some sequential stages such as noise removal, feature extraction from the audio file, and their classification using a neural network. In the proposed approach, the variables that affect the results in the hidden layers were extracted and stored in a vector to classify them and issue the most influential ones for feedback to the hidden layers in the neural network to increase the accuracy of the result. The result was 93% in terms of accuracy and with a very good response time of 0.75 s, with PSNR 78 dB. The proposed method is considered promising and is highly satisfactory to users. The results encouraged the use of more commands, more data processing, more future exploration, and the addition of sensors to increase the efficiency of the system and obtain more efficient and safe driving, which is the main goal of this research.
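The abstract above reports a PSNR of 78 dB but does not say how it was computed. As a rough illustration only, the standard peak signal-to-noise ratio, PSNR = 10 * log10(peak^2 / MSE), can be sketched as follows. The function name `psnr`, the `peak` parameter, and the choice of a normalized peak amplitude of 1.0 are assumptions made for this example, not details taken from the study.

```python
import math

def psnr(reference, processed, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    `peak` is the maximum possible sample amplitude (assumed 1.0 here,
    i.e. audio normalized to the range [-1, 1]).
    """
    if len(reference) != len(processed) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    # Mean squared error between the reference and the processed signal.
    mse = sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no error at all
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical toy signals: the processed copy is offset by 0.1 everywhere.
clean = [0.0, 1.0, 0.0, -1.0]
denoised = [s + 0.1 for s in clean]
score = psnr(clean, denoised)
```

A higher value means the processed signal is closer to the reference; how the study chose its reference signal and peak value is not stated in the abstract.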
... In speech recognition [260], [261], [262], [263], SSOD aids in identifying and classifying speech patterns and phonetic elements within audio data, even with limited labeled samples. By leveraging both labeled and unlabeled speech data, these models can better discern speech signals from background noise and accurately transcribe spoken words into text. ...
The impressive advancements in semi-supervised learning have driven researchers to explore its potential in object detection tasks within the field of computer vision. Semi-Supervised Object Detection (SSOD) leverages a combination of a small labeled dataset and a larger, unlabeled dataset. This approach effectively reduces the dependence on large labeled datasets, which are often expensive and time-consuming to obtain. Initially, SSOD models encountered challenges in effectively leveraging unlabeled data and managing noise in generated pseudo-labels for unlabeled data. However, numerous recent advancements have addressed these issues, resulting in substantial improvements in SSOD performance. This paper presents a comprehensive review of 27 cutting-edge developments in SSOD methodologies, from Convolutional Neural Networks (CNNs) to Transformers. We delve into the core components of semi-supervised learning and its integration into object detection frameworks, covering data augmentation techniques, pseudo-labeling strategies, consistency regularization, and adversarial training methods. Furthermore, we conduct a comparative analysis of various SSOD models, evaluating their performance and architectural differences. We aim to ignite further research interest in overcoming existing challenges and exploring new directions in semi-supervised learning for object detection.
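One of the pseudo-labeling strategies mentioned in the abstract above can be illustrated with a minimal sketch: a model trained on the small labeled set predicts detections on unlabeled images, and only confident detections are kept as pseudo-labels. The dictionary layout, the function name `filter_pseudo_labels`, and the 0.9 threshold are assumptions for this example; the review covers many variants and does not prescribe this one.

```python
def filter_pseudo_labels(detections, score_threshold=0.9):
    """Keep only detections confident enough to serve as pseudo-labels.

    `detections` is a list of dicts with hypothetical keys:
      "box"   -> (x_min, y_min, x_max, y_max)
      "label" -> predicted class name
      "score" -> model confidence in [0, 1]
    The threshold trades pseudo-label noise against coverage.
    """
    return [d for d in detections if d["score"] >= score_threshold]

# Hypothetical predictions of a teacher model on one unlabeled image.
predictions = [
    {"box": (10, 20, 110, 220), "label": "person", "score": 0.97},
    {"box": (30, 40, 80, 90), "label": "dog", "score": 0.55},
    {"box": (200, 50, 320, 180), "label": "car", "score": 0.91},
]
pseudo_labels = filter_pseudo_labels(predictions)
```

Here the low-confidence "dog" detection is discarded, while the other two would be added to the training set alongside the labeled data. A fixed threshold is the simplest choice; it is shown here only because it makes the noise-versus-coverage trade-off easy to see.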
Speech is the most natural form of human communication, so implementing an interface based on the analysis of speech information is a promising direction in the development of intelligent control systems. An automatic speech recognition system is an information system that converts an input speech signal into a recognized message. Speech recognition is a complex and resource-intensive task because of the high variability of speech, which depends on the age, gender, and physiological characteristics of the speaker. The article presents a generalized description of the speech recognition task, which consists of the following stages: resampling, framing and windowing, feature extraction, vocal tract length normalization, and noise suppression. Preprocessing of the speech signal is the first and key stage in automatic speech recognition, since the quality of the input signal significantly affects the quality of recognition and the final result of the process. Speech preprocessing consists of cleaning the input signal of external and unwanted noise, voice activity detection, and vocal tract length normalization. The purpose of speech signal preprocessing is to increase the computational efficiency of speech recognition systems and of control systems with a natural-language interface. The article proposes using the fast Fourier transform to describe the input audio signal, and Hamming windows to create segments of the audio signal, followed by feature extraction with Mel-Frequency Cepstral Coefficients. It describes the use of the dynamic time warping algorithm for vocal tract length normalization and of a recurrent neural network for noise suppression. The results of an experiment on preprocessing the audio signal of voice commands for controlling applications on a mobile phone running the Android operating system are presented.
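The early preprocessing stages named in the abstract above (framing, Hamming windowing, and the fast Fourier transform) can be sketched in plain Python as follows. This is an illustrative sketch, not the article's implementation: the function names, the frame length of 256 samples, the hop of 128 samples, and the 8 kHz test tone are all assumptions chosen for the example, and the later stages (mel filter bank, cepstral coefficients, dynamic time warping, noise suppression) are not shown.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def power_spectra(signal, frame_len=256, hop=128):
    """Per-frame power spectrum (non-negative frequency bins only)."""
    window = hamming(frame_len)
    spectra = []
    for frame in split_frames(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        bins = fft(windowed)[: frame_len // 2 + 1]
        spectra.append([abs(b) ** 2 / frame_len for b in bins])
    return spectra

# Hypothetical test input: a 1 kHz tone sampled at 8 kHz, 1024 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]
spectra = power_spectra(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / 256
```

For this tone, the strongest bin of each frame corresponds to 1000 Hz. In an MFCC pipeline, these power spectra would next be passed through a mel-scaled filter bank, followed by a logarithm and a discrete cosine transform.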
The rapid evolution of technology has positioned Artificial intelligence (AI) at the forefront of innovation in various sectors, notably in healthcare. This review explores AI’s current and future roles in primary healthcare, highlighting its transformative impact and the challenges that accompany its integration. AI currently aids healthcare practitioners in a myriad of ways. Diagnostic procedures have been revolutionized by AI algorithms capable of analysing medical images with precision. Administrative tasks, such as patient scheduling and record-keeping, have become more efficient due to AI’s streamlining capabilities. Predictive analytics, a critical AI feature, plays a pivotal role in pre-empting health complications by analysing extensive patient data. Additionally, AI has significantly advanced telemedicine, offering wider access to healthcare, a crucial development amidst global health emergencies. Moreover, AI contributes substantially to personalized medicine, analysing large-scale data, including genetic information, to tailor treatment plans. Its integration into Electronic Health Records (EHR) systems enhances data processing, improving treatment outcomes and operational efficiency. Despite these advancements, challenges persist. Data privacy concerns, potential erosion of the human element in patient care, difficulties integrating AI with existing systems, and the risk of over-reliance on technology are pressing issues that require careful management. Ethical considerations, including algorithmic transparency and accountability, also pose significant challenges. Looking ahead, AI’s trajectory in primary healthcare is geared towards further advancements, with an emphasis on ethical implications and maintaining a balance between technology and human-centred care. The potential of AI to enhance areas like chronic disease management and mental health care is vast. 
As we approach an AI-driven era in primary healthcare, this review underscores the importance of merging technological innovation with empathetic patient care.
Speech recognition is one of the prominent research topics in the field of Natural Language Processing (NLP). It removes barriers and eases communication between human beings and devices. The aim of this study is to analyze the Automatic Speech Recognition Systems (ASRS) proposed by different researchers using machine learning and deep learning techniques. The work considers speech recognition systems for Indian and foreign languages, including Hindi, Marathi, Malayalam, Urdu, Sanskrit, Nepali, Kannada, Chinese, Japanese, Arabic, Italian, Turkish, French, and German. An integrated framework is presented and elaborated with recent advancements. Platforms used for building speech recognition models, such as the Hidden Markov Model Toolkit (HMM Toolkit), CMU Sphinx, and the Kaldi toolkit, are explained. Further, some applications that illustrate the uses of ASRS are described.
In computer science, a large number of studies over the last few decades have addressed speech applications, particularly speech recognition. By determining and interpreting received speech signals, speech recognition enables a system to turn them into instructions. Speech recognition was far less capable in its initial phases than it is now, which is why so many studies have focused on it. In automatic speech recognition, considerable progress has been made on a range of problems, one of which is determining the speaker's surroundings from the audio around the speaker. The core contribution of this project is to explore how to use extensive speech recognition knowledge to extract the speaker's surrounding environment. The research also gives a brief review of speech recognition techniques and applications since the field's inception.