Over the past few decades, sentiment analysis has advanced considerably, but that progress has concentrated almost exclusively on text; sentiment analysis of audio remains comparatively underdeveloped. This study addresses that gap by applying sentiment analysis to voice transcripts, with a particular focus on distinguishing the emotions of individual speakers within a conversation. The goal is a sentiment analysis system that can interact rapidly with multiple users and analyze the sentiment of each user's audio input. Key components of the approach include speech recognition, Mel-frequency cepstral coefficients (MFCC), dynamic time warping (DTW), sentiment analysis, and speaker recognition.

I. INTRODUCTION

Sentiment analysis infers people's feelings or attitudes from what they say, whether about a specific topic or in general conversation, and it is already used for a wide range of purposes across applications and websites. We apply this knowledge to build an assistant that understands and learns the human way of thinking from how people converse with one another: a machine that infers a person's emotion or mood from a conversation and from the keywords used in it. The combined sentiment analysis of the speaker and of the speech draws on data extracted from previous conversations and on several processing steps.

Understanding people's thoughts and feelings has many applications. Technology that can recognize and respond to a person's emotions will become increasingly important; imagine a device that senses a user's mood and adjusts its settings to that user's preferences and needs. Such innovations can improve user experience and satisfaction. In addition, research institutions are actively working to improve the quality of transcribing and translating audio content into text, covering material such as news reports, political speeches, social media, and music, with the aim of making audio content more accessible and useful in many situations.

Our research group has also worked on voice evaluation [1,2,3], studying conversations between a user and an assistant model and distinguishing each speaker and their emotions. Because a conversation typically involves several speakers, analyzing the text data of a recorded voice is difficult. This paper therefore proposes a model that recognizes the presence of different speakers, identifies them as separate individuals, and performs sentiment analysis on what each individual says, responding according to that speaker's emotions.

We present an approach to the challenges and techniques involved in audio sentiment analysis of sound recordings through speech recognition. Our methodology uses a speech recognition model to transcribe audio recordings, coupled with a proposed speaker discrimination method, based on a predetermined hypothesis, that authenticates distinct speakers. Each segment of speech data is then analyzed so that the system can accurately identify the genuine emotions and topics of the conversation; an illustrative sketch of such a pipeline is given at the end of this section.

Section II covers the underlying speaker hypothesis, speech recognition, and sentiment analysis. Section III describes the proposed system, and Section IV outlines the experimental configuration. Section V presents the results obtained together with an extensive analysis.
Finally, Section VI concludes the paper.
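
To make the pipeline described above concrete, the following is a minimal illustrative sketch, not the implementation presented in this paper. It assumes the librosa, SpeechRecognition, and Hugging Face transformers Python packages are available, and the enrollment files (alice_enroll.wav, bob_enroll.wav) and utterance segments (utt_01.wav, utt_02.wav) are hypothetical placeholders. Each segment is matched against enrolled speaker templates by comparing MFCC features with a DTW alignment cost, transcribed with an off-the-shelf recognizer, and scored with a generic sentiment classifier.

# Illustrative sketch only: per-utterance speaker matching (MFCC + DTW),
# transcription, and sentiment scoring. File names are placeholders.
import librosa
import speech_recognition as sr
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")   # default English sentiment model
recognizer = sr.Recognizer()

def mfcc_features(path, n_mfcc=13):
    # Load audio and return its MFCC matrix (n_mfcc x frames).
    y, rate = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=rate, n_mfcc=n_mfcc)

def identify_speaker(path, templates):
    # Pick the enrolled speaker whose MFCC template has the lowest
    # cumulative DTW alignment cost against the utterance at `path`.
    query = mfcc_features(path)
    costs = {}
    for name, template in templates.items():
        D, _ = librosa.sequence.dtw(X=query, Y=template, metric="euclidean")
        costs[name] = D[-1, -1]
    return min(costs, key=costs.get)

def transcribe(path):
    # Transcribe an utterance (uses the Google Web Speech API; requires network).
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)

# Hypothetical enrolled speakers and pre-segmented utterances.
templates = {"alice": mfcc_features("alice_enroll.wav"),
             "bob": mfcc_features("bob_enroll.wav")}
for segment in ["utt_01.wav", "utt_02.wav"]:
    speaker = identify_speaker(segment, templates)
    text = transcribe(segment)
    label = sentiment(text)[0]
    print(f"{speaker}: {text!r} -> {label['label']} ({label['score']:.2f})")

In practice the segment boundaries would come from a diarization or voice activity detection step, and the speaker templates from an enrollment phase; both are assumed here for brevity.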