Generative AI-Powered Framework for Audio Analysis and Conversational Exploration

Purshottam J. Assudani; Balakrishnan P; A. Anny Leema; Rajesh K Nasare

doi:10.63278/1425

Authors

Purshottam J. Assudani, Assistant Professor, School of Computer Science and Engineering, Ramdeobaba University, India
Balakrishnan P, Associate Professor Sr., Analytics Department, School of Computer Science and Engineering, VIT Vellore, India
A. Anny Leema, Associate Professor Sr., Analytics Department, School of Computer Science and Engineering, VIT Vellore, India
Rajesh K Nasare, Artificial Intelligence, G H Raisoni College of Engineering and Management, Nagpur, India

Keywords:

Audio Interpretation, Generative AI, Large Language Models, CNN, Transformer, Spectrogram, Multimodal Fusion, Interactive AI.

Abstract

This paper introduces a hybrid deep learning system for complex audio interpretation and subsequent conversational exploration that combines Convolutional Neural Networks (CNNs) with transformer-based Large Language Models (LLMs) operating over spectrograms. The system takes raw audio signals as input, converts them into spectrograms, extracts high-level features with CNNs, and fuses these features with LLM-produced embeddings to add semantic understanding and support contextual dialogue. A multimodal attention mechanism bridges the gap between the audio and linguistic representations, which makes meaningful, context-aware responses possible. Intended applications include intelligent assistants, education, and intelligent monitoring, among others. The experimental evaluation reports improved performance over the state of the art in both experiments, with an accuracy of 93.8%, a latency of 420 ms, and high semantic coherence (a BLEU score of 0.74). These results indicate that the proposed system can offer user-friendly and intelligent audio exploration.
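To make the pipeline in the abstract concrete, the following is a minimal, dependency-free sketch of its three stages: audio to spectrogram, feature extraction, and attention-based fusion with text embeddings. It is not the authors' implementation. The function names, frame sizes, the band-pooling step (a crude stand-in for the CNN feature extractor), and the two toy "text" query vectors (stand-ins for LLM embeddings) are all assumptions chosen for illustration. A real system would likely use an optimized FFT and learned networks instead.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a Hann-windowed short-time DFT (naive DFT, fine for a demo)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # indexed as [frame][frequency bin]


def band_pool(frames, n_bands=4):
    """Hypothetical stand-in for the CNN: average bins into coarse bands (trailing bins dropped)."""
    size = len(frames[0]) // n_bands
    return [[sum(mags[b * size:(b + 1) * size]) / size for b in range(n_bands)]
            for mags in frames]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value rows."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


# Demo input (assumed): a 1000 Hz tone sampled at 8000 Hz, 512 samples long.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)        # audio -> spectrogram
audio_feats = band_pool(spec)     # spectrogram -> per-frame features (CNN stand-in)

# Toy query vectors standing in for LLM token embeddings (assumption).
text_queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
fused = cross_attention(text_queries, audio_feats, audio_feats)

peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin, len(fused), len(fused[0]))
```

Each row of `fused` is a weighted average of the audio frame features, with weights determined by how strongly the corresponding query matches each frame. This is the basic mechanism by which an attention layer can let linguistic tokens pull in audio context.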

Published

2025-04-16

How to Cite

Purshottam J. Assudani, Balakrishnan P, A. Anny Leema, and Rajesh K Nasare. 2025. “Generative AI-Powered Framework for Audio Analysis and Conversational Exploration”. Metallurgical and Materials Engineering 31 (4):206-11. https://doi.org/10.63278/1425.

Issue

Vol. 31 No. 4 (2025)

Section

Research

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

