Generative AI-Powered Framework for Audio Analysis and Conversational Exploration

Authors

  • Purshottam J. Assudani, Assistant Professor, School of Computer Science and Engineering, Ramdeobaba University, India
  • Balakrishnan P, Associate Professor Sr., Analytics Department, School of Computer Science and Engineering, VIT Vellore, India
  • A. Anny Leema, Associate Professor Sr., Analytics Department, School of Computer Science and Engineering, VIT Vellore, India
  • Rajesh K Nasare, Artificial Intelligence, G H Raisoni College of Engineering and Management, Nagpur, India

DOI:

https://doi.org/10.63278/1425

Keywords:

Audio Interpretation, Generative AI, Large Language Models, CNN, Transformer, Spectrogram, Multimodal Fusion, Interactive AI.

Abstract

This paper introduces a hybrid deep learning system for complex audio interpretation and follow-up conversational interaction, combining Convolutional Neural Networks (CNNs) with transformer-based Large Language Models (LLMs) over spectrograms. The system takes raw audio signals as input, maps them to spectrograms, extracts high-level features using CNNs, and fuses these features with LLM-produced embeddings to add semantic understanding and support contextual discussion. A multimodal attention technique helps bridge the audio-linguistic gap, enabling meaningful, context-aware responses. Applications include intelligent assistants, education, and intelligent monitoring, among others. Experimental evaluation shows improved performance over the state of the art in both experiments, with an accuracy of 93.8%, a latency of 420 ms, and high semantic coherence (a BLEU score of 0.74). These results indicate that the proposed system can offer user-friendly and intelligent audio exploration.
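
The pipeline described in the abstract (audio signal, spectrogram, feature extraction, attention-based fusion with text embeddings) can be illustrated with a small sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the function names, the frame and hop sizes, the band-averaging step that stands in for a trained CNN, the random vectors that stand in for LLM embeddings, and the use of plain scaled dot-product attention without learned projections. It uses only the Python standard library so that it runs without dependencies.

```python
import cmath
import math
import random


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a naive DFT over Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        # Keep only the non-negative frequency bins (0 .. frame_len / 2).
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # indexed as [frame][frequency bin]


def pooled_features(spec, dim=8):
    """Hypothetical stand-in for CNN features: average bins into `dim` bands."""
    num_bins = len(spec[0])
    feats = []
    for frame in spec:
        bands = []
        for b in range(dim):
            lo, hi = b * num_bins // dim, (b + 1) * num_bins // dim
            bands.append(sum(frame[lo:hi]) / (hi - lo))
        feats.append(bands)
    return feats


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[t] * values[t][j] for t in range(len(values)))
                        for j in range(len(values[0]))])
    return outputs, weights


random.seed(0)
sample_rate = 1024  # assumed sample rate in Hz
# Synthetic input: a 128 Hz sine tone, 512 samples long.
signal = [math.sin(2 * math.pi * 128 * n / sample_rate) for n in range(512)]

spec = spectrogram(signal)
audio_feats = pooled_features(spec)

# Random vectors standing in for LLM token embeddings (3 tokens, 8 dims).
text_emb = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]

# Text tokens query the audio frames; the result is one audio context
# vector per token, concatenated with the token embedding to "fuse" them.
context, attn = cross_attention(text_emb, audio_feats, audio_feats)
fused = [t + c for t, c in zip(text_emb, context)]

print(len(spec), "frames,", len(spec[0]), "bins,", len(fused[0]), "fused dims")
```

In a real system the band averaging would be replaced by a trained CNN, the random vectors by embeddings from an actual language model, and the attention step would use learned query, key, and value projections. The sketch only shows how the stages connect.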

Published

2025-04-16

How to Cite

Purshottam J. Assudani, Balakrishnan P, A. Anny Leema, and Rajesh K Nasare. 2025. “Generative AI-Powered Framework for Audio Analysis and Conversational Exploration”. Metallurgical and Materials Engineering 31 (4):206-11. https://doi.org/10.63278/1425.

Section

Research