Scalability and Efficiency in Distributed Big Data Architectures: A Comparative Study

Authors

  • Manikandan K, Assistant Professor, Dept. of EEE, Dr. Mahalingam College of Engineering and Technology, India
  • Vamsee Pamisetty, Middleware Architect
  • Srinivas Rao Challa, Sr. Manager
  • Venkata Bhardwaj Komaragiri, Lead Data Engineer
  • Kishore Challa, Lead Software Engineer
  • Karthik Chava, Senior Software Engineer

DOI:

https://doi.org/10.63278/1318

Keywords:

Big Data, Distributed Computing, Apache Spark, Apache Flink, Machine Learning.

Abstract

As data volumes grow rapidly, large-scale processing requires scalable and efficient architectures. This study compares the performance, scalability, and efficiency of four big data frameworks: Apache Hadoop, Apache Spark, Apache Flink, and Google Bigtable. The experimental results show that Apache Spark executes jobs 3.5× faster than Hadoop, while Apache Flink delivers 40% lower latency than Spark for real-time analytics. Google Bigtable sustained a high throughput of 5 million queries per second but proved inflexible for computationally intensive processing. The study also examined integrating machine learning and blockchain technologies into a unified distributed backend, which improved processing efficiency by 25% and strengthened data integrity at the cost of 12% additional computational overhead. The findings indicate that Flink is best suited to real-time data streams, Spark to iterative workloads, and Bigtable to structured, high-throughput storage. Open questions remain about scaling under extreme workloads and balancing security against performance. Future research will focus on hybrid architectures that combine high speed with strong security for next-generation big data applications.
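The abstract reports its results as ratios without stating the formulas behind them. Under the conventional definitions sketched below, which are an assumption rather than the authors' stated method, the three headline figures translate as follows; the symbols T (job execution time), L (end-to-end latency), and C (computational cost) are introduced here for illustration only.

```latex
\begin{align}
S &= \frac{T_{\text{Hadoop}}}{T_{\text{Spark}}} = 3.5
  &&\Rightarrow\quad T_{\text{Spark}} = \frac{T_{\text{Hadoop}}}{3.5} \approx 0.29\,T_{\text{Hadoop}} \\
L_{\text{Flink}} &= (1 - 0.40)\,L_{\text{Spark}} = 0.60\,L_{\text{Spark}} \\
C_{\text{unified}} &= (1 + 0.12)\,C_{\text{base}} = 1.12\,C_{\text{base}}
\end{align}
```

Read this way, a hypothetical job taking 70 minutes on Hadoop would take about 20 minutes on Spark, and a hypothetical 100 ms Spark streaming latency would correspond to roughly 60 ms on Flink.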

How to Cite

Manikandan K, Vamsee Pamisetty, Srinivas Rao Challa, Venkata Bhardwaj Komaragiri, Kishore Challa, and Karthik Chava. 2025. “Scalability and Efficiency in Distributed Big Data Architectures: A Comparative Study”. Metallurgical and Materials Engineering 31 (3):40-49. https://doi.org/10.63278/1318.

Section

Research