Scalability and Efficiency in Distributed Big Data Architectures: A Comparative Study
DOI:
https://doi.org/10.63278/1318
Keywords:
Big Data, Distributed Computing, Apache Spark, Apache Flink, Machine Learning.
Abstract
As data volumes grow rapidly, large-scale data processing requires scalable and efficient architectures. This study compares the performance, scalability, and efficiency of four big data frameworks: Apache Hadoop, Apache Spark, Apache Flink, and Google Bigtable. The experimental results indicate that Apache Spark executes 3.5× faster than Hadoop, and that Apache Flink achieves 40% lower latency than Spark on real-time analytics. Google Bigtable sustained a high throughput of 5 million queries per second but was poorly suited to computationally intensive processing. The study also examined integrating machine learning and blockchain technologies into a unified backend for distributed systems; this improved processing efficiency by 25% and supported data integrity, at the cost of 12% added computational overhead. The findings suggest that Flink is best suited to real-time data streams, Spark to iterative workloads, and Bigtable to structured, high-throughput storage. Open questions remain about scaling under extreme workloads and balancing security with performance. Future research will focus on hybrid architectures that deliver both high speed and strong security for next-generation big data applications.
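To make the reported ratios concrete, the following minimal sketch converts a "3.5× faster" speedup and a "40% lower latency" figure into absolute numbers. The baseline values (a 700-second Hadoop job and a 100-millisecond Spark latency) and the function names are hypothetical, chosen only for illustration; they are not measurements from the study.

```python
def runtime_after_speedup(baseline_s: float, speedup: float) -> float:
    """Runtime of the faster system, given the baseline runtime and an N-fold speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_s / speedup


def latency_after_reduction(baseline_ms: float, reduction: float) -> float:
    """Latency after a fractional reduction (0.40 means 40% lower)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_ms * (1.0 - reduction)


# Hypothetical baselines (assumptions, not figures from the study).
hadoop_runtime_s = 700.0
spark_latency_ms = 100.0

# Ratios as reported in the abstract.
spark_runtime_s = runtime_after_speedup(hadoop_runtime_s, 3.5)
flink_latency_ms = latency_after_reduction(spark_latency_ms, 0.40)

print(f"Spark runtime: {spark_runtime_s:.1f} s (Hadoop baseline {hadoop_runtime_s:.1f} s)")
print(f"Flink latency: {flink_latency_ms:.1f} ms (Spark baseline {spark_latency_ms:.1f} ms)")
```

Under these assumed baselines, a 3.5× speedup leaves roughly 29% of the original runtime, and a 40% latency reduction leaves 60% of the original latency. The two figures measure different things (batch execution time versus streaming latency), so they should not be multiplied together.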
License
Copyright (c) 2025 Manikandan K, Vamsee Pamisetty, Srinivas Rao Challa, Venkata Bhardwaj Komaragiri, Kishore Challa, Karthik Chava

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their published articles online (e.g., in institutional repositories or on their website, social networks like ResearchGate or Academia), as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).