

A Structured Review and Comparative Study of Big Data Processing Frameworks: Hadoop, Spark, and Flink


Abstract 

Big Data technologies have become essential for handling vast and complex datasets and for extracting meaningful information from them. Managing such data effectively requires robust data processing frameworks. This paper presents a structured review and comparative study of three well-known big data processing frameworks: Apache Hadoop, Apache Spark, and Apache Flink. The three frameworks are analyzed in terms of their architecture, features, benefits, limitations, and real-world use cases. The comparative analysis considers factors such as processing model, performance and latency, fault tolerance, data handling, scalability, APIs, ease of use, libraries, and community adoption. The study highlights the strengths associated with each framework, discussing Hadoop’s batch-processing reliability, Spark’s in-memory speed, and Flink’s true stream processing capabilities. This work aims to guide researchers and practitioners in selecting the framework best suited to their use-case requirements.
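To make the contrast between batch processing and true stream processing more concrete, the following is a minimal plain-Python sketch of the two processing models applied to a word count. It does not use the Hadoop, Spark, or Flink APIs; the function names `batch_word_count` and `streaming_word_count` are hypothetical and chosen only for illustration. The batch version mimics the MapReduce map, shuffle, and reduce phases and produces a result only after the whole bounded input has been read, while the streaming version keeps running state and emits an updated count for each record as it arrives, which is the record-at-a-time model associated with Flink.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """Toy MapReduce-style batch job (hypothetical, not the Hadoop API).

    The result is available only after the entire bounded input is processed.
    """
    # Map phase: emit a (word, 1) pair for every word in every line.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle phase: group the emitted values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce phase: sum the grouped values for each key.
    return {word: sum(ones) for word, ones in groups.items()}


def streaming_word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Toy record-at-a-time streaming job (hypothetical, not the Flink API).

    Running state is kept per word, and an updated count is emitted as soon
    as each word arrives, so the input may be unbounded.
    """
    state: Dict[str, int] = defaultdict(int)
    for line in lines:
        for word in line.split():
            state[word] += 1
            yield word, state[word]


if __name__ == "__main__":
    data = ["spark flink", "hadoop spark"]
    # Batch: one final answer after all input is consumed.
    print(batch_word_count(data))
    # Streaming: an incremental update per incoming word.
    for update in streaming_word_count(data):
        print(update)
```

In this sketch the batch function needs the complete list before it can return anything, whereas the streaming generator can consume an endless iterator and still produce results with low latency; real frameworks add distribution, fault tolerance, and (in Spark's case) in-memory caching on top of these basic models.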


Author Information
C. Firza Afreen, Assistant Professor, PG Department of Computer Science, Islamiah Women’s Arts and Science College, Vaniyambadi
Volume No: 7
Issue No: 5
Issue Publish Date: 05 May 2025
Issue Pages: 40-47
