Cloud Infrastructure for Big Data Processing and Analytics
Keywords:
Cloud computing; Big data analytics; Systematic review; PRISMA; Scalability; Data security; Cloud-edge computing; Artificial intelligence; IoT
Abstract
The rapid expansion of data-driven applications has intensified the need for scalable infrastructures capable of processing large and complex datasets. Cloud computing has emerged as a critical enabler of big data analytics, offering flexible and cost-efficient computational resources. However, existing research remains fragmented across domains and technologies. This study aims to provide a comprehensive synthesis of high-impact literature on the integration of cloud computing and big data analytics, identifying key trends, technological advancements, challenges, and application domains. A systematic review methodology was employed using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework to ensure transparency and reproducibility. The study adopts a mixed-methods analytical approach based on John W. Creswell’s convergent design, integrating qualitative thematic analysis with quantitative descriptive evaluation. Literature published between 2018 and 2026 was retrieved from major academic databases, including Scopus, Web of Science, IEEE Xplore, ScienceDirect, and Google Scholar. Following a rigorous screening and eligibility process, 68 peer-reviewed studies were included for analysis. The findings indicate a clear shift from traditional centralized systems to distributed, hybrid, and cloud–edge architectures. Increasing integration of artificial intelligence and Internet of Things (IoT) technologies is evident across sectors such as healthcare, finance, retail, and smart cities. While cloud-based big data analytics enhances scalability, efficiency, and real-time processing, critical challenges persist, including data security, privacy, latency, and system interoperability. This study provides a holistic, evidence-based synthesis of cloud-enabled big data analytics without relying on primary data or experimental methods. It advances theoretical understanding, highlights research gaps, and offers strategic insights for future research and practical implementation in data-driven environments.
