Big Data with Hadoop
and Spark
Overview
Course Objective
Who Should Attend
This course is designed for individuals at all levels in an organisation.
Prerequisites
Basic familiarity with databases and data management

Training Calendar
Intake
Duration
Program Fees
Module
Module 1 - Fundamental Terminology and Concepts (30 min)
• The 3 V’s (volume, velocity, variety), plus veracity, variability, visualization, and value
• HDFS and MapReduce in Hadoop
• Unstructured, semi-structured, and structured data
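The last bullet can be illustrated with plain Python: the same fact stored as a CSV row (structured), a JSON document (semi-structured), and free text (unstructured). The record contents here are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed schema, every row has the same columns.
structured = "id,name,purchase_amount\n1,Alice,42.50\n"
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, fields may vary per record.
semi = '{"id": 1, "name": "Alice", "tags": ["vip"]}'
doc = json.loads(semi)

# Unstructured: no schema at all; meaning must be extracted.
unstructured = "Alice spent $42.50 on her last visit."

print(row["purchase_amount"])   # -> 42.50
print(doc["tags"])              # -> ['vip']
print("42.50" in unstructured)  # -> True
```

Hadoop's value proposition is that HDFS stores all three forms as-is, deferring schema decisions to read time.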
Module 2 - Brief History of Big Data (30 min)
• Google and MapReduce
• Web 2.0
• Hadoop vendors
Module 3 - Business Drivers for Big Data (30 min)
• KYC (Know Your Customer)
• Sales & Marketing
• Financial forecasting
Module 4 - Characteristics of Big Data (30 min)
Module 5 - Benefits of Adopting Big Data (30 min)
Module 6 - Challenges and Limitations of Big Data (30 min)
Module 7 - HDFS and Distributed Storage (90 min)
• Installing Hadoop and types of installations
• HDFS and Data Ingestion using Sqoop or Flume
Module 8 - Hands-on Exercises (90 min)
• Installing Hadoop
• Working with Cloudera CDH
• Basic Data Ingestion with Sqoop and Flume
Module 9 - Introduction to MapReduce (90 min)
• Map and Reduce
• Partitioners, mappers, and reducers
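Hadoop itself is not needed to see the map/reduce pattern. A minimal word-count sketch in plain Python (a stand-in for a real MapReduce job, with an invented two-line input) shows the three phases:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit (key, value) pairs -- here (word, 1).
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group all values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # Reduce phase: aggregate the values for one key.
    return key, sum(values)

lines = ["big data with hadoop", "big data with spark"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts["big"])   # -> 2
print(counts["spark"]) # -> 1
```

In real Hadoop the shuffle is done by the framework between the map and reduce tasks; only the mapper and reducer are user code.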
Module 10 - MapReduce Architecture (90 min)
• Working with MapReduce
• Distributed caching
• Input and output formatters
Module 11 - MapReduce Examples (90 min)
• Sample programs
Module 12 - Hands-on Exercises (90 min)
• Writing MapReduce Jobs
• Writing Partitioners
• Writing Input and Output Formatters
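The partitioner exercise can be previewed without a cluster: Hadoop's default HashPartitioner assigns each key to a reducer by hashing it modulo the number of reduce tasks. A plain-Python sketch, where `zlib.crc32` stands in for Java's `hashCode` (so the exact placements differ from Hadoop's):

```python
import zlib

NUM_REDUCERS = 3

def partition(key, num_reducers=NUM_REDUCERS):
    # Mimic Hadoop's HashPartitioner: hash(key) mod numReduceTasks.
    # zlib.crc32 is a deterministic stand-in for Java's hashCode.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

keys = ["big", "data", "hadoop", "spark"]
buckets = {r: [k for k in keys if partition(k) == r]
           for r in range(NUM_REDUCERS)}
for reducer_id, assigned in buckets.items():
    print(reducer_id, assigned)
```

The important properties, which any custom partitioner must preserve, are that every key maps to exactly one reducer and that the same key always maps to the same reducer.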
Module 13 - SparkSQL, DataFrames, and Datasets (90 min)
• SparkSQL
• Executing SQL commands on a dataframe
• Using DataFrames instead of RDDs
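Spark is not assumed to be installed here, so SQLite stands in to show the core idea of this module: registering tabular data and querying it with SQL, the same workflow as `df.createOrReplaceTempView(...)` followed by `spark.sql(...)` in SparkSQL. The table and sample rows are invented.

```python
import sqlite3

# In SparkSQL you would register a DataFrame as a temp view and run
# spark.sql(...) against it; sqlite3 stands in for the SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INT, movie TEXT, stars REAL)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, "Alien", 5.0), (2, "Alien", 4.0), (1, "Up", 3.0)],
)

# Conceptual equivalent of spark.sql("SELECT movie, AVG(stars) ...")
rows = conn.execute(
    "SELECT movie, AVG(stars) FROM ratings GROUP BY movie ORDER BY movie"
).fetchall()
print(rows)  # -> [('Alien', 4.5), ('Up', 3.0)]
```

The advantage of DataFrames over raw RDDs is exactly this: declarative queries that the engine can optimize, instead of hand-written transformation chains.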
Module 14 - Spark MLlib (90 min)
• Using MLlib to produce movie recommendations
• Analyzing ALS recommendation results
• Using DataFrames with MLlib
Module 15 - Spark Streaming and GraphX (90 min)
• Streaming data and near-real-time (NRT) processing
• VertexRDD and EdgeRDD
Module 16 - Hands-on Exercises (90 min)
• Sample streaming
• Sample GraphX script
FAQs
General Questions:
Q: What is the Big Data with Hadoop and Spark course about?
This course provides a comprehensive introduction to Big Data concepts and technologies, focusing on Apache Hadoop and Apache Spark. It combines theory with hands-on practice to help participants understand data storage, processing, and analytics using tools like HDFS, MapReduce, SparkSQL, MLlib, and Spark Streaming.
Q: Who should attend this course?
This course is suitable for individuals at all levels within an organization who are looking to build foundational knowledge and hands-on skills in Big Data technologies.
Q: What are the prerequisites for this course?
Participants should have a basic familiarity with databases and data management concepts.
Q: How long is the course?
The course lasts for 3 days.
Q: What key topics are covered in this course?
• Understanding Big Data terminology and concepts
• Overview of the Hadoop and Spark ecosystems
• Installing and working with HDFS, MapReduce, and Spark
• Hands-on data ingestion using Sqoop and Flume
• Writing MapReduce jobs and working with input/output formatters
• Using SparkSQL, DataFrames, and Datasets
• Implementing machine learning with Spark MLlib
• Processing real-time data with Spark Streaming and GraphX
Q: Will I receive a certification after completing the course?
No formal certification is provided, but participants will gain practical skills and knowledge in Big Data technologies applicable to real-world data analytics and processing tasks.
Program Content & Skills:
Q: What foundational Big Data and processing concepts will I learn in this course?
You’ll learn the core principles of Big Data including the 4 V’s (volume, velocity, variety, veracity), understand structured and unstructured data, and explore how Hadoop and Spark frameworks handle large-scale data processing using HDFS, MapReduce, and Spark components.
Q: How does the course prepare me to align Big Data technologies with business goals?
The course explains how Big Data supports business initiatives like customer insights, marketing strategies, and financial forecasting. You’ll explore real-world drivers for Big Data adoption and how to apply technical solutions to meet business needs.
Q: What skills will I develop in managing and processing data?
You’ll gain practical skills in setting up Hadoop, ingesting data using Sqoop and Flume, writing and executing MapReduce jobs, working with Cloudera CDH, and leveraging Spark for data manipulation, machine learning, and stream processing.
Q: Will I learn how to work with both batch and real-time data processing?
Yes. The course covers traditional batch processing using MapReduce as well as real-time processing using Spark Streaming and GraphX, allowing you to handle various data processing scenarios effectively.
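The batch-versus-streaming contrast above can be sketched in plain Python: Spark Streaming's classic DStream model treats a stream as a sequence of micro-batches, each reduced like a small batch job while state accumulates across batches. A toy simulation (no Spark required; the event stream is invented):

```python
from collections import Counter
from itertools import islice

def micro_batches(stream, batch_size):
    # Chop an unbounded iterator into fixed-size micro-batches,
    # the way a DStream chops time into small intervals.
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch

events = ["click", "view", "click", "view", "view", "click", "buy"]
running = Counter()  # stateful aggregation carried across batches
for batch in micro_batches(events, 3):
    running.update(batch)  # each micro-batch is a tiny batch job
    print(dict(running))

print(running["click"])  # -> 3
```

A pure batch job would see all seven events at once; the streaming version produces an updated result after every micro-batch, which is what makes near-real-time dashboards and alerts possible.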
Q: How does the course address real-world data analytics and implementation needs?
You’ll work through hands-on exercises that simulate real use cases, such as writing MapReduce programs, running SparkSQL queries, building recommendation systems with MLlib, and processing streaming data — all using tools commonly used in enterprise environments.
Submit your interest today!