HIVE and HQL
Overview
Course Objective
• Understand Hive Data Types and File Formats
• HQL DDL, DML, and Queries
• HQL Views, Indexes, and Schema Design
• Hive Tuning
• Hive Development, UDFs, and Streaming
• Hive Security, Locking, and Cloud Computing
Who Should Attend
• Big Data Developer / Administrator / Architect / Analyst / Engineer
• Software Architect / Engineer / Developer
• Solution Delivery Consultant
• Senior BI / ETL Developer
• NoSQL Big Data Developer
Prerequisites
• RDBMS and SQL

Training Calendar
Intake
Duration
Program Fees
Module
Module 1 - INTRODUCTION TO HADOOP, MAPREDUCE, AND HIVE
• What Is Big Data?
• Introducing Hadoop
• Hadoop Components
• What is Hive?
• Hive CLI
Module 2 - DATA TYPES AND FILE FORMATS
• Primitive Data Types
• Collection Data Types
• File Encoding
• Schema on Read
Module 3 - HQL: DATA DEFINITION
• Databases in Hive
• Alter Database
• Creating Tables – Managed vs. External
• Partitioned Tables
• Dropping and Altering Tables
Module 4 - HQL: DATA MANIPULATION
• Loading Data into Managed Tables
• Inserting Data from Queries
• Creating Tables and Loading Them in a Single Query
• Exporting Data
Module 5 - HQL: QUERIES
• Querying
• Casting
• Sampling
• Unions
Module 6 - HQL: VIEWS
• Views to Reduce Query Complexity
• Views that Restrict Data Based on Conditions
• Views and Map Type for Dynamic Tables
• View Odds and Ends
Module 7 - HQL: INDEXES
• Creating an Index
• Rebuilding an Index
• Showing an Index
• Dropping an Index
• Custom Index Handlers
Module 8 - SCHEMA DESIGN
• Table-by-Day
• Partitioning
• Keys and Normalization
• Bucketing Table Data Storage
• Adding Columns to a Table
• Using Columnar Tables
• Compression
Module 9 - TUNING
• Using EXPLAIN
• Limit Tuning
• Optimized Joins
• Local Mode
• Parallel Execution
• Strict Mode
• Mappers and Reducers
• JVM Reuse
• Index
• Dynamic Partition Tuning
• Speculative Execution
• MapReduce
• Virtual Columns
Module 10 - FUNCTIONS
• Calling Functions
• Standard Functions
• Aggregate Functions
• Table Generating Functions
• UDFs
FAQs
General Questions:
Q: What is the HIVE and HQL course about?
This 2-day course introduces participants to Apache Hive, a data warehouse system built on top of Hadoop and HDFS, and its query language, Hive Query Language (HQL). The course covers Hive’s data types, file formats, and HQL commands, focusing on data manipulation, views, indexes, schema design, performance tuning, user-defined functions (UDFs), and security. Participants will gain hands-on experience with Hive queries and optimizations for big data analysis in batch-oriented systems.
Q: Who should attend this course?
• Big Data Developer / Administrator / Architect / Analyst / Engineer
• Software Architect / Engineer / Developer
• Solution Delivery Consultant
• Senior BI / ETL Developer
• NoSQL Big Data Developer
Q: What are the prerequisites for this course?
Knowledge of relational databases (RDBMS) and SQL is required.
Q: How long is the course?
The course spans 2 days.
Q: What key topics are covered in this course?
• Introduction to Hadoop, MapReduce, and Hive
• Data Types and File Formats
• Hive Query Language (HQL): Data Definition, Data Manipulation, and Queries
• Creating Views and Indexes
• Schema Design and Optimization
• Hive Query Tuning and Performance Enhancements
• User-Defined Functions (UDFs)
• Streaming Data in Hive
• Security, Locking, and Cloud Computing with Hive
• Case Studies and Real-World Applications of Hive
Q: Will I receive a certification after completing the course?
Yes, participants will receive a certificate of completion, recognizing their skills in using Hive effectively.
Program Content & Skills:
Q: What foundational Hive concepts will I learn in this course?
You’ll explore the core principles of Hive, understanding how it manages large-scale structured data on Hadoop HDFS. Key topics include Hive data types, file formats, schema design, querying data using Hive Query Language (HQL), and performance tuning. You’ll also learn how to integrate Hive with Hadoop to perform efficient data analysis and optimization.
Q: How does the course help me apply Hive to real-world data problems?
You’ll work with real-world datasets to design optimal Hive schemas, efficiently query data using HQL, and optimize queries for better performance. This practical approach equips you to handle big data challenges in various industries such as retail, healthcare, and finance.
Q: What skills will I develop in managing and optimizing Hive queries?
You’ll learn how to administer and optimize Hive queries for performance, including partitioning, bucketing, indexing, and tuning. You’ll also master best practices for improving query efficiency, managing large datasets, and troubleshooting performance issues in Hive.
Q: Will I learn how to handle different types of data in Hive?
Yes. The course covers working with various kinds of data in Hive, including structured, semi-structured, and time-series data. Hands-on exercises will help you address different data storage and access needs, along with performance optimization techniques for large-scale datasets.
Q: How does this course prepare me for applying Hive in a professional context?
You’ll gain technical knowledge and practical skills to integrate Hive into big data applications, optimize queries, and leverage it for business solutions such as data analysis, reporting, and decision-making in industries like e-commerce, finance, and healthcare.
Submit your interest today!