Apache Spark SQL – Bigdata In-Memory Analytics Master Course

Master in-memory distributed computing with Apache Spark SQL. Leverage the power of DataFrames and Datasets, with real-life demos.

What you’ll learn

  • Spark SQL syntax and component architecture in Apache Spark
  • Datasets, DataFrames, and RDDs
  • Advanced features of Spark SQL's interaction with other Spark components
  • Using data from various data sources such as MS Excel, RDBMS, AWS S3, and the NoSQL database MongoDB
  • Using different file formats such as Parquet, Avro, and JSON
  • Table partitioning and bucketing

Requirements

  • An introductory understanding of the Big Data ecosystem
  • Basics of SQL

Description

This course is designed for everyone from professionals with zero experience to already skilled professionals who want to enhance their Spark SQL skills. The hands-on sessions cover the end-to-end setup of a Spark cluster in AWS and on local systems.

What students are saying:

  • 5 stars, “This is classic. Spark related concepts are clearly explained with real-life examples.” – Temitayo Joseph

At the final stage, we need to work with structured data. SQL is a popular query language for analyzing structured data.

Apache Spark facilitates distributed in-memory computing. Spark has a built-in module called Spark SQL for structured data processing. Users can mix SQL queries with Spark programs, and Spark SQL integrates seamlessly with Spark's other constructs.

Spark SQL facilitates loading and writing data from various sources such as RDBMS, NoSQL databases, and cloud storage like S3, and it can easily handle different data formats such as Parquet, Avro, JSON, and many more.

Spark provides two types of APIs:

  • Low-level API – RDDs
  • High-level API – DataFrames and Datasets

Spark SQL integrates well with other Spark components such as Spark Streaming, Spark Core, and GraphX, because it has good API integration between the high-level and low-level APIs.

The initial part of the course is an introduction to the Lambda Architecture and the Big Data ecosystem. The remaining sections concentrate on reading and writing data between Spark and various data sources.

DataFrames and Datasets are the basic building blocks of Spark SQL. We will learn how to work with transformations and actions on RDDs, DataFrames, and Datasets.

The course also covers optimizing tables with partitioning and bucketing.
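To give a feel for what partitioning and bucketing do to a table's layout, here is a minimal conceptual sketch in plain Python. It is not Spark code and does not use Spark's API: the rows, the column names, the bucket count, and the helper functions `bucket_for` and `layout` are all hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the engine actually uses. The idea it illustrates is that partitioning groups rows by the value of a column, while bucketing hashes a column into a fixed number of buckets.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen for illustration


def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket id using a deterministic stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows, partition_col, bucket_col, num_buckets=NUM_BUCKETS):
    """Group rows by (partition value, bucket id), mimicking the layout
    of a table that is both partitioned and bucketed."""
    table = defaultdict(list)
    for row in rows:
        part = row[partition_col]
        bucket = bucket_for(str(row[bucket_col]), num_buckets)
        table[(part, bucket)].append(row)
    return dict(table)


# Made-up rows for illustration only; not taken from any course dataset.
trips = [
    {"year": 2014, "station": "Market St", "duration": 300},
    {"year": 2014, "station": "Embarcadero", "duration": 540},
    {"year": 2015, "station": "Market St", "duration": 420},
    {"year": 2015, "station": "Market St", "duration": 180},
]

table = layout(trips, partition_col="year", bucket_col="station")
for (year, bucket), rows in sorted(table.items()):
    print(f"year={year}/bucket={bucket}: {len(rows)} row(s)")
```

A query that filters on `year` only needs to look at the matching partition, and rows with the same `station` always land in the same bucket, which is what lets an engine avoid reshuffling data for joins and aggregations on that column.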

1) NHL Dataset Analysis

2) Bay Area Bike Share Dataset Analysis

Updates:

++ Apache Zeppelin notebook (installation, configuration, dynamic input)

++ Spark demo with Apache Zeppelin

Who this course is for:
  • Beginners who want to get started with Spark SQL in Apache Spark
  • Data analysts and Big Data analysts
  • Those who want to leverage in-memory computing against structured data

Content from: http://www.udemy.com/course/apache-spark-sql-big-data-distributed-in-memory-analytics/