Hadoop Data Science Training

Microsoft consulted data scientists and the companies that employ them to identify the core skills data scientists need to be successful. Their input informed the curriculum, which teaches key functional and technical skills by combining highly rated online courses with hands-on labs and concludes with a final capstone project.

Register Today :

 

Trainer : Sanjay


Course Duration : 3 months

Online Course Fee : 40,000/-

Classroom Course Fee : 30,000/-

Training Highlights :

  • Daily Tasks
  • Weekly Interviews
  • Real-time Project
  • Resume Guidance
  • Certification Guidance
  • Placement Services

Data Science Training Course Contents:

 

HADOOP ADMIN AND DEVELOPER COURSE CONTENT

CHAPTER 1 : INTRODUCTION TO BIG DATA

  • What is Big Data
  • Big Data Challenges
  • Big Data opportunities
  • Characteristics of Big Data
  • Introduction to Analytics and the need for big data analytics
  • Real-Time Big Data Use Cases

CHAPTER 2 : THE MOTIVATION FOR HADOOP

  • Comparing Hadoop Vs. Traditional systems
  • Problems with traditional large-scale systems
  • Data Storage
  • Data Processing
  • Requirements for a new approach
  • History of Hadoop
  • Hadoop Solutions - Big Picture
  • Hadoop distributions

CHAPTER 3 : HADOOP BASIC CONCEPTS

  • What is Hadoop?
  • The Hadoop Distributed File System
  • How MapReduce Works
  • Anatomy of a Hadoop Cluster

CHAPTER 4 : HADOOP 1.0 DAEMONS

  • Master Daemons
    • Name node
    • Job Tracker
    • Secondary name node
  • Slave Daemons
    • Data node
    • Task tracker

CHAPTER 5 : HDFS (HADOOP DISTRIBUTED FILE SYSTEM)

  • Blocks and Splits
  • Input Splits
  • HDFS Splits
  • Data Replication
  • Hadoop Rack Awareness
  • Name node
  • Data Node
  • Secondary Name node
  • Metadata
  • FS Image and Edit log
  • Data high availability
  • Data Integrity
  • Cluster architecture and block placement

CHAPTER 6 : JAVA AND LINUX COMMANDS

  • Java basics
  • Linux basic commands

CHAPTER 7 : HDFS COMMANDS

  • ls
  • mv
  • copyFromLocal, put
  • Basic file system Operations
  • HDFS admin-related commands

CHAPTER 8 : PROGRAMMING PRACTICES

  • Developing MapReduce Programs in Local Mode
    • Running without HDFS and MapReduce daemons
  • Pseudo-distributed Mode
    • Running all daemons on a single node
  • Fully distributed mode

CHAPTER 9 : HADOOP ADMINISTRATIVE TASKS - Setting up Apache, Cloudera and Hortonworks Hadoop Clusters

  • Install and configure Apache Hadoop
  • Set up a pseudo-distributed Hadoop cluster on a single laptop/desktop
  • Install and configure a Hadoop distribution in fully distributed mode
  • Monitoring the cluster
  • Getting familiar with the Cloudera and Hortonworks management consoles
  • Name Node in Safe mode
  • Metadata Backup
  • Introduction to Integrating Kerberos security in Hadoop
  • Commissioning/Decommissioning Nodes
  • Building and configuring single-node and multi-node clusters

CHAPTER 10 : HADOOP DEVELOPER TASKS - Writing a MapReduce Program

  • Examining a Sample MapReduce Program
  • Word Count Program
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API
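As a rough illustration of the word-count program and the Streaming API listed above, here is a minimal Python sketch. It assumes the usual Hadoop Streaming convention of plain-text records with a tab between key and value; the function names (`mapper`, `reducer`, `run_locally`) are illustrative, and `run_locally` only imitates the framework's shuffle-and-sort phase with `sorted()` instead of running on a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word.

    Assumes the records arrive sorted by key, which is what the
    shuffle-and-sort phase guarantees to a real reducer.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process (no Hadoop)."""
    return list(reducer(sorted(mapper(lines))))


# Prints cat 1, dog 1, sat 2, the 2 (one tab-separated record per line)
for record in run_locally(["the cat sat", "the dog sat"]):
    print(record)
```

On a real cluster the mapper and reducer would normally be two separate scripts that read `sys.stdin` and print to stdout, passed to the hadoop-streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the distribution.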

CHAPTER 11 : Performing several Hadoop Jobs

  • Processing video files and audio files
  • Processing image files
  • Processing XML files
  • Processing Zip files
  • Directly Accessing HDFS

CHAPTER 12 : Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Hands-On Exercise
  • Identity Mapper
  • Identity Reducer
  • Exploring well-known problems using MapReduce applications

CHAPTER 13 : Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Other Debugging Strategies

CHAPTER 14 : Advanced MapReduce Programming

  • A Recap of the MapReduce Flow
  • Custom Writables and WritableComparables
  • The Secondary Sort
  • Creating Input Formats and Output Formats
  • Pipelining Jobs With Oozie
  • Map-Side Joins
  • Reduce-Side Joins

CHAPTER 15 : Monitoring and debugging on a Production Cluster

  • Counters
  • Skipping Bad Records
  • Rerunning failed tasks

CHAPTER 16 : Tuning for Performance

  • Reducing network traffic with combiner
  • Reducing the amount of input data
  • Using Compression
  • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting Performance
  • Other Performance Aspects

CHAPTER 17 : Hadoop YARN

  • Hadoop 1.x vs Hadoop 2.x
  • YARN basics
  • Resource Manager
  • Scheduler

CHAPTER 18 : Hadoop Ecosystem - Hive

  • Hive concepts
  • Hive architecture
  • Hive shell
  • Hive server
  • Hive metastore
  • Install and configure Hive on a cluster
  • Create a database and access it from the console
  • Buckets, Partitions
  • Joins in Hive
    • Inner joins
    • Outer joins
  • Hive UDF
  • Hive UDAF
  • Hive UDTF
  • Develop and run sample applications in Java to access Hive
  • Load data into Hive and process it using Hive

CHAPTER 19 : PIG

  • Pig basics
  • Install and configure Pig on a cluster
  • Pig vs. MapReduce and SQL
  • Pig vs. Hive
  • Write sample Pig Latin scripts
  • Modes of running Pig
  • Running in the Grunt shell
  • Programming in Eclipse
  • Running as a Java program
  • Pig UDFs
  • Pig Macros
  • Load data into Pig and process it using Pig

CHAPTER 20 : SQOOP

  • Install and configure Sqoop on a cluster
  • Connecting to RDBMS
  • Installing MySQL
  • Import data from Oracle/MySQL into Hive
  • Export data to Oracle/MySQL
  • Internal mechanism of import/export
  • Import millions of records into HDFS from RDBMS using Sqoop

CHAPTER 21 : HBASE

  • Data Retrieval - Random Access vs. Sequential Access
  • NoSQL Databases
  • HBase concepts
  • HBase architecture
  • Region server architecture
  • File storage architecture
  • HBase basics
  • Column access
  • Scans
  • HBase Use Cases
  • Install and configure HBase on a cluster
  • Create a database; develop and run sample applications
  • Access data stored in HBase using clients such as Java
  • MapReduce client to access the HBase data
  • HBase and Hive Integration
  • HBase admin tasks
  • Defining Schema and basic operation

CHAPTER 22 : CASSANDRA

  • Cassandra core concepts
  • Install and configure Cassandra on a cluster
  • Create databases and tables, and access them from the console
  • Developing applications to access data in Cassandra through Java
  • Install and configure OpsCenter to access Cassandra data using a browser

CHAPTER 23 : OOZIE

  • Oozie architecture
  • XML file specifications
  • Install and configure Oozie on a cluster
  • Specifying a Workflow
  • Action nodes
  • Control nodes
  • Oozie job coordinator
  • Accessing Oozie jobs from the command line and the web console
  • Create sample workflows in Oozie and run them on a cluster

CHAPTER 24 : Introduction to ZooKeeper, Flume, Chukwa, Avro, Scribe, Thrift, HCatalog

  • Flume and Chukwa Concepts
  • Use cases of Thrift, Avro and Scribe
  • Install and configure Flume on a cluster

 

CHAPTER 25 : ANALYTICS BASICS

  • Analytics and big data analytics
  • Commonly used analytics algorithms
  • R language basics
  • Python language basics
  • Mahout

CHAPTER 26 : CDH5 and HortonWorks

  • Comparison
  • Vendors

Spark & Scala

Chapter 1: Scala Introduction & Environment Setup

  • Java vs Scala
  • Scala is object-oriented
  • Scala is functional
  • Scala runs on the JVM
  • Installing Scala

Chapter 2: Scala Basic Syntax

  • First Scala Program
  • Interactive Mode Programming
  • Script Mode Programming

Chapter 3: Scala Data Types

  • Literals
  • Strings
  • Escape Sequences

Chapter 4: Scala Variables

  • Declaration
  • Data Types
  • Type Inference
  • Multiple assignments
  • Variable Types

Chapter 5: Scala Operators

  • Arithmetic
  • Relational
  • Logical
  • Operator Precedence in Scala

Chapter 6: Scala Conditions

Chapter 7: Scala Loops

Chapter 8: Scala Strings

Chapter 9: Scala closures and traits

 

Chapter 10: Scala Regular Expressions

  • Forming regular expressions
  • Matching Literals and Constants
  • Matching Tuples and Lists
  • Matching with Types and Guards
  • Pattern Variables and Constants in case Expressions
  • Regular-expression Examples
  • Pattern matching with Extractors

Chapter 11: Scala Functions

  • Declarations
  • Definitions
  • Calling 
  • Function Literals
  • Anonymous
  • Currying

Chapter 12: Scala Arrays

  • Declaring
  • Processing
  • Multi-Dimensional
  • Create Array with Range
  • Scala Array Methods

Chapter 13: Scala Collections

  • Basic Operations on List
  • Concatenating Lists
  • Creating Uniform Lists
  • Tabulating a Function
  • Scala List Methods
  • Concatenating Sets
  • Finding max and min elements in a Set
  • Find common values in Sets
  • Scala Set Methods
  • Basic Operations on Map
  • Check for a Key in Map

Chapter 14: Scala Classes & Objects

  • OOP Basics
  • Defining Fields, Methods, Constructors

Chapter 15: Introduction to Apache Spark

  • What is Spark?
  • Spark Ecosystem and Modes of Spark
  • Overview of Spark on a cluster
  • Spark Standalone cluster
  • Spark Web UI
  • Spark Common Operations

Chapter 16: Spark Core

  • Performing basic operations on files in the Spark Shell
  • Overview of SBT
  • Building a Spark project with SBT
  • Running a Spark project with SBT
  • Playing with RDDs
    • RDDs, transformations in RDD, actions in RDD
    • Loading data in RDD
    • Saving data through RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD Operations
  • Spark and Hadoop Integration - YARN

Chapter 17: Spark SQL

  • Spark SQL and Performance Tuning in Spark
    • Analyze Hive and Spark SQL architecture, SQLContext in Spark SQL
    • Working with DataFrames
    • Implementing an example for Spark SQL
    • Integrating Hive and Spark SQL
    • Support for JSON and Parquet file formats
    • Implementing data visualization in Spark
    • Loading of data
    • Hive queries through Spark
    • Performance tuning tips in Spark

Chapter 18: Spark Streaming

  • A Simple Example
  • Architecture and Abstraction
  • Transformations
    • Stateless Transformations
    • Stateful Transformations
  • Output Operations
  • Input Sources
    • Additional Sources
    • Multiple Sources and Cluster Sizing
  • Worker Fault Tolerance
  • Receiver Fault Tolerance
  • Processing Guarantees
  • Streaming UI
  • Batch and Window Sizes
  • Level of Parallelism

Chapter 19: Spark GraphX

  • Edges
  • Vertices
  • Types of Graphs
  • Usages
  • Simple Program

Chapter 20: Spark MLlib

  • Vectors
  • LabeledPoints
  • Labels
  • Features
  • RDD with Vectors
  • Matrices, Stats, Maths
  • Algorithms with Spark MLlib

Chapter 21: Building Machine Learning Models with Spark and Scala

Machine Learning with Python

CHAPTER 1: Introduction to Scripting

  • What is a Script?
  • What is a program?
  • Types of Scripts
  • Difference between Scripting & Programming Languages
  • Features of Scripting
  • Limitations of Scripting
  • Types of Programming Language Paradigms

CHAPTER 2: Introduction to Python

  • What is Python?
  • Why Python?
  • Who Uses Python?
  • Characteristics of Python
  • History of Python
  • What is PSF?
  • Python Versions
  • How to Download Python
  • How to Install Python
  • Installing Python with Different IDEs
  • Features of Python
  • Limitations of Python
  • Python Applications
  • Creating Your First Python Program
  • Printing to the Screen
  • Reading Keyboard Input
  • Using Command Prompt and GUI or IDE
  • Python Distributions

CHAPTER 3: Different Modes in PYTHON

  • Execute the Script
  • Interactive Mode
  • Script Mode
  • Python File Extensions
  • Setting PATH in Windows
  • Clearing the screen inside Python
  • The Python Main Function
  • Python Comments
  • Quit the Python Shell
  • Shell as a Simple Calculator
  • Order of operations
  • Multiline Statements
  • Quotations in Python
  • Python Path Testing
  • Joining two lines
  • Python Implementation Alternatives
  • Python Sub Packages
  • Uses of Python in Data Science
  • Uses of Python in IoT
  • Working with Python in Unix/Linux/Windows/Mac/Android

CHAPTER 4 : PYTHON IDEs

  • PyCharm IDE
  • How to Work on PyCharm
  • PyCharm Components
  • Debugging process in PyCharm
  • Installing Anaconda
  • What is Anaconda?
  • Coding Environments
  • Spyder Components
  • General Spyder Features
  • Spyder Shortcut Keys
  • Jupyter Notebook
  • What is Conda?
  • The conda list Command
  • Jupyter and Kernels
  • What is PIP?

CHAPTER 5 : Variables in Python

  • What is a Variable?
  • Variables in Python
  • Constants in Python
  • Variable and Value
  • Variable names
  • Mnemonic Variable Names
  • Values and Types
  • What Does “Type” Mean?
  • Multiple Assignment
  • Python numeric types
  • Standard Data Types
  • Operators and Operands
  • Order of Operations
  • Swap variables
  • Python Mathematics
  • Type Conversion
  • Mutable Versus Immutable Objects

CHAPTER 6 : String Handling

  • What is a String?
  • String operations
  • String indices
  • Basic String Operations
  • String Functions, Methods
  • Delete a string
  • String Multiplication and concatenation
  • Python Keywords
  • Python Identifiers
  • Python Literals
  • String Formatting Operator
  • Structuring with indentation in Python
  • Built-in String Methods
  • What is a Data Structure?
  • Data Structures in PYTHON

CHAPTER 7: Python Operators and Operands

  • Arithmetic Operators
  • Relational Operators
  • Comparison Operators
  • Python Assignment Operators
  • Shorthand Assignment Operators
  • Logical Operators and Bitwise Operators
  • Membership Operators
  • Identity Operators
  • Operator precedence
  • Evaluating Expressions

CHAPTER 8 : Python Conditional Statements

  • How to use “if condition” in conditional structures
  • if statement (One-Way Decisions)
  • if .. else statement (Two-way Decisions)
  • How to use “else condition”
  • if .. elif .. else statement (Multi-way)
  • When “else condition” does not work
  • How to use “elif” condition
  • How to execute conditional statement with minimal code
  • Nested IF Statement

CHAPTER 9 : Python LOOPS

  • How to use “While Loop”
  • How to use “For Loop”
  • How to use a For Loop to iterate over things other than numbers
  • Break statements in For Loop
  • Continue statement in For Loop
  • Enumerate function for For Loop
  • Practical Example
  • How to use a For Loop to repeat the same statement over and over again
  • Break, continue statements

 

CHAPTER 10 : Learning Python Strings

  • Accessing Values in Strings
  • Various String Operators
  • Some more examples
  • Python String replace() Method
  • Changing upper and lower case strings
  • Using “join” function for the string
  • Reversing String
  • Split Strings

CHAPTER 11 : Sequence or Collections in PYTHON

  • Strings
  • Unicode Strings
  • Lists
  • Tuples
  • buffers
  • xrange

CHAPTER 12 : Python Lists

  • Lists are mutable
  • Getting to Lists
  • List indices
  • Traversing a list
  • List operations
  • List slices
  • List methods
  • Map, filter and reduce
  • Deleting elements
  • Lists and strings

CHAPTER 13 : Python TUPLE

  • Advantages of Tuple over List
  • Packing and Unpacking
  • Comparing tuples
  • Creating nested tuple
  • Using tuples as keys in dictionaries
  • Deleting Tuples
  • Slicing of Tuple
  • Tuple Membership Test
  • Built-in functions with Tuple
  • Dotted Charts

CHAPTER 14 : Python Sets

  • How to create a set?
  • Iteration Over Sets
  • Python Set Methods
  • Python Set Operations
  • Union of sets
  • Built-in Functions with Set
  • Python Frozenset

CHAPTER 15 : Python Dictionary

  • How to create a dictionary?
  • Python Hashing
  • Python Dictionary Methods
  • Copying dictionary
  • Updating Dictionary
  • Delete Keys from the dictionary
  • Dictionary items() Method
  • Sorting the Dictionary
  • Python Dictionary in-built Functions
  • Dictionary len() Method
  • Variable Types
  • Dictionary cmp() Method
  • Dictionary str(dict)

CHAPTER 16 : Python Functions

  • What is a function?
  • How to define and call a function in Python
  • Types of Functions
  • Significance of Indentation (Space) in Python
  • How Does a Function Return a Value?
  • Types of Arguments in Functions
  • Default Arguments
  • Non-Default Arguments
  • Keyword Arguments
  • Non-keyword Arguments
  • Arbitrary Arguments
  • Rules to define a function in Python
  • Various Forms of Function Arguments
  • Scope and Lifetime of variables
  • Nested Functions
  • Call By Value, Call by Reference
  • Anonymous Functions/Lambda functions
  • Passing functions to function
  • map(), filter(), reduce() functions
  • What is a Docstring?
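The map(), filter() and reduce() topic above fits in a few lines. This is a small sketch using lambdas on an arbitrary list of integers. In Python 3, `reduce()` has to be imported from `functools`, while `map()` and `filter()` are built-ins that return lazy iterators, which is why their results are wrapped in `list()`.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# map(): apply a function to every element (here, square each number)
squares = list(map(lambda n: n * n, numbers))

# filter(): keep only the elements for which the predicate returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce(): fold the sequence into a single value, starting from 0
total = reduce(lambda acc, n: acc + n, numbers, 0)

print(squares)  # [1, 4, 9, 16, 25, 36]
print(evens)    # [2, 4, 6]
print(total)    # 21
```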

CHAPTER 17 : Python Modules

  • What is a Module?
  • Types of Modules
  • The import Statement
  • The from…import Statement
  • The from…import * Statement
  • Underscores in Python
  • The dir( ) Function
  • Creating User defined Modules
  • Command line Arguments
  • Python Module Search Path

CHAPTER 18 : Packages in Python

  • What is a Package?
  • Introduction to Packages?
  • py file
  • Importing module from a package
  • Creating a Package
  • Creating Sub Package
  • Importing from Sub-Packages
  • Popular Python Packages

CHAPTER 19 : Python Date and Time

  • How to Use Date & DateTime Class
  • How to Format Time Output
  • How to use Timedelta Objects
  • Calendar in Python
  • datetime classes in Python
  • The Time Module
  • Python Calendar Module
  • Python Text Calendar
  • Python HTML Calendar Class
  • Unix Date and Time Commands

CHAPTER 20 : File Handling

  • What are Data, Information and Files?
  • File Objects
  • File Modes
  • File Object Attributes
  • How to create a Text File
  • How to Append Data to a File
  • How to Read a File
  • Closing a file
  • read(), readline(), readlines(), write(), writelines()
  • Renaming and Deleting Files
  • Directories in Python
  • Working with CSV files
  • Working with CSV Module
  • Handling IO Exceptions

CHAPTER 21 : Python OS Module

  • Shell Script Commands
  • Various OS operations in Python
  • Python File System Shell Methods

CHAPTER 22 : Python Exception Handling

  • Python Errors
  • Common RunTime Errors in PYTHON
  • Abnormal termination
  • Chain of importance Of Exception
  • Exception Handling
  • Try … Except
  • Try .. Except .. else
  • Try … finally
  • Argument of an Exception
  • Python Custom Exceptions
  • Ignore Errors
  • Assertions
  • Using Assertions Effectively

CHAPTER 23 : More Advanced PYTHON

  • Python Iterators
  • Python Generators
  • Python Closures
  • Python Decorators
  • Python @property

CHAPTER 24 : Python Class and Objects

  • Introduction to OOPs Programming
  • Object Oriented Programming System
  • OOPS Principles
  • Define Classes
  • Creating Objects
  • Class Variables and Instance Variables, Constructors
  • Basic concept of Object and Classes
  • Access Modifiers
  • How to define Python classes
  • Python Namespace
  • The self Variable in Python
  • Garbage Collection
  • What is Inheritance? Types of Inheritance?
  • How does Inheritance work?
  • Python Multiple Inheritance
  • Overloading and Overriding
  • Polymorphism
  • Abstraction
  • Encapsulation
  • Built-In Class Attributes

CHAPTER 25 : Python Regular Expressions

  • What is Regular Expression?
  • Regular Expression Syntax
  • Understanding Regular Expressions
  • Regular Expression Patterns
  • Literal characters
  • Repetition Cases
  • Example of \w+ and ^ Expressions
  • Example of \s expression in re.split function
  • Using regular expression methods
  • Using re.match()
  • Finding Pattern in Text (re.search())
  • Using re.findall for text
  • Python Flags
  • Methods of Regular Expressions
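A short sketch contrasting the `re` functions named in this chapter (`re.match`, `re.search`, `re.findall`, `re.split`) and one flag. The sample sentence is arbitrary.

```python
import re

text = "Spark and Hadoop process big data"

# re.match() only succeeds if the pattern matches at the very start
first_word = re.match(r"\w+", text).group()
no_match = re.match(r"Hadoop", text)  # 'Hadoop' is not at the start
print(first_word, no_match)           # Spark None

# re.search() scans the whole string and returns the first match
found = re.search(r"Hadoop", text)
print(found.group(), found.start())   # Hadoop 10

# ^ anchors the pattern to the start; \w+ is a run of word characters
print(re.findall(r"^\w+", text))      # ['Spark']

# \s+ matches one or more whitespace characters, so re.split breaks on them
words = re.split(r"\s+", text)
print(words)  # ['Spark', 'and', 'Hadoop', 'process', 'big', 'data']

# Flags modify matching, e.g. re.IGNORECASE for case-insensitive search
print(re.findall(r"hadoop", text, flags=re.IGNORECASE))  # ['Hadoop']
```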

CHAPTER 26 : Python XML Parser

  • What is XML?
  • Difference between XML and HTML
  • Difference between XML and JSON and Gson
  • How to Parse XML
  • How to Create XML Node
  • Python vs Java
  • XML and HTML

CHAPTER 27 : Python-Data Base Communication

  • What is Database? Types of Databases?
  • What is DBMS?
  • What is RDBMS?
  • What is Big Data? Types of data?
  • Oracle
  • MySQL
  • SQL server
  • DB2
  • PostgreSQL
  • Executing the Queries
  • Bind Variables
  • Installing Oracle Python Modules
  • Executing DML Operations

CHAPTER 28 : Multi-Threading

  • What is Multi-Threading
  • Threading Module
  • Defining a Thread
  • Thread Synchronization

CHAPTER 29 : Unit Testing with PyUnit

  • What is Testing?
  • Types of Testing and Testing Methods
  • What is Unit Testing?
  • What is PyUnit?
  • Test scenarios, Test Cases, Test suites

CHAPTER 30: Introduction to Python Web Frameworks

  • Django – Design
  • Advantages of Django
  • MVC and MVT
  • Installing Django
  • Designing Web Pages
  • HTML5, CSS3, AngularJS
  • PYTHON Flask
  • PYTHON Bottle
  • PYTHON Pyramid
  • PYTHON Falcon

CHAPTER 31 : Data Analytics

  • Introduction to Big Data
  • Python for Analytics

CHAPTER 32 : Python Libraries Overview

  • scipy
  • numpy
  • matplotlib
  • pandas
  • sklearn

CHAPTER 33 : Data Science

  • What is Data Science?
  • Data Science Life Cycle?
  • What is Data Analysis
  • What is Data Mining
  • Analytics vs Data Science

CHAPTER 34 : Introduction to Machine Learning

  • What is Machine Learning?
  • Supervised learning
  • Unsupervised learning
  • Define Problem
  • Prepare Data
  • Evaluate Algorithms
  • Improve Results
  • Present Results

CHAPTER 35: Using Machine Learning Algorithms in python

  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • CART
  • SVM
  • Naive Bayes
  • kNN
  • K-Means
  • Random Forest
  • Dimensionality Reduction Algorithms
  • Gradient Boosting algorithms
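To make the first algorithm on this list concrete, here is simple (one-feature) linear regression fitted by ordinary least squares in plain Python. It is a teaching sketch using the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x); the helper names are hypothetical, and in practice a library class such as scikit-learn's `LinearRegression` would normally be used instead.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y ≈ intercept + slope * x (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx and Sxy: sums of squared and cross deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Prediction from the fitted line."""
    return intercept + slope * x


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from y = 1 + 2x, so the fit is exact
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)               # 1.0 2.0
print(predict(intercept, slope, 10))  # 21.0
```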

Getting Started with R

  • R Basics
  • Variables and Class
  • Vectors, List, Factors, Matrix
  • Data Frames
  • Missing Values
  • Reading and Writing Data
  • Data Visualization using ggplot
  • If-Else Conditions
  • Function
  • Loops
  • Data manipulation

Python

  • Python Basics
  • Python Lists
  • Functions and Packages
  • NumPy
  • Control flow and Pandas

Probability

  • Counting Combinations, Generating Combinations
  • Generating Random Numbers
  • Generating Reproducible Random Numbers
  • Generating a Random Sample
  • Generating Random Sequences
  • Randomly Permuting a Vector
  • Probabilities for Discrete Distributions
  • Probabilities for Continuous Distributions
  • Converting Probabilities to Quantiles
  • Plotting a Density Function
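The probability topics above are not tied to one language. This is a Python sketch of several of them using only the standard library (Python 3.8 or later, for `math.comb` and `statistics.NormalDist`). The seeds and sample values are arbitrary, and `binomial_pmf` is a hypothetical helper written for illustration. Plotting a density function is left out because it needs a plotting library.

```python
import itertools
import math
import random
from statistics import NormalDist

items = ["a", "b", "c", "d"]

# Counting combinations: C(4, 2)
n_pairs = math.comb(4, 2)
# Generating combinations: every unordered pair of distinct items
pairs = list(itertools.combinations(items, 2))

# Reproducible random numbers: two generators with the same seed agree
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]

# Random sample without replacement: 3 distinct values from 1..10
rng = random.Random(7)
sample = rng.sample(range(1, 11), 3)

# Random permutation of a list (shuffle works in place, so copy first)
perm = items.copy()
rng.shuffle(perm)


def binomial_pmf(k, n, p):
    """Discrete distribution: P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Continuous distribution: standard normal CDF and its inverse (quantile)
std_normal = NormalDist(mu=0.0, sigma=1.0)
p_below_zero = std_normal.cdf(0.0)
q_975 = std_normal.inv_cdf(0.975)

print(n_pairs, pairs)
print(draws_a == draws_b, sample, perm)
print(binomial_pmf(2, 4, 0.5), p_below_zero, round(q_975, 2))  # 0.375 0.5 1.96
```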

Graphics

  • Edges
  • Vertices
  • Graphs
  • Programs

Machine Learning

  • Introduction to Machine Learning
  • Types Of Machine Learning
  • Real time use cases in Machine Learning
  • Types of Algorithms and Types of Problems –
    • Regression
    • Classification
    • Clustering
    • Collaborative Filtering
    • Optimization
    • Prediction
  • Regression –
    • Linear Regression
    • Logistic Regression
  • Classification –
    • Logistic Regression
    • Decision Tree, Random Forest
    • KNN, SVM
    • Naive Bayes
  • Clustering –
    • K-means Clustering
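K-means clustering alternates an assignment step and an update step (Lloyd's algorithm). The following is a minimal plain-Python sketch for 2-D points. It initialises the centroids from the first k points purely to keep the example reproducible; that choice is an assumption of this sketch, and libraries such as scikit-learn's `KMeans` use better seeding (k-means++) and vectorised code.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on 2-D points; returns (centroids, clusters)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # naive, reproducible initialisation
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```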
Complete Practical Training with Real-time Databases. The course includes Real-time Case Studies.
All Classes are Instructor-Led & LIVE. Completely Practical and Real-time with Study Material, Session Notes, Tasks and 24x7 Support.
 

Job-Oriented Real-time Training @ SQL School Training Institute - Trainer: Mr. Sai Phanindra T