Big Data Architect Masters Program

Category: Big Data
5.0 (3375 satisfied learners)

Master Big Data skills with the EDTIA Big Data Architect Masters Program and advance your professional career. In this Big Data Architect Masters Program, you will learn every aspect of the Big Data Architect role.

Course Description

The Big Data Architect Masters Program trains you to be proficient in the tools and systems used by Big Data experts. This masters program includes training on the Hadoop and Spark stack, Cassandra, Talend, and the Apache Kafka messaging system.

Big Data architects are responsible for designing the framework that appropriately addresses a company's Big Data needs, using data, hardware, software, cloud services, developers, and other IT infrastructure to align an organization's IT systems with its enterprise goals.

Candidates with a bachelor's degree in computer science, computer engineering, or a related field can pursue this course.

Big Data enables organizations to detect trends and spot patterns that can be used for future advantage. It can help identify which customers are likely to buy products, or optimize marketing campaigns by identifying which advertising strategies have the highest return on investment.

There are no prerequisites for enrollment in the Big Data Architect certification. Whether you are a skilled professional working in the IT industry or an aspirant planning to enter the data-driven world of analytics, this Masters Program is designed to accommodate a wide range of professionals.

Big Data architects create and maintain data infrastructure to pull and organize data for authorized individuals to access. Data architects/engineers work with database administrators and analysts to guarantee easy access to the company's big data.

One of the most promising and integral roles in data science is the data architect. From 2018 to 2028, demand for data architects is expected to grow by 9%, faster than the average for all occupations.

What you'll learn

  • In this course, you will learn the Hadoop and Spark stack, Cassandra, Talend, the Apache Kafka messaging system, and more.

Requirements

  • There are no particular requirements for pursuing this course.

Curriculum

Learn about Java architecture and the advantages of Java, and develop code with various data types, conditions, and loops (a short Java sketch follows the topic list).

Bytecode
Class Files
Compilation Process
Data Types and Operations
if Conditions
Loops - for, while, and do-while
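
For illustration, a minimal Java sketch covering these basics (the class name and values are hypothetical examples, not course material):

    // Basics.java - data types, an if condition, and the three loop forms
    public class Basics {
        public static void main(String[] args) {
            int count = 3;            // primitive data type
            double price = 9.99;
            String label = "items";   // reference type

            if (count > 0) {          // if condition
                System.out.println(count + " " + label + " at " + price);
            }
            for (int i = 0; i < count; i++) {   // for loop
                System.out.println("for iteration " + i);
            }
            int n = count;
            while (n > 0) {           // while loop: tests first, then runs
                n--;
            }
            do {                      // do-while loop: runs at least once
                n++;
            } while (n < count);
        }
    }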

Learn how to code with arrays, functions, and strings using examples and programs (see the sketch after this list).

Arrays - Single Dimensional and Multidimensional arrays
Functions
Function with Arguments
Function Overloading
Concept of Static Polymorphism
String Handling: String and StringBuffer Classes
Declaring the arrays
Accepting data for the arrays
Calling functions that take arguments, searching the array, and displaying the matching record
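
A small sketch of these ideas, assuming a hypothetical array of IDs and a search function:

    // ArrayDemo.java - arrays, a function with arguments, and string handling
    public class ArrayDemo {
        static int search(int[] data, int target) {    // function with arguments
            for (int i = 0; i < data.length; i++) {
                if (data[i] == target) return i;       // index of the match
            }
            return -1;                                 // not found
        }

        public static void main(String[] args) {
            int[] ids = {101, 205, 307};               // single-dimensional array
            int[][] grid = {{1, 2}, {3, 4}};           // multidimensional array
            int pos = search(ids, 205);                // 205 is known to be present
            StringBuffer sb = new StringBuffer("id: ");
            sb.append(ids[pos]);                       // mutable string handling
            System.out.println(sb + " at index " + pos + ", grid[1][0]=" + grid[1][0]);
        }
    }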

Understand object-oriented programming in Java using classes and objects, along with Java concepts such as abstract, final, and static (a sketch follows the list).

OOP in Java: Concepts of Object Orientation, Attributes and Methods, Classes and Objects
Methods and Constructors: Default Constructors, Constructors with Arguments, Inheritance, Abstract, Final and Static
Inheritance
Overloading
Overriding
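
A compact sketch of inheritance, overriding, and overloading (the shape classes are illustrative):

    // OopDemo.java - abstract base class; Circle overrides and overloads area()
    abstract class Shape {
        protected final String name;
        Shape(String name) { this.name = name; }       // constructor with an argument
        abstract double area();                        // abstract method
    }

    class Circle extends Shape {                       // inheritance
        private final double r;
        Circle(double r) { super("circle"); this.r = r; }
        @Override
        double area() { return Math.PI * r * r; }      // overriding
        static double area(double r) {                 // overloading (static polymorphism)
            return Math.PI * r * r;
        }
    }

    public class OopDemo {
        public static void main(String[] args) {
            Shape s = new Circle(2.0);                 // polymorphic reference
            System.out.println(s.name + " area = " + s.area());
        }
    }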

Learn about packages in Java and Java's access specifiers, as well as exception handling and how multithreading works in Java (a short sketch follows the list).

Packages and Interfaces
Access Specifiers
Package
Exception Handling
Multithreading
Interfaces
Packages
Exception
Thread
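
A minimal sketch of exception handling alongside a simple thread (the messages and timings are illustrative):

    import java.util.concurrent.TimeUnit;

    public class SafeWorker {
        public static void main(String[] args) {
            Thread worker = new Thread(() -> {
                try {
                    TimeUnit.MILLISECONDS.sleep(100);  // simulated work
                    System.out.println("worker done");
                } catch (InterruptedException e) {     // checked exception handling
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();                            // multithreading: runs concurrently

            try {
                int x = Integer.parseInt("not-a-number");
            } catch (NumberFormatException e) {        // unchecked exception handling
                System.out.println("caught: " + e.getMessage());
            } finally {
                System.out.println("finally always runs");
            }
        }
    }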

Learn to write code with wrapper classes, inner classes, and applet programs, and how to use the java.io, java.lang, and java.util packages of Java and the Collections framework (a sketch follows the list).

Wrapper Classes and Inner Classes: Integer, Character, Boolean, Float, etc.
Applet Programs: Writing UI programs with Applet; java.lang, java.io, java.util
Collections: ArrayList, Vector, HashSet, TreeSet, HashMap, Hashtable
Wrapper class
Collection
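
A short sketch of wrapper classes and a few java.util collections (the keys and values are hypothetical):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeSet;

    public class CollectionsDemo {
        public static void main(String[] args) {
            List<Integer> nums = new ArrayList<>();
            nums.add(42);                              // autoboxing: int -> Integer wrapper
            Integer wrapped = Integer.valueOf("7");    // wrapper class parsing

            TreeSet<String> sorted = new TreeSet<>(List.of("pig", "hive", "hbase"));
            Map<String, Integer> counts = new HashMap<>();
            counts.put("spark", wrapped + nums.get(0)); // auto-unboxing in arithmetic

            System.out.println(sorted.first() + " " + counts);
        }
    }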

Understand what Big Data is, the limitations of traditional solutions to Big Data problems, how Hadoop solves those problems, the Hadoop ecosystem, Hadoop architecture, HDFS, the anatomy of file reads and writes, and how MapReduce works (a mapper/reducer sketch follows the list).

Intro to Big Data and its Challenges
Limitations & Solutions of Big Data Architecture
Hadoop & its Features
Hadoop Ecosystem
Hadoop 2.x Core Components
Hadoop Storage: HDFS (Hadoop Distributed File System)
Hadoop Processing: MapReduce Framework
Different Hadoop Distributions
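
To make "how MapReduce works" concrete, here is a minimal word-count sketch against the standard Hadoop Java API (job wiring and paths are omitted for brevity):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map phase: emit (word, 1) for every token in the input split
    class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                ctx.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts shuffled in for each word
    class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }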

Learn Hadoop cluster architecture, the essential configuration files of a Hadoop cluster, and data-loading techniques using Sqoop & Flume, and set up Single-Node and Multi-Node Hadoop clusters (a FileSystem API sketch follows the list).

Hadoop 2.x Cluster Architecture
Federation and High Availability Architecture
Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-Node Cluster & Multi-Node Cluster Setup
Basic Hadoop Administration
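
As a complement to the shell commands, a minimal sketch of the equivalent Java FileSystem API calls (the paths are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsOps {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            fs.mkdirs(new Path("/user/demo"));                             // hadoop fs -mkdir
            fs.copyFromLocalFile(new Path("data.csv"),
                                 new Path("/user/demo/data.csv"));         // hadoop fs -put
            for (FileStatus st : fs.listStatus(new Path("/user/demo"))) {  // hadoop fs -ls
                System.out.println(st.getPath() + " " + st.getLen());
            }
            fs.close();
        }
    }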

Understand the Hadoop MapReduce framework fully, how MapReduce works on data stored in HDFS, and advanced MapReduce concepts like Input Splits, Combiner & Partitioner (a Partitioner sketch follows the list).

Traditional way vs MapReduce way
Why MapReduce
YARN Components
YARN Architecture
YARN MapReduce Application Execution Flow
YARN Workflow
Anatomy of MapReduce Program
Input Splits, Relation between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
Demo of Health Care Dataset
Demo of Weather Dataset
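
A short sketch of a custom Partitioner, with the Combiner wiring shown in comments; it assumes the word-count classes from the earlier sketch, and the routing rule is a toy example:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Route words starting with a-m to reducer 0, the rest to reducer 1 (toy rule)
    public class AlphaPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String k = key.toString().toLowerCase();
            if (numPartitions < 2 || k.isEmpty()) return 0;
            return k.charAt(0) <= 'm' ? 0 : 1;
        }
    }

    // In the job driver (sketch):
    //   job.setCombinerClass(SumReducer.class);          // local pre-aggregation after map
    //   job.setPartitionerClass(AlphaPartitioner.class); // controls the shuffle destination
    //   job.setNumReduceTasks(2);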

Discover advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format, and XML parsing.

Counters
Distributed Cache
MRUnit
Reduce Join
Custom Input Format
Sequence Input Format
XML file Parsing using MapReduce

Learn Apache Pig, the types of use cases where Pig can be used, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming & testing Pig scripts (a PigServer sketch follows the list).

Introduction to Apache Pig
MapReduce vs Pig
Pig Components & Pig Execution
Pig Data Types & Data Models in Pig
Pig Latin Programs
Shell and Utility Commands
Pig UDF & Pig Streaming
Testing Pig Scripts with PigUnit
Aviation use case in Pig
Pig Demo of Healthcare Dataset
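
One way to run Pig Latin from Java is via PigServer; a minimal local-mode sketch (the file name, schema, and aliases are hypothetical):

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigDemo {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.LOCAL);   // use MAPREDUCE on a cluster
            // Pig Latin: load, filter, and store - each statement registered in order
            pig.registerQuery("flights = LOAD 'flights.csv' USING PigStorage(',') "
                    + "AS (airline:chararray, stops:int);");
            pig.registerQuery("nonstop = FILTER flights BY stops == 0;");
            pig.store("nonstop", "nonstop_out");             // writes the relation to disk
            pig.shutdown();
        }
    }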

Learn Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs (a JDBC sketch follows the list).

Introduction to Apache Hive
Hive vs Pig
Hive Architecture and Components
Hive Metastore
Limitations of Hive
Comparison with Traditional Database
Hive Data Types and Data Models
Hive Partition
Hive Bucketing
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data & Managing Outputs
Hive Script & Hive UDF
Retail use case in Hive
Hive Demo on Healthcare Dataset
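
A minimal sketch of querying Hive from Java over JDBC (the HiveServer2 URL, credentials, and table are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");   // HiveServer2 JDBC driver
            try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hiveuser", "");
                 Statement stmt = con.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS retail "
                        + "(item STRING, qty INT) STORED AS TEXTFILE");
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT item, SUM(qty) FROM retail GROUP BY item")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                    }
                }
            }
        }
    }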

Understand advanced Apache Hive concepts such as UDFs, dynamic partitioning, Hive indexes and views, and optimizations in Hive, along with Apache HBase, HBase architecture, HBase running modes, and its components (a client API sketch follows the list).

Hive QL: Joining Tables, Dynamic Partitioning
Custom MapReduce Scripts
Hive Indexes and Views
Hive Query Optimizers
Hive Thrift Server
Hive UDF
Apache HBase: Introduction to NoSQL Databases and HBase
HBase vs. RDBMS
HBase Components
HBase Architecture
HBase Run Modes
HBase Configuration
HBase Cluster Deployment
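
For a feel of the HBase client API, a minimal put/get sketch (the table and column names are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("patients"))) {
                Put put = new Put(Bytes.toBytes("row1"));      // row key
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                              Bytes.toBytes("Alice"));
                table.put(put);

                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }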

Learn advanced Apache HBase concepts and watch demos on HBase bulk loading & HBase filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster, & why HBase uses ZooKeeper (a filtered-scan sketch follows the list).

HBase Data Model
HBase Shell
HBase Client API
HBase Data Loading Techniques
Apache ZooKeeper Introduction
ZooKeeper Data Model
ZooKeeper Service
HBase Bulk Loading
Getting and Inserting Data
HBase Filters
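
And a short sketch of a filtered scan, continuing the hypothetical table and Connection from the previous module's sketch:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseScanDemo {
        // Assumes an open HBase Connection, as created in the previous sketch
        static void scanWithFilter(Connection conn) throws Exception {
            try (Table table = conn.getTable(TableName.valueOf("patients"));
                 ResultScanner scanner = table.getScanner(
                         new Scan().setFilter(new PrefixFilter(Bytes.toBytes("row"))))) {
                for (Result r : scanner) {              // rows filtered server-side
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }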

Learn Apache Spark, SparkContext & the Spark ecosystem, and how to work with Resilient Distributed Datasets (RDDs) in Apache Spark (a word-count sketch follows the list).

What is Spark
Spark Ecosystem
Spark Components
What is Scala
Why Scala
SparkContext
Spark RDD
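
A compact RDD sketch using Spark's Java API (the app name and input path are illustrative; the course itself also covers the Scala API):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkDemo {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<String> lines = sc.textFile("input.txt");         // RDD from a file
                JavaPairRDD<String, Integer> counts = lines
                        .flatMap(l -> Arrays.asList(l.split(" ")).iterator())
                        .mapToPair(w -> new Tuple2<>(w, 1))
                        .reduceByKey((a, b) -> a + b);                    // lazy until an action
                counts.take(10).forEach(t ->
                        System.out.println(t._1() + ": " + t._2()));      // action triggers the job
            }
        }
    }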

Understand how the numerous Hadoop ecosystem components work together to solve Big Data problems, with a Flume & Sqoop demo, the Apache Oozie workflow scheduler for Hadoop jobs, and Hadoop-Talend integration.

A. Discover the frequency of books published each year (hint: a sample dataset will be provided)
B. Find out in which year the highest number of books were published
C. Find out how many books were published based on ranking in 2002

The Book-Crossing dataset consists of 3 tables that will be given to you.

A. Find the list of airports operating in India
B. Find the list of airlines with zero stops
C. List the airlines operating with a codeshare
D. Find which country (or territory) has the highest number of airports
E. Find the list of active airlines in the United States

In this use case, there are three datasets: Final_airlines, routes.dat, and airports_mod.dat.

Learn what Big Data is and how it creates problems for traditional database management systems like RDBMS, how Cassandra solves these problems, and understand Cassandra's features.

Intro to Big Data and Problems caused by it
5V – Volume, Variety, Velocity, Veracity, and Value
Traditional Database Management System
Limitations of RDBMS
NoSQL databases
Common characteristics of NoSQL databases
CAP theorem
How does Cassandra solve the Limitations?
History of Cassandra
Features of Cassandra
VM tour

Learn about the database model and the similarities between the RDBMS and Cassandra data models. You will also understand the key database elements of Cassandra and learn about the concept of the primary key (a driver sketch follows the list).

Introduction to Database Model
Understand the analogy between RDBMS and Cassandra Data Model
Understand the following Database Elements: Cluster, Keyspace, Column Family/Table, Column
Column Family Options
Columns
Wide Rows, Skinny Rows
Static and dynamic tables
Creating Keyspace
Creating Tables
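
A minimal sketch using the DataStax Java driver (3.x-style API; the keyspace, table, and key layout are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CassandraSchemaDemo {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                     .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                session.execute("CREATE KEYSPACE IF NOT EXISTS shop WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                // Partition key: user_id; clustering column: order_ts (a wide row per user)
                session.execute("CREATE TABLE IF NOT EXISTS shop.orders ("
                        + "user_id uuid, order_ts timeuuid, total decimal, "
                        + "PRIMARY KEY (user_id, order_ts))");
            }
        }
    }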

Gain knowledge of architecting and creating Cassandra database systems, and of the complex inner workings of Cassandra, such as the gossip protocol, read repairs, and so on.

Cassandra as a Distributed Database
Key Cassandra Elements: Memtable, Commit Log, SSTables
Replication Factor
Data Replication in Cassandra
Gossip protocol – Detecting failures
Gossip: Uses
Snitch: Uses
Data Distribution
Staged Event-Driven Architecture (SEDA)
Managers and Services
Virtual Nodes: Write path and Read path
Consistency level
Repair
Incremental repair

Learn about keyspaces and their attributes in Cassandra, how to create a table, and how to perform operations like inserting, updating, and deleting data from a table using cqlsh (a DML sketch follows the list).

Replication Factor
Replication Strategy
Defining columns and data types
Defining a partition key
Recognizing a partition key
Specifying a descending clustering order
Updating data
Tombstones
Deleting data
Using TTL
Updating a TTL
Create a Keyspace in Cassandra
Check the Created Keyspace in system_schema.keyspaces
Update the Replication Factor of a Previously Created Keyspace
Drop a Previously Created Keyspace
Create a Table Using cqlsh
Create a Table Using UUID & TIMEUUID Columns
Create a Table Using Collection & UDT Columns
Create a Secondary Index on a Table
Insert Data into a Table
Insert Data into a Table with UUID & TIMEUUID Columns
Insert Data Using the COPY Command
Delete Data from a Table
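
The same cqlsh statements can also be issued from the driver; a short sketch of insert with TTL, update, and delete, continuing the hypothetical shop.orders table from the earlier sketch:

    import java.util.UUID;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.utils.UUIDs;

    public class CassandraDmlDemo {
        // Assumes a connected Session, as created in the previous sketch
        static void crud(Session session) {
            UUID userId = UUID.randomUUID();           // hypothetical partition key
            UUID orderTs = UUIDs.timeBased();          // timeuuid clustering value
            session.execute("INSERT INTO shop.orders (user_id, order_ts, total) "
                    + "VALUES (?, ?, 19.99) USING TTL 86400", userId, orderTs); // expires in a day
            session.execute("UPDATE shop.orders USING TTL 3600 SET total = 24.99 "
                    + "WHERE user_id = ? AND order_ts = ?", userId, orderTs);
            session.execute("DELETE FROM shop.orders WHERE user_id = ?", userId); // writes a tombstone
        }
    }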

Learn how to add nodes in Cassandra and configure nodes using the cassandra.yaml file. Use nodetool to remove a node and restore it to service. In addition, using the nodetool repair command, learn the importance of repair and how the repair operation functions.

Cassandra nodes
Specifying seed nodes
Bootstrapping a node
Adding a node (Commissioning) in Cluster
Removing (Decommissioning) a node
Removing a dead node
Repair
Read Repair
What's new in incremental repair
Run a Repair Operation
Cassandra and Spark Implementation

Learn critical aspects of monitoring Cassandra: resources used by each node, response latencies to requests, requests to offline nodes, and the compaction process.

Cassandra monitoring tools
Logging
Tailing
Using Nodetool Utility
Using JConsole
Learning about OpsCenter
Runtime Analysis Tools
JMX and JConsole
OpsCenter

Learn about the importance of backup and restore functions in Cassandra and how to create snapshots, about hardware selection and performance tuning (configuring log files) in Cassandra, and about Cassandra's integration with various other frameworks.

Creating a Snapshot
Restoring from a Snapshot
RAM and CPU recommendations
Hardware choices
Selecting storage
Types of Storage to Avoid
Cluster connectivity, security, and the factors that impact distributed system performance
End-to-end performance tuning of Cassandra clusters against massive data sets
Load balance and streams
Creating Snapshots
Integration with Kafka
Integration with Spark

Learn about the design, implementation, and ongoing support of Cassandra operational data.

Security
Ongoing Support of Cassandra Operational Data
Hosting a Cassandra Database on Cloud
Hosting Cassandra Database on Amazon Web Services

Learn about ETL technologies and why Talend is referred to as the next-generation leader in Big Data integration, the various products offered by the Talend corporation, and their relevance to data integration and Big Data.

Working with ETL
Rise of Big Data
Role of Open Source ETL Technologies in Big Data
Comparison with other market-leading tools in the ETL domain
Importance of Talend (Why Talend)
Talend and its Products
Introduction to Talend Open Studio
TOS for Data Integration
GUI of TOS with Demo
Creating a Basic Job

Learn to work with the various types of data sources and target systems supported by Talend, with metadata, and with reading/writing popular CSV/delimited and fixed-width files. Connect to a database, read/write/update data, read complex source systems like Excel and XML, and use essential components like tLogRow and tMap in TOS.

Launching Talend Studio
Working with different workspace directories
Working with projects
Creating and executing jobs
Connection types and triggers
Most often used Talend components [tJava, tLogRow, tMap]
Reading & Writing Different Types of Source/Target Systems
Working with files [CSV, XLS, XML, Positional]
Working with databases [MySQL]
Metadata management
Creating a Business Model
Adding Components to a Job
Connecting the Components
Reading and writing Delimited Files
Reading and writing Positional Files
Reading and writing XML and XLS/XLSX Files
Connecting to a Database (MySQL)
Retrieving the Schema from the Database
Reading from Database Metadata
Retrieving data from a file and inserting it into the Database
Deleting data from the Database
Working with Logs and Errors

Understand data mapping and transformations using TOS, how to filter and join various data sources using lookups, and how to search and sort through them.

Context Variables
Using Talend components
tJoin
tFilter
tSortRow
tAggregateRow
tReplicate
tSplit
Lookup
tRowGenerator
Accessing job-level/component-level details within the job
SubJob (using tRunJob, tPreJob, tPostJob)
Embedding Context Variables
Adding different environments
Data Mapping using tMap
Using functions in Talend
tJava
tSortRow
tAggregateRow
tReplicate
tFilter
tSplit
tRowGenerator
Performing Lookup operations using tJoin
Creating a SubJob (using tRunJob, tPreJob, tPostJob)

Understand transformations and the various steps involved in looping Talend jobs, ways to search for files in a directory and process them in sequence, FTP connections, exporting and importing jobs, running jobs remotely, and parameterizing them from the command line.

Different components of file management (like tFileList, tFileArchive, tFileTouch, tFileDelete)
Error Handling [tWarn, tDie]
Type Casting (converting data types among source-target platforms)
Looping components (like tLoop, tForeach)
Utilizing FTP components (like tFTPFileList, tFTPFileExists, tFTPGet, tFTPPut)
Exporting and Importing Talend jobs
How to schedule and run Talend DI jobs externally (using the command line)
Parameterizing a Talend job from the command line
Executing File Management (like tFileList, tFileArchive, tFileTouch, tFileDelete)
Type Casting (tConvert and tMap (using Expression Builder))
Looping components (like tLoop, tForeach)
Utilizing FTP components (like tFTPFileList, tFTPFileExists, tFTPGet, tFTPPut)
Exporting and Importing Talend Jobs
Parameterizing a Talend Job from the command line

Discover Big Data and Hadoop concepts such as HDFS (Hadoop Distributed File System) architecture and MapReduce, leveraging Big Data through Talend, and Talend & Big Data integration.

Big Data and Hadoop
HDFS and MapReduce
Benefits of using Talend with Big Data
Integration of Talend with Big Data
HDFS commands vs. Talend HDFS utility
Big Data setup using Hortonworks Sandbox on your personal computer
Explaining the TOS for Big Data Environment
Creating a Project and a Job
Adding Components in a Job
Connecting to HDFS
'Putting' files on HDFS
Using tMap, tAggregate functions

Learn Hive concepts, the setup of the Hive environment in Talend, and Hive Big Data connectors in TOS, and implement use cases using Hive in Talend.

Hive and Its Architecture
Connecting to Hive Shell
Set connection to Hive database using Talend
Design Hive Managed and external tables through Talend
Load and Process Hive data using Talend
Transform data from Hive using Talend
Process and transform data from Hive
Load data from HDFS & Local File Systems to Hive Table utilizing Hive Shell
Execute the HiveQL query using Talend

Discover Pig concepts, the setup of the Pig environment in Talend, and Pig Big Data connectors in TOS for Big Data, and implement use cases using Pig in Talend. You will also get an insight into Apache Kafka, its architecture, and its integration with Talend through a real-life use case.

Pig Environment in Talend
Pig Data Connectors
Integrate Custom Pig Code into a Talend Job
Apache Kafka
Kafka Components in TOS for Big data
Use Pig and Kafka connectors in Talend

Develop a project using Talend DI and Talend BD with MySQL, Hadoop, HDFS, Hive, Pig, and Kafka.

Understand where Kafka fits in the Big Data space, along with Kafka architecture, the Kafka cluster and its components, and how to configure a cluster (a producer sketch follows the list).

Introduction to Big Data
Big Data Analytics
Need for Kafka
What is Kafka?
Kafka Features
Kafka Concepts
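
As a taste of what this module builds toward, a minimal producer sketch with the standard Kafka Java client (the broker address, topic, key, and value are illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProducerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Messages with the same key land in the same partition
                producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            }
        }
    }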