KPMG: Data Engineer – Big Data

Brief Description of Position:

Data Engineer – Big Data

Level: Associate Consultant/Consultant/Assistant Manager

Role & Responsibilities

  • Evaluate, develop, maintain, and test big data solutions for advanced analytics projects.
  • Design and implement big data platforms with components such as Apache Spark, HBase, Hive, Impala, Pig, Oozie, etc.
  • Develop ETL pipelines for processing large volumes of data with the Spark framework, in Scala, Java, or Python (a sketch follows this list).
  • Gather and process raw data at scale (including writing scripts, SQL queries, etc.) to build features that will be used in modelling.
  • Ensure data processing pipelines and systems are secure, reliable, fault-tolerant, scalable, accurate, and efficient.
  • Clean data per business requirements using streaming APIs or user-defined functions (UDFs).
  • Design and implement column-family schemas for HBase and table schemas for Hive on HDFS, and create the corresponding Hive tables.
  • Develop efficient Hive scripts that join datasets using appropriate techniques (e.g., map-side or bucketed joins).
  • Build and maintain the infrastructure that derives insight from raw data and handles diverse data sources seamlessly.
  • Help with performance tuning of the platform, Hive queries, etc.
  • Develop highly scalable, performant APIs, solutions, and microservices that make data available to downstream applications.
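
For concreteness, here is a minimal sketch of the kind of Spark ETL pipeline described above: reading raw files from HDFS, cleaning a column with a UDF, and writing to a partitioned Hive table. The paths, table names, and column names are hypothetical placeholders, not part of the role description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-etl")
      .enableHiveSupport()                    // allows writing managed Hive tables
      .getOrCreate()

    // A simple user-defined function (UDF) to normalise a free-text column.
    val normalizeCity = udf((city: String) =>
      Option(city).map(_.trim.toLowerCase).orNull)

    spark.read
      .option("header", "true")
      .csv("hdfs:///raw/customers/*.csv")     // hypothetical raw landing zone
      .filter(col("customer_id").isNotNull)   // drop malformed records
      .withColumn("city", normalizeCity(col("city")))
      .withColumn("ingest_date", current_date())
      .write
      .mode("overwrite")
      .partitionBy("ingest_date")             // partitioned for efficient Hive queries
      .saveAsTable("analytics.customers_clean")

    spark.stop()
  }
}
```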

THE INDIVIDUAL

  • Excellent problem-solving skills in object-oriented or functional programming languages: Java, Scala, Python, or R.
  • Strong development experience with Apache Spark and its components (Core API, Spark SQL, Spark Streaming).
  • Strong understanding of, and experience with, distributed computing frameworks, particularly Apache Hadoop (YARN, MapReduce, and HDFS) and associated technologies: one or more of Hive, Sqoop, Kafka, Flume, Oozie, ZooKeeper, etc.
  • Knowledge of different HDFS file formats: ORC, Avro, Parquet, etc.
  • Experience building stream-processing systems using solutions such as Spark Streaming, Flume, Kafka, etc. (see the streaming sketch after this list).
  • Experience developing Spark jobs in Scala, Java, or Python and deploying them to a cluster programmatically.
  • Experience with Hive tuning, bucketing, and partitioning, and with creating UDFs and UDAFs per business needs.
  • Technical expertise in data models, data analytics, big data, database design and development, data mining, and segmentation techniques.
  • Experience fine-tuning and optimizing Spark jobs, joins, and MapReduce jobs for performance.
  • Experience working with any of the major Hadoop distributions (Cloudera, Hortonworks, etc.).
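
As a small illustration of the stream-processing experience listed above, the sketch below reads events from Kafka with Spark Structured Streaming and computes a windowed count. The broker address, topic name, and JSON field are hypothetical, and running it requires the Spark–Kafka connector on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-counts")
      .getOrCreate()

    // Subscribe to a Kafka topic; broker and topic names are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Parse the event time and count events per one-minute window,
    // tolerating up to 10 minutes of late data via the watermark.
    val counts = events
      .select(get_json_object(col("json"), "$.event_time")
        .cast("timestamp").as("event_time"))
      .withWatermark("event_time", "10 minutes")
      .groupBy(window(col("event_time"), "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")   // console sink for illustration; production would target Kafka/HDFS
      .start()
      .awaitTermination()
  }
}
```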

Qualifications

  • BE/BTech/MCA
  • 3–8 years of strong experience in 3–4 of the above-mentioned skills.

CRITERIA

  • A minimum of 60% marks throughout academics.
  • At least one regular (full-time) course of three or more years, either a diploma or a graduate degree, is a must.

Work Timing

Monday to Friday

Work Location

KPMG India: Bangalore, Mumbai, Pune, Gurgaon
