KPMG: Data Engineer – Big Data

Brief Description of Position:

Data Engineer – Big Data

Level: Associate Consultant/Consultant/Assistant Manager

Role & Responsibilities

  • Evaluate, develop, maintain, and test big data solutions for advanced analytics projects.
  • Design and implement big data platforms with components such as Apache Spark, HBase, Hive, Impala, Pig, Oozie, etc.
  • Develop ETL pipelines for processing large volumes of data with the Spark framework, in Scala, Java, or Python (a sketch follows this list).
  • Gather and process raw data at scale (including writing scripts, SQL queries, etc.) to build features that will be used in modelling.
  • Ensure data processing pipelines and systems are secure, reliable, fault-tolerant, scalable, accurate, and efficient.
  • Clean data per business requirements using streaming APIs or user-defined functions (UDFs).
  • Design and implement column-family schemas for HBase and table schemas for Hive on HDFS, and create the corresponding Hive tables.
  • Develop efficient Hive scripts that join datasets using appropriate techniques (e.g., map-side or bucketed joins).
  • Build and maintain the infrastructure that derives insight from raw data and handles diverse data sources seamlessly.
  • Help with performance tuning of the platform, Hive queries, etc.
  • Develop highly scalable, performant APIs, solutions, and microservices that make data available to downstream applications.
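
For concreteness, here is a minimal sketch of the kind of Spark ETL pipeline described above: reading raw files from HDFS, cleaning a column with a UDF, and writing to a partitioned Hive table. The paths, table names, and column names are hypothetical placeholders, not part of the role description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-etl")
      .enableHiveSupport()                    // allows writing managed Hive tables
      .getOrCreate()

    // A simple user-defined function (UDF) to normalise a free-text column.
    val normalizeCity = udf((city: String) =>
      Option(city).map(_.trim.toLowerCase).orNull)

    spark.read
      .option("header", "true")
      .csv("hdfs:///raw/customers/*.csv")     // hypothetical raw landing zone
      .filter(col("customer_id").isNotNull)   // drop malformed records
      .withColumn("city", normalizeCity(col("city")))
      .withColumn("ingest_date", current_date())
      .write
      .mode("overwrite")
      .partitionBy("ingest_date")             // partitioned for efficient Hive queries
      .saveAsTable("analytics.customers_clean")

    spark.stop()
  }
}
```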

THE INDIVIDUAL

  • Excellent problem-solving skills in object-oriented or functional programming languages: Java, Scala, Python, or R.
  • Strong development experience with Apache Spark and its components (Core API, Spark SQL, Spark Streaming).
  • Strong understanding of, and experience with, distributed computing frameworks, particularly Apache Hadoop (YARN, MapReduce, and HDFS) and associated technologies: one or more of Hive, Sqoop, Kafka, Flume, Oozie, ZooKeeper, etc.
  • Knowledge of different HDFS file formats: ORC, Avro, Parquet, etc.
  • Experience building stream-processing systems using solutions such as Spark Streaming, Flume, Kafka, etc. (see the streaming sketch after this list).
  • Experience developing Spark jobs in Scala, Java, or Python and deploying them to a cluster programmatically.
  • Experience with Hive tuning, bucketing, and partitioning, and with creating UDFs and UDAFs per business needs.
  • Technical expertise in data models, data analytics, big data, database design and development, data mining, and segmentation techniques.
  • Experience fine-tuning and optimizing Spark jobs, joins, and MapReduce jobs for performance.
  • Experience working with any of the major Hadoop distributions (Cloudera, Hortonworks, etc.).
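
As a small illustration of the stream-processing experience listed above, the sketch below reads events from Kafka with Spark Structured Streaming and computes a windowed count. The broker address, topic name, and JSON field are hypothetical, and running it requires the Spark–Kafka connector on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-counts")
      .getOrCreate()

    // Subscribe to a Kafka topic; broker and topic names are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    // Parse the event time and count events per one-minute window,
    // tolerating up to 10 minutes of late data via the watermark.
    val counts = events
      .select(get_json_object(col("json"), "$.event_time")
        .cast("timestamp").as("event_time"))
      .withWatermark("event_time", "10 minutes")
      .groupBy(window(col("event_time"), "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")   // console sink for illustration; production would target Kafka/HDFS
      .start()
      .awaitTermination()
  }
}
```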

Qualifications

  • BE/BTech/MCA
  • 3–8 years of strong experience in 3–4 of the above-mentioned skills.

CRITERIA

  • A minimum of 60% marks throughout academics.
  • At least one regular (full-time) course of three or more years, either a diploma or a graduate degree, is a must.

Work Timing

Monday to Friday

Work Location

KPMG India: Bangalore, Mumbai, Pune, Gurgaon
