EXL : Data Engineer – PySpark/Big Data

Brief Description of Position:

We are looking for a Spark developer who knows how to fully exploit the potential of our Spark cluster. You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our feature developers and business analysts.

This involves both ad-hoc requests and data pipelines that are embedded in our production environment.


Must-Have Skills

  • Strong experience in PySpark, including DataFrame core functions and Spark SQL
  • Good experience with SQL databases, including the ability to write queries of fair complexity
  • Excellent experience in Big Data programming for data transformation and aggregation
  • Strong grasp of ETL architecture, including business-rule processing and extraction of data from the Data Lake into data streams for business consumption
  • Good analytical skills
  • Strong oral and written communication skills, including presentation skills


Roles And Responsibilities

  • Develop and maintain data pipelines with PySpark
  • Schedule Scala/Spark jobs for data transformation and aggregation
  • Tune performance with respect to executor sizing and other environment parameters, code optimization, partition tuning, etc.
  • Perform ad-hoc deep dives and root-cause analysis for anomalies
  • Lead a team of 2–3 data engineers
  • Interact with business users to understand requirements and troubleshoot issues
  • Implement projects based on functional specifications
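The performance-tuning responsibility typically surfaces as `spark-submit` configuration. Below is a sketch showing common knobs for executor sizing and shuffle partitions; the numeric values and the script name `pipeline.py` are placeholders, not recommendations for any particular cluster.

```shell
# Illustrative spark-submit invocation; all values are placeholders.
spark-submit \
  --master yarn \
  --num-executors 10 \           # executor count sized to cluster capacity
  --executor-cores 4 \           # cores per executor
  --executor-memory 8g \         # heap per executor
  --conf spark.sql.shuffle.partitions=200 \   # partition tuning for shuffles
  --conf spark.sql.adaptive.enabled=true \    # let AQE coalesce partitions at runtime
  pipeline.py
```

Code-level optimization (avoiding wide shuffles, caching judiciously, repartitioning before skewed joins) complements these environment-level settings.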