EXL : Data Engineer – PySpark/Big Data

Brief Description of Position:

We are looking for a Spark developer who knows how to fully exploit the potential of our Spark cluster. You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our feature developers and business analysts.

This involves both ad-hoc requests and data pipelines that are embedded in our production environment.


Must-Have Skills

  • Strong experience in PySpark, including DataFrame core functions and Spark SQL
  • Good experience with SQL databases, including the ability to write queries of fair complexity
  • Excellent experience in Big Data programming for data transformation and aggregation
  • Strong grasp of ETL architecture, including business-rule processing and extraction of data from the Data Lake into data streams for business consumption
  • Good analytical skills
  • Strong oral and written communication skills, including presentation skills


Roles And Responsibilities

  • Develop and maintain data pipelines with PySpark
  • Schedule Scala/Spark jobs for data transformation and aggregation
  • Tune performance with respect to executor sizing and other environment parameters, code optimization, partition tuning, etc.
  • Perform ad-hoc deep dives and root-cause analysis for anomalies
  • Lead a team of 2–3 data engineers
  • Interact with business users to understand requirements and troubleshoot issues
  • Implement projects based on functional specifications
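The performance-tuning responsibility typically surfaces as `spark-submit` configuration. Below is a sketch showing common knobs for executor sizing and shuffle partitions; the numeric values and the script name `pipeline.py` are placeholders, not recommendations for any particular cluster.

```shell
# Illustrative spark-submit invocation; all values are placeholders.
spark-submit \
  --master yarn \
  --num-executors 10 \           # executor count sized to cluster capacity
  --executor-cores 4 \           # cores per executor
  --executor-memory 8g \         # heap per executor
  --conf spark.sql.shuffle.partitions=200 \   # partition tuning for shuffles
  --conf spark.sql.adaptive.enabled=true \    # let AQE coalesce partitions at runtime
  pipeline.py
```

Code-level optimization (avoiding wide shuffles, caching judiciously, repartitioning before skewed joins) complements these environment-level settings.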