Propellor.ai: Senior Data Engineer

Job Description – Senior Data Engineer

At ThinkBumblebee, we are building an exceptional team of data engineers who are passionate developers and want to push boundaries to solve complex business problems using the latest tech stack. As a Senior Data Engineer, you will work with various technology and business teams to deliver our data engineering offerings at large scale to our clients across the globe.

The person

  • Articulate
  • High Energy
  • Passion to learn
  • High sense of ownership
  • Ability to work in a fast-paced, deadline-driven environment
  • Loves technology
  • Highly skilled at Data Interpretation
  • Problem solver
  • Must be able to see how technology and people together can create stickiness for long-term engagements

The Ask

Experience

5 to 7 years of demonstrable experience designing technological solutions to complex data problems, and developing and testing modular, reusable, efficient, and scalable code to implement those solutions. Ideally, this would include work on the following technologies:

  • Expert-level proficiency in PySpark
  • Expert-level proficiency in Python
  • Strong understanding of and experience with distributed computing frameworks, particularly Apache Hadoop (YARN, MapReduce, HDFS) and associated technologies, including one or more of Hive, Sqoop, Avro, Flume, Oozie, ZooKeeper, Impala, etc.
  • Hands-on experience with Apache Spark and its components (Streaming, SQL, MLlib) is a strong advantage
  • Operating knowledge of cloud computing platforms (AWS/Azure/GCP)
  • Experience working within a Linux computing environment and using command-line tools, including knowledge of shell/Python scripting for automating common tasks
  • Ability to work in a team in an agile setting; familiarity with Jira and a clear understanding of Git or other version control tools

In addition, the ideal candidate would have great problem-solving skills, and the ability & confidence to hack their way out of tight corners.

Must Have (hands-on) experience:

  • Python and PySpark expertise
  • Distributed computing frameworks (Hadoop Ecosystem & Spark components)
  • Proficiency in at least one cloud computing platform (AWS/Azure/GCP)
  • Experience with GCP services is preferred (BigQuery/Bigtable, Pub/Sub, Dataflow, App Engine)
  • Linux environment, SQL, and shell scripting

Desirable (would be a plus):

  • A statistical or machine learning DSL such as R
  • Distributed and low-latency (streaming) application architecture
  • Distributed NoSQL databases such as Cassandra, CouchDB, MongoDB, etc.
  • Familiarity with API design

In addition to the above, the individual must have:

  • A proven track record of keeping existing technical skills current and developing new ones, enabling strong contributions to deep architecture discussions around systems and applications in the cloud (Google Cloud Platform - GCP)
  • The characteristics of a forward thinker and self-starter who flourishes with new challenges and adapts quickly to new knowledge
  • Ability to work with a global team of consulting professionals across multiple projects
  • A knack for helping an organization understand application architectures and integration approaches, architect advanced cloud-based solutions, and launch the build-out of those systems
  • A passion for educating, training, designing, and building end-to-end systems that help a diverse and challenging set of customers succeed

Education:

  • B.Tech. or equivalent degree in CS/CE/IT/ECE/EEE

Role Description

  • The role would involve big data pre-processing and reporting workflows, including collecting, parsing, managing, analyzing, and visualizing large data sets to turn information into business insights
  • Develop the software and systems needed for end-to-end execution on large projects
  • Work across all phases of the SDLC, and use software engineering principles to build scalable solutions
  • Build the knowledge base required to deliver increasingly complex technology projects
  • You would be responsible for evaluating, developing, maintaining and testing big data solutions for advanced analytics projects
  • The role would also involve testing various machine learning models on Big Data, and deploying learned models for ongoing scoring and prediction.
  • An appreciation of the mechanics of complex machine learning algorithms would be a strong advantage.
  • You will be an integral part of client business development and delivery engagements
Location: Remote

Minimum Qualification: Graduate