Brief Description of position:
Does working with data on a day-to-day basis excite you? Are you interested in building robust data architecture to identify data patterns and optimize data consumption for our customers, who will forecast and predict what actions to take based on data? If this is what excites you, then you’ll love working in our intelligent automation team.
Schneider Digital is leading the digital transformation of Schneider Electric by building a highly available, massively scalable digital platform for the enterprise.
We are looking for a savvy Lead Data Engineer to join our growing team of AI and machine learning experts. You will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up.
The Data Engineer will support our software engineers, data analysts and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products.
Responsibilities
- Create and maintain optimal data pipeline architecture; assemble large, complex data sets that meet functional and non-functional requirements
- Design and build production data pipelines from ingestion to consumption within a big data architecture
- Build the data marts and data warehouses required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies
- Create the necessary preprocessing and postprocessing for various forms of data for training, retraining, and inference as required
- Create data visualization and business intelligence tools that give stakeholders and data scientists the business and solution insights they need
- Identify, design, and implement internal process improvements: automating manual data processes, optimizing data delivery, etc.
- Ensure our data remains separated and secure across national boundaries, spanning multiple data centers and AWS regions
Requirements and Skills
- You should have a bachelor’s or master’s degree in Computer Science, Information Technology, or another quantitative field
- You should have at least 8 years of experience as a data engineer supporting large data transformation initiatives related to machine learning, including building and optimizing ‘big data’ pipelines and data sets
- Strong analytical skills related to working with unstructured datasets
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift and familiarity with various log formats from AWS.
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented/functional scripting languages: Python, Java, C++, etc.