Fractal: MLOps Engineer
Brief Description of position:
Fractal is looking for people who are passionate about solving business problems through innovation and engineering practices. You'll be required to apply your depth of knowledge and expertise to all aspects of the analytical problem solution lifecycle, and to partner daily with your many stakeholders to stay focused on common goals. We embrace a culture of experimentation and constantly strive for improvement and learning. You'll work in a collaborative, trusting, thought-provoking environment, one that encourages diversity of thought and creative solutions that are in the best interests of our customers globally.
As an MLOps Engineer, you will work collaboratively with Data Scientists and Data Engineers to deploy and operate systems. You’ll help automate and streamline our operations and processes. You’ll build and maintain tools for deployment, monitoring, and operations. You’ll also troubleshoot and resolve issues in development, testing, and production environments.
- Operate and maintain systems supporting the provisioning of new clients, applications, and features.
- Day-to-day monitoring of the Production service delivery environment to ensure all services and applications are operating optimally and SLAs are met.
- Software deployment and configuration management in both QA and Production environments.
- Collaborate with Data Scientists and Data Engineers on feature development teams to containerize and build out deployment pipelines for new modules
- Design, build and optimize applications’ containerization and orchestration with Docker and Kubernetes on AWS or Azure
- Automate applications and infrastructure deployments.
- Produce build and deployment automation scripts to integrate between services
- Be a subject matter expert on DevOps practices, CI/CD and Configuration Management for the assigned engineering team
- Experience with at least one cloud computing platform (Google Cloud, Amazon Web Services, Azure) and with Kubernetes.
- Experience with MLflow, Kubeflow, and ML experiment tracking
- Experience in big data technologies preferred: Hadoop, Hive, Spark, Kafka.
- Knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch, Keras, MXNet, scikit-learn.
- At least 3 years’ experience working with cloud-based services and DevOps concepts, tools and practices
- Extensive experience with Unix/AIX/Linux environments
- Experience with Kubernetes or Docker Swarm
- Experience working in cross-functional Agile engineering teams
- Familiarity with standard concepts and technologies used in CI/CD build and deployment pipelines
- Experience with scripting and coding in Python and shell
- Experience with configuration management tools such as Chef and Ansible
- Experience with automation servers such as Jenkins, CloudBees, Travis
- Experience with logging tools such as Splunk, Elasticsearch, Kibana, Logstash
- Experience with monitoring tools such as Munin, Prometheus, Grafana, Alertmanager, PagerDuty
- Experience with a big data technical stack (such as HDFS, Spark, Ambari, ZooKeeper, Kafka) is a plus
- Excellent written and verbal communication skills
- Ability to collaborate effectively with highly technical resources in a fast-paced environment
- Ability to solve complex challenges/problems and rapidly deliver innovative solutions
Minimum Work Experience: