Overview:
Fractal is looking for people who are passionate about solving business problems through innovation and engineering practices. You'll apply your depth of knowledge and expertise across the analytical problem-solving lifecycle, and partner daily with your many stakeholders to stay focused on common goals. We embrace a culture of experimentation and constantly strive for improvement and learning. You'll work in a collaborative, trusting, thought-provoking environment, one that encourages diversity of thought and creative solutions in the best interests of our customers globally.
As an MLOps Engineer, you will work collaboratively with Data Scientists and Data Engineers to deploy and operate systems. You'll help automate and streamline our operations and processes, build and maintain tools for deployment, monitoring, and operations, and troubleshoot and resolve issues in development, testing, and production environments.
Responsibilities:
Operate and maintain systems supporting the provisioning of new clients, applications, and features.
Day-to-day monitoring of the Production service delivery environment to ensure all services and applications are operating optimally and SLAs are met.
Software deployment and configuration management in both QA and Production environments.
Collaborate with Data Scientists and Data Engineers on feature development teams to containerize new modules and build out their deployment pipelines.
Design, build, and optimize application containerization and orchestration with Docker and Kubernetes on AWS or Azure.
Automate application and infrastructure deployments.
Produce build and deployment automation scripts to integrate services.
Be a subject matter expert on DevOps practices, CI/CD, and configuration management within your assigned engineering team.
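To make the containerization and deployment-automation responsibilities above concrete, here is a minimal, hedged sketch of one pipeline step: rendering a Kubernetes Deployment manifest for a containerized model service. All names (service name, registry URL, port) are hypothetical placeholders, not an actual Fractal pipeline.

```python
"""Minimal sketch: build a Kubernetes Deployment manifest for a model service.

The service name, registry URL, and port below are hypothetical. In a real
pipeline the manifest would be serialized to YAML/JSON and applied with
kubectl or a Kubernetes client library; here we only construct it.
"""
import json


def render_deployment(name: str, image: str, tag: str, replicas: int = 2) -> dict:
    """Return a Kubernetes Deployment object as a plain dict."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": f"{image}:{tag}",
                            "ports": [{"containerPort": 8080}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = render_deployment(
        "model-scorer", "registry.example.com/model-scorer", "1.4.2"
    )
    print(json.dumps(manifest, indent=2))
```

In practice a CI/CD job would template the image tag from the build, then hand the manifest to the cluster; keeping the manifest construction as pure code makes that step easy to unit-test.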
Skills:
Experience with at least one cloud computing platform: Google Cloud Platform, Amazon Web Services, or Azure
Experience with ML lifecycle tools such as MLflow or Kubeflow, including ML experiment tracking
Experience with big data technologies preferred: Hadoop, Hive, Spark, Kafka
Knowledge of machine learning frameworks: TensorFlow, Caffe/Caffe2, PyTorch, Keras, MXNet, scikit-learn
At least 3 years' experience working with cloud-based services and DevOps concepts, tools, and practices
Extensive experience with Unix/AIX/Linux environments
Experience with Kubernetes or Docker Swarm
Experience working in cross-functional Agile engineering teams
Familiarity with standard concepts and technologies used in CI/CD build and deployment pipelines
Experience with scripting and coding in Python and shell
Experience with configuration management tools such as Chef or Ansible
Experience with automation servers such as Jenkins, CloudBees, or Travis CI
Experience with logging tools such as Splunk, Elasticsearch, Logstash, and Kibana
Experience with monitoring and alerting tools such as Munin, Prometheus, Grafana, Alertmanager, and PagerDuty
Big data stack experience is a plus: HDFS, Spark, Ambari, ZooKeeper, Kafka
Excellent written and verbal communication skills
Ability to collaborate effectively with highly technical colleagues in a fast-paced environment
Ability to solve complex problems and rapidly deliver innovative solutions
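The monitoring responsibilities above center on verifying that services meet their SLAs. As a hedged, stdlib-only sketch of that idea: compute availability from a series of health-check results and compare it against an SLA target. The 99.9% target and the sample data are hypothetical; a real setup would pull these figures from a monitoring backend such as Prometheus.

```python
"""Minimal sketch: availability and SLA evaluation from health-check samples.

The 0.999 (99.9%) target and the sample data are hypothetical placeholders;
production systems would query a monitoring backend for these numbers.
"""


def availability(checks: list[bool]) -> float:
    """Fraction of successful health checks (1.0 means fully available)."""
    if not checks:
        raise ValueError("no health-check samples")
    return sum(checks) / len(checks)


def sla_met(checks: list[bool], target: float = 0.999) -> bool:
    """True if measured availability meets or exceeds the SLA target."""
    return availability(checks) >= target


if __name__ == "__main__":
    # 1 failure in 10,000 samples -> 99.99% availability, above a 99.9% SLA.
    samples = [True] * 9999 + [False]
    print(f"availability: {availability(samples):.4%}, SLA met: {sla_met(samples)}")
```

Keeping the SLA arithmetic separate from the probing and alerting code makes the threshold logic trivially testable, which matters when an alerting rule (e.g., in Alertmanager) is derived from it.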