May 19, 2021

Data Engineer, DevOps

  • Insitro
  • South San Francisco, California
Full time Data

Job Description

Data Engineering plays a key role in insitro’s approach to rethinking drug development. The Data Engineering DevOps team ensures that the infrastructure powering our biological data factory’s robots, instruments, and machine learning platform is reliable, scalable, and manageable. You will work closely with a cross-functional team of scientists, bioengineers, and data scientists to identify areas where data engineering can make a difference, developing data architectures and systems on cutting-edge, high-throughput platforms that enable our scientists to be maximally productive.

You will design, implement, and deploy cloud infrastructure, including managed databases, application servers, data warehouses, and interactive/batch computing environments. As part of a team, you will rigorously design our data platform, identify key architectural performance improvements, and join an on-call rotation to ensure that insitro’s platform runs at maximum productivity.

You will be joining the founding team of a biotech startup that has long-term stability due to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!

About You

  • 2-3 years of experience with provisioning AWS cloud services (experience with GCP and Azure is also relevant)
  • Experience with cloud configuration and resource management tools such as Terraform
  • Experience architecting reliable infrastructure platforms, including monitoring and alerting, load balancing, scalable services, and multi-region deployments
  • Experience with at least one high-end distributed data processing environment (Hadoop, Spark, etc.)
  • Experience with batch computing systems such as AWS Batch or SLURM
  • Experience with container build and deployment systems like Docker, Kubernetes, or ECS
  • Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
  • Proficiency in a Linux environment (including shell scripting and Python programming), experience with database languages (e.g., SQL, NoSQL), and experience with version control practices and tools (Git, Mercurial, etc.)
  • Passion for making a difference in the world

Nice to Have

  • Experience with biological data
  • Experience with managing medium-sized data sets (100TB+) in object storage systems like S3
  • Experience with defining infrastructure that meets compliance requirements (GDPR, HIPAA, etc.)
  • Experience with data processing pipelines
  • Experience with deploying and monitoring machine learning models in a production environment

Benefits at insitro

  • Excellent medical, dental, and vision coverage
  • Open vacation policy
  • Team lunches (catered daily)
  • Commuter benefits
  • Paid parental leave
  • This role may be based remotely, if preferred

About insitro

insitro is a drug discovery and development company using machine learning and data generation at scale to transform the way that drugs are discovered and delivered to patients. We rely on human genetic cohorts, human-derived cellular disease models, and high-throughput biology and chemistry to identify coherent patient segments, actionable therapeutic targets, and new or existing chemical matter. The goal is to deliver predictive insights to improve the probability of success and reduce the number of costly dead ends along the R&D journey.
The company has established enabling collaborations with Gilead in NASH and Bristol Myers Squibb in ALS and is building a pipeline of wholly owned and partnered medicines leveraging its unique insights on patient biomarkers, targets, and molecules. insitro is located in South San Francisco, CA and has raised over $600M from top tech, biotech, and crossover investors since formation in 2018. For more information on insitro, please visit the company’s website at
