May 19, 2021
South San Francisco, California
Data Engineering plays a key role in insitro’s approach to rethinking drug development. The Data Engineering DevOps team ensures that the infrastructure powering our biological data factory’s robots, instruments, and machine learning platform is reliable, scalable, and manageable. You will work closely with a cross-functional team of scientists, bioengineers, and data scientists to identify areas where data engineering can make a difference, developing data architectures and systems on cutting-edge, high-throughput platforms that enable our scientists to be maximally productive. You will design, implement, and deploy cloud infrastructure, including managed databases, application servers, data warehouses, and interactive/batch computing environments. As part of a team, you will rigorously design our data platform, identify key architectural performance improvements, and join an on-call rotation to ensure that insitro’s platform runs at maximum productivity.
You will be joining the founding team of a biotech startup that has long-term stability due to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!
2-3 years of experience provisioning AWS cloud services (experience with GCP and Azure is also relevant)
Experience with cloud configuration and resource management tools such as Terraform
Experience architecting reliable infrastructure platforms, including monitoring and alerting, load balancing, scalable services, and multi-region deployments
Experience with at least one high-end distributed data processing environment (Hadoop, Spark, etc.)
Experience with batch computing systems such as AWS Batch, SLURM
Experience with container build and deployment systems like Docker, Kubernetes, or ECS
Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions
Proficiency in a Linux environment (including shell scripting and Python programming), experience with database languages (e.g., SQL, NoSQL), and experience with version control practices and tools (Git, Mercurial, etc.)
Passion for making a difference in the world
Nice to Have
Experience with biological data
Experience with managing medium-sized data sets (100TB+) in object storage systems like S3
Experience defining infrastructure that meets compliance requirements (GDPR, HIPAA, etc.)
Experience with data processing pipelines
Experience with deploying and monitoring machine learning models in a production environment
Benefits at insitro
Excellent medical, dental, and vision coverage
Open vacation policy
Team lunches (catered daily)
Paid parental leave
This role may be based remotely, if preferred
insitro is a drug discovery and development company using machine learning and data generation at scale to transform the way that drugs are discovered and delivered to patients. We rely on human genetic cohorts, human-derived cellular disease models, and high-throughput biology and chemistry to identify coherent patient segments, actionable therapeutic targets, and new or existing chemical matter. The goal is to deliver predictive insights to improve the probability of success and reduce the number of costly dead ends along the R&D journey. The company has established enabling collaborations with Gilead in NASH and Bristol Myers Squibb in ALS and is building a pipeline of wholly owned and partnered medicines leveraging its unique insights on patient biomarkers, targets, and molecules. insitro is located in South San Francisco, CA and has raised over $600M from top tech, biotech, and crossover investors since formation in 2018. For more information on insitro, please visit the company’s website at www.insitro.com.