May 19, 2021

Data Engineer

  • DNAnexus
  • Mountain View, CA
  • Full time
  • Data

Job Description

Company Description

DNAnexus is the leading cloud-based SaaS company serving the global life science community. DNAnexus' health informatics platform serves customers across a spectrum of industries (government, biopharmaceutical, clinical diagnostics, healthcare, and academic research) in 33 countries with compliant protection of data, privacy, and intellectual property. The platform provides a secure and collaborative environment where genomics, multi-omics, and real-world data can be combined with clinical data at scale, providing new insights that can lead to improved diagnostics, new targeted therapies, and better patient care. For more information on DNAnexus, please visit www.dnanexus.com or follow the company @DNAnexus.

Job Description

DNAnexus xVantage Group is passionate about partnerships and client success. Our partnership culture is as important as the technology we provide our clients. Our mission is to help our clients achieve their research and clinical goals with DNAnexus solutions and services. Our team includes highly sought-after experts including data scientists, bioinformaticians, cloud computing experts, and software engineers.

As a Data Engineer on the DNAnexus xVantage team you will have an opportunity to partner closely with our customers, working with their data from profiling to ingestion onto our platform. You will have the opportunity to work closely with scientists at the world’s top healthcare providers and pharmaceutical companies to enable their precision medicine use cases. The ideal candidate is a bioinformatician / computational biologist with success in leading research studies combining next-generation sequencing data with various forms of phenotypic, transcriptomic, metabolomic, and other clinical data. They will have an extremely strong background in statistics, data engineering, and analysis including techniques for the quality control of NGS data and the proper handling of missing data.
Your work will include modeling, normalization, harmonization, and integration of clinical/healthcare data from multiple sources into linked clinico-genomic databases that drive scientific insight. You will partner closely with experts across the DNAnexus Data Science, Engineering, and Product teams to optimize tools and processes for clinical data pre-processing, organization, and presentation utilizing DNAnexus data frameworks.

Specific Responsibilities Include

  • Understand, organize, analyze, and interpret genetic and phenotypic datasets
  • Work with real-world datasets including the UK Biobank and other public and private datasets to enable research goals
  • Develop and apply analytical approaches for large, complex genomic datasets in conjunction with clinical, phenotypic, and multi-omics data
  • Develop analysis and visualization approaches that allow domain scientists to quickly and accurately interpret complex analysis results

Qualifications

Qualifications / Experience

  • M.S. in bioinformatics, computational biology, computer science, or related biotechnology field; Ph.D. and interdisciplinary exposure preferred
  • 3+ years of experience in bioinformatics, biostatistics, genomics, statistical genetics, population genetics, systems biology, and/or translational research in either academic or industry settings
  • Direct experience in clinical diagnostics, therapeutics, or biomarker discovery and development is a strong plus
  • Ability to develop reusable, well-tested software in Python and bash
  • Ability to write documentation that accurately and meaningfully captures designs and implementation details
  • Ability to work well within a distributed team of developers, peer review code, and own components of our platform
  • Ability to work in an agile environment with strong automated testing practices

Desired Knowledge And Skills

  • Deep understanding of existing techniques for managing and analyzing genomic, clinical/phenotypic, pharmacokinetic, and other molecular data (transcriptomic, metabolomic, proteomic, microbiome), and the challenges in aggregating datasets for reuse in follow-on studies
  • An understanding of human genetics and the effects on human diseases, e.g. oncology, immunology, cardiovascular
  • Strong computational ability and knowledge of programming languages used for data analysis such as Python, R, Julia, etc.
  • Familiarity with software development and project management processes and tools (agile, JIRA, wiki, etc.)
  • Experience with big data analytics technologies including Spark, Hive, and Hadoop, and an understanding of relational database concepts
  • Experience working with GWAS and other population genomics/PheWAS toolsets, e.g. PLINK, Hail, ADAM
  • Experience working with large-scale omics datasets, e.g. ENCODE, 1000 Genomes, ExAC/gnomAD, TCGA
  • Familiarity with integrated tools such as GDC DAVE, cBioPortal, i2b2 tranSMART, Spotfire, UCSC Genome Browser, and Ingenuity Pathway Analysis
  • Strong organizational skills and excellent written and oral communication skills
  • Entrepreneurial “can do” attitude with the ability to find creative, pragmatic solutions

Additional Information

Why Our Work Is Important

Science is social. It is about discovering, sharing, and building upon the findings of others. For biomedical data to realize its true value, it requires openness and collaboration, but at the same time a secure environment to handle sensitive information. DNAnexus provides a dynamic platform where data protection, analysis, and collaboration allow for open conversations, new ideas, and precise outcomes. Precision medicine is the future. Join our team and be a part of this revolution, delivering on the potential to predict, prevent, and personalize treatments through precision medicine.

If you are interested in joining our team, please apply today! All your information will be kept confidential according to EEO guidelines.
