Big Data Engineer (R1077656) in Warsaw, PL at IQVIA™

Posted: 10/6/2019

Job Description

IQVIA™ is the leading human data science company focused on helping healthcare clients find unparalleled insights and better solutions for patients. Formed through the merger of IMS Health and Quintiles, IQVIA offers a broad range of solutions that harness the power of healthcare data, domain expertise, transformative technology, and advanced analytics to drive healthcare forward.

As a senior data engineer, you will have a unique opportunity to apply your skills and build in-depth knowledge of our life science projects. You will help create data science applications that improve patients' lives by supporting areas of healthcare such as disease detection and drug research.

Working with petabyte-scale data, modern distributed systems, advanced data science models and challenging requests in an agile environment, you will help shape the way our team approaches prototyping and development of data-driven analytics products. This crucial role involves identifying opportunities for better data modelling, processing, scaling, governance, internal tooling, and other data engineering activities that will help our team maximize its efficiency in data science projects. You will have the opportunity to provide technical mentorship to the team and to set the standards for code quality and software architecture on the projects you work on.

Your typical activities might include:

  • Collaborating with data scientists, developers and other data engineers to harness huge datasets in our on-premises Cloudera-based cluster, and helping to turn working prototypes into well-abstracted, reusable data pipelines and Spark-based applications for iterative development of data science projects
  • Overseeing technical aspects of greenfield projects from concept to completion
  • Digging into a variety of databases and engineering data pipelines that extract datasets for ML training and prediction
  • Creating data retention policies and developing jobs for workflow schedulers
  • Exploring new technologies to accelerate data engineering tasks
  • Providing technical leadership and assisting the team in executing technical tasks centered on the delivery of analytics models and the generation of the feature vectors they require
  • Identifying opportunities to improve data processing applications, for example their stability and throughput
  • Comprehensively testing your own code

Our ideal candidate will have:

  • A bachelor's or master's degree in a STEM field such as Computer Science, Engineering, Statistics, Mathematics or Biotechnology
  • Advanced knowledge of Scala or Python 3, supported by 5+ years of programming experience in that language using functional and object-oriented paradigms
  • At least some familiarity with the other language of the Scala/Python pair
  • At least 3 years of experience with the Hadoop ecosystem, including tools like YARN, Hive, Impala and HDFS, and some knowledge of Hadoop cluster architecture
  • Advanced knowledge of Spark, supported by 2+ years of experience, including at least one year with Spark 2.x
  • Proficiency with relational databases and more than one dialect of SQL
  • Experience with more than one non-relational database, such as MongoDB and Redis
  • Experience with workflow managers like Airflow, Azkaban or Luigi
  • Strong unit testing and debugging skills
  • Good understanding of version control tools such as Git, and proficiency with Linux
  • Experience in following Scrum best practices
  • Fluency in English (spoken and written)

We would also appreciate it if you have some of the following:

  • Experience with Apache Flink and/or Beam
  • Experience with containerization tools such as Docker and Kubernetes
  • Solid understanding of designing microservice-based applications
  • Experience in putting machine learning models into production
  • Familiarity with advanced Python data structures like NumPy arrays and pandas DataFrames
  • Experience with deploying code into production through CI/CD tools like Jenkins
  • Experience in creating web crawlers
  • Experience with ELK stack and Kafka

Join Us

Making a positive impact on human health takes insight, curiosity, and intellectual courage. It takes brave minds, pushing the boundaries to transform healthcare. Regardless of your role, you will have the opportunity to play an important part in helping our clients drive healthcare forward and ultimately improve outcomes for patients.

Forge a career with greater purpose, make an impact, and never stop learning.



Job ID: R1077656
