Data Engineer

Harvard Medical School

Boston, MA

ID: 7092862
Posted: January 26, 2022
Application Deadline: Open Until Filled

Job Description

Job-Specific Responsibilities

The Center for Computational Biomedicine (CCB) is a new center within the Blavatnik Institute at Harvard Medical School. Our mission is to provide cutting-edge computational capabilities, data analysis, and data integration technologies to support medical and biological research within the Medical School.
Based at the Harvard Medical School Longwood Campus, we are part of a vibrant community of scientists, physicians, and engineers whose goal is to advance the boundaries of knowledge and improve patient care. The working environment combines the best features of a startup (fast pace, flexibility, flat hierarchies) with those of one of the leading medical schools (excellent benefits, outstanding opportunities for learning, great resources, name recognition).

CCB is looking for an individual to join the Data and Analytic Platforms Group, a group of engineers and scientists developing data warehousing and analytic solutions in support of epidemiology, healthcare economics, machine learning, and basic science research.

The Group works to reduce the burden on faculty by developing centrally managed and shareable data solutions to be used across research silos. We curate very large public and private data sets covering healthcare utilization (insurance claims, electronic health records), multi-omics, environmental exposures, and social determinants of health. We provision access to those curated data sets and develop analytic frameworks to accelerate reproducible academic research on top of them. Collectively, these data sets contain information relating to hundreds of millions of patients.

This position reports to the Director of the CCB Data and Analytic Platforms Group. Primary responsibilities will include designing and implementing relational database architecture (schema, indexing, stored procedures, ETL processes, etc.) to warehouse multi-terabyte data sets in Microsoft SQL Server. This will include periodically evaluating query performance metrics to ensure real-time availability to the research community and recommending modifications to the underlying database platform to resolve any identified issues. The bulk of this design work will be left up to the candidate, while a small portion will involve refactoring (or strategically deciding to abandon) existing ETL/indexing strategies. The data sets will be staged into a combination of proprietary schemas and the open-source i2b2 data model.

Additional opportunities will be available for the candidate to interact with individual scientific research teams to help improve their workflows.

Note: The Typical Core Duties below are a generalized list provided by Harvard's Job Frameworks and may not reflect the job-specific responsibilities of this position.

Typical Core Duties

Oversee aspects of data management services, which may include data modeling, database and analytics platform design, database performance and optimization, and recovery/load strategy and implementation
Lead the team in developing and enhancing the data user interface, including data acquisition/access analysis
Monitor status of assignments; review code and document scripts and procedures
Design and implement data verification and testing methods
Identify and evaluate opportunities to improve existing subject areas and applications and determine viability for adoption
Provide technical expertise and direction in developing and supporting system level programs
Identify areas for efficiency or improvement; recommend improvements
Create new standards and procedures related to end user and interface development, including user requirements
Partner with others on technical issues and system architecture definition
May manage vendor relationships
Provide training to clients and staff
Function as subject matter expert or project lead; advise unit/school
Abide by and follow the Harvard University IT technical standards, policies, and Code of Conduct

Basic Qualifications

Minimum of seven years’ post-secondary education or relevant work experience

Additional Qualifications and Skills

Bachelor’s Degree in Computer Science or related degree preferred. At least 5 years' experience as a software systems architect, including experience developing solutions with both relational database systems and at least one of the following languages: Java, Python, R.
Master’s Degree in a related field (Computer Science / Electrical Engineering, Bioinformatics, Statistics, Data Science, etc.) preferred.
Excellent communication skills, both written and oral
Experience with Microsoft SQL Server or cloud-based data warehousing technologies
Experience designing and maintaining multi-terabyte analytic relational databases, including index and query optimization
Experience orchestrating and optimizing Extract-Transform-Load (ETL) processes for multi-terabyte data warehouses
Comfort doing basic system administration in a Linux environment
Comfort doing basic system administration in a Windows environment
Experience with relational database index optimization
Experience with containerized (Docker or Singularity) workflows/paradigms
Experience with non-relational database systems (graph, key/value, document, array data stores)
Experience with the R statistical computing platform
Experience with Java
Experience with Python
Experience with high-performance computing
Comfort independently exploring distributed computing and database technologies and generating executive reports
Experience with public cloud platforms (AWS, Azure, Google Cloud)

Certificates and Licenses

Completion of Harvard IT Academy specified foundational courses (or external equivalent) preferred

Working Conditions

Work is performed in an office setting

Harvard Medical School strives to cultivate an environment that promotes inclusiveness and collaboration among students, faculty and staff and to create new avenues for discussion that will advance our shared mission to improve the health of people throughout the world.