Data Infrastructure Engineer

Kennesaw State University

Kennesaw, GA

ID: 7234342
Posted: April 2, 2024
Application Deadline: Open Until Filled

Job Description

Job Summary
Focuses on building and maintaining the data infrastructure, including the extraction, loading, and staging of data from various data sources, whether on-premises or in the cloud. Ensures the seamless and secure transfer of data, optimizing for performance, integration, and reliability to enable subsequent data transformation and modeling processes for enterprise reporting and analytics.

Responsibilities
KEY RESPONSIBILITIES:
1. Develops, tests, and implements data pipelines to efficiently, securely, and reliably ingest data from various data sources into the enterprise data lake or data warehouse
2. Collaborates with data architects and data source SMEs to understand requirements and designs data pipelines accordingly
3. Develops and maintains robust data integrity checks, ensuring data accuracy, timeliness, and consistency
4. Utilizes cloud tools and custom scripts for validation, sets up automated anomaly detection, and collaborates with stakeholders to align quality checks with data requirements
5. Enables integration of data from various data sources, such as databases, cloud services and APIs, to facilitate seamless data flow for enterprise and self-service reporting and analytics
6. Monitors and optimizes data pipelines, and identifies and troubleshoots issues using tools such as Azure Monitor or a similar system
7. Implements automated alerts and collaborates with architects to adopt best practices in pipeline design for enhanced performance
8. Maintains compliance with data security policies and regulations in the data infrastructure
9. Implements encryption, manages access controls, conducts security audits, and stays updated with security best practices to safeguard data integrity and privacy
10. Manages the storage structure within the data lake or data warehouse, optimizing resource utilization and ensuring efficient integration with data transformation and modeling processes
11. Supports senior technical staff in project planning and the development of standard operating procedures
12. Contributes to the establishment of best practices, ensuring project alignment with technical standards and organizational goals
13. Creates and regularly updates technical documentation for data pipeline processes
14. Ensures clear, comprehensive, and accessible documentation is available, covering all aspects of pipeline design, operation, and maintenance

Required Qualifications
Educational Requirements
High school diploma or equivalent

Required Experience
Five (5) years of related IT experience.

Preferred Qualifications
Preferred Educational Qualifications
An undergraduate or advanced degree from an accredited institution of higher education in Computer Science, Information Systems, Business Administration or related field

Preferred Experience
Experience working with reporting tools such as Power BI, Tableau, etc.
Previous work experience in Higher Education

Knowledge, Skills, & Abilities
ABILITIES
Commitment to continuous learning and staying updated with the latest trends and best practices in data engineering
Able to handle multiple tasks or projects at one time while meeting assigned deadlines

KNOWLEDGE
Familiarity with data protection and privacy laws and regulations (e.g., FERPA, HIPAA, etc.)
Knowledge of various file formats used in data storage, such as Parquet, Avro, and CSV, and their implications for performance and storage
Understanding of cloud storage services, such as blob storage, data lakes, data lakehouses, and data warehouses
Understanding of data warehouse architecture patterns, such as Medallion Architecture, One Big Table (OBT), materialized views, and star schema
Knowledge of data warehousing principles, including data quality, data enrichment and standardization, and data modeling
Knowledge of best practices in data pipeline orchestration
Knowledge of data security practices, including encryption/decryption, and compliance with data governance policies and guidelines

SKILLS
Excellent interpersonal, initiative, teamwork, problem solving, independent judgment, organization, communication (verbal and written), time management, project management and presentation skills
Demonstrated skills in relational databases (e.g., SQL Server, Oracle, MySQL, etc.)
Demonstrated skills in data engineering tools (e.g., Azure Data Factory, SSIS, Informatica, Pentaho, Oracle Data Integrator, etc.)
Skills in identifying bottlenecks in data pipelines and optimizing for efficiency and scalability
Skills in developing efficient and scalable data ingestion pipelines between cloud-based and on-premises data sources and destinations
Proficient with computer applications and programs associated with the position (e.g., Microsoft Office suite and other collaboration tools)
Proficiency with SQL and its variants (e.g., PL/SQL, T-SQL, etc.)
Proficiency in programming/scripting languages (e.g., Python, Java, PowerShell, etc.)
Proficiency in data engineering technologies and tools (e.g., Azure Data Factory, Apache Spark, Azure Synapse Analytics, Python, Airflow, etc.)
Strong attention to detail and organization skills
Strong customer service skills and phone and email etiquette