We are looking for a Data Engineer to lead the modernization and maintenance of our mission-critical data infrastructure. This high-autonomy role offers the chance to shape our data engineering landscape, working with cutting-edge tools like Dagster and Kubernetes, and directly impacting our client’s linear TV optimization capabilities.
Location: 100% remote (US Central Time working hours)
About the Company:
Abstra is a fast-growing, Nearshore Tech Talent services company, providing top Latin American tech talent to U.S. companies and beyond. Founded by U.S.-bred engineers with over 15 years of experience, Abstra specializes in sourcing skilled professionals across a wide range of technologies to meet our clients’ needs, driving innovation and efficiency.
Job Description:
About the Role
As the Data Engineer, you’ll be at the forefront of modernizing and maintaining the mission-critical data infrastructure that powers the client’s enterprise-grade linear TV optimization. This is a unique opportunity to own and transform the client’s entire data engineering landscape, working directly with a data platform built on Dagster and Kubernetes.
Responsibilities:
- Infrastructure Management & Optimization: Manage and optimize our Dagster/Kubernetes-based data platform, ensuring scalability, reliability, and efficiency. Oversee and maintain our pixel tracking and reporting infrastructure, ensuring accuracy, reliability, and seamless business operations throughout its lifecycle.
- Data Migration & Pipeline Development: Lead the transition of ~100 production data jobs from legacy Spark/EMR and Pentaho (Kettle/Spoon) systems to a modern Dagster/Kubernetes infrastructure. Build, maintain, and optimize robust ETL/ELT pipelines for TV audience analytics, including real-time audience size estimation, demographic analysis, and data quality assurance.
- Data Ingestion & Integration: Manage the ingestion of large datasets from multiple third-party providers, ensuring efficient, scalable, and reliable data integration processes. Develop and maintain data pipelines using Trino, Postgres, and AWS S3, handling high-volume structured and unstructured data.
- Performance Optimization & Monitoring: Optimize ingestion performance using parallel processing, incremental updates, and streaming architectures where applicable. Monitor and troubleshoot data pipelines, proactively identifying and resolving bottlenecks, failures, or inconsistencies in AWS S3, Trino, and Postgres environments. Implement monitoring, alerting, and optimization strategies to enhance data workflow performance.
- Data Quality & Compliance: Implement data validation, cleansing, and anomaly detection mechanisms to ensure high data quality and compliance with internal and industry standards. Ensure all data operations align with security and privacy best practices, implementing measures to protect sensitive information and maintain compliance with regulatory requirements.
- Technical Architecture & Product Alignment: Design and implement scalable data processing solutions for evolving product features. Collaborate closely with product and engineering teams to architect and develop new data solutions that align with business objectives.
- Cross-Functional Collaboration & Leadership: Work closely with software engineers, product managers, and other stakeholders to integrate machine learning models into applications. Serve as a liaison between technical and non-technical teams, translating business needs into technical requirements.
- Strategic Decision-Making & Stakeholder Communication: Work autonomously while maintaining clear communication with the VP of Data Intelligence and Technology and other stakeholders. Translate business requirements into technical solutions, effectively managing expectations and project timelines.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or related field
- 3+ years of professional experience in data engineering, with significant experience in:
  - ETL pipeline development and maintenance
  - Big data processing frameworks (especially Apache Spark)
  - Data warehouse architecture and optimization
- Deep expertise in:
  - Modern orchestration tools (Dagster, Airflow, or similar)
  - Kubernetes and container technologies
  - Python development for data processing
- Proven experience with:
  - Large-scale data pipeline migration projects
  - Enterprise-grade ETL processes (such as Pentaho Kettle/Spoon)
  - AWS services, particularly EMR and other data processing services
- Strong understanding of:
  - Data pipeline monitoring and optimization
  - Data quality management and validation
  - CI/CD practices for data infrastructure
- Experience with:
  - Real-time data processing and analytics
  - Web analytics and pixel tracking systems (desired)
  - SQL and NoSQL databases
- Demonstrated ability to:
  - Work autonomously in a complex technical environment
  - Make architectural decisions with minimal oversight
  - Communicate effectively with both technical and non-technical stakeholders
- Plus:
  - Experience with media/entertainment industry data
  - Background in audience analytics or advertising technology
  - History of successfully modernizing legacy data systems
What we offer:
- Flexible working hours and remote work options.
- Opportunities for professional growth and development.
- A collaborative and inclusive work environment.
- The chance to work on impactful projects with a talented team.
- Excellent compensation in USD.
- Hardware and software setup (if needed).
Job Features:
- Job Category: Data Analysis, Data and Analytics, Data Consultant, Data Engineering
- Type: Remote
- Time Zone: US Central Time