    Data Infrastructure Engineer at Heetch (allows remote)
    Employer: Heetch
    Job Type: Full Time

    Job Description:

    Data Engineering Team @Heetch

    Our team's mission is to help the company generate confident insights, make better decisions and build data-driven products. We believe the data platform is the digital nervous system of Heetch and that empowering everyone in the company with data access is critical to our business success. As a new sub-team within Data Engineering, the Data Infrastructure team is dedicated to designing, building and scaling our data platform and the underlying data infrastructure.

    What will be your role?

    You will enable Data Scientists, Data Analysts, and Operations teams by tailoring the data platform to their needs and empowering them to solve challenging ML and analytics problems. If you're experienced, passionate, and interested in leading the transformation of our data infrastructure, we would love to talk to you!

    Does it sound like you?

    • You've architected, built, scaled, tuned and maintained large-scale distributed systems in a production environment, specifically on top of AWS.

    • You've got proven experience working with data technologies that power data platforms (e.g.: Spark, Presto, Kafka, Airflow, Avro, Redshift, ElasticSearch, etc.).

    • You've led DevOps topics such as CI/CD, containerization, monitoring, etc. in a data ecosystem.

    • You display strong coding skills in Python and Scala with a focus on maintainability, scale, and automation.

    • You love to work autonomously and take on unconstrained problems.

    • You can drive a vision, estimate the associated tasks and plan from development to delivery.

    • You take pride in sharing and gathering knowledge through documentation, advocacy, and immersing yourself in stakeholders' use cases.

    What will you do?

    • Build frameworks, libraries, and abstractions that enable easy and reliable data processing, ingestion, and exposure

    • Automate data pipeline and services deployment and configuration management

    • Support, manage and handle operations on cloud-based data technologies (e.g., clusters, serverless applications, APIs, databases)

    • Monitor the health of the data platform through automation

    • Handle periodic on-call rotations

    • Enable Data Engineering and Data Science teams to execute their pipelines through workflow management

    What are going to be your challenges?

    • Build the next generation of our data platform using open source big data technologies such as Kafka, Kafka Streams, Airflow, Spark, Metacat and Kubernetes

    • Enable data scientists to test and productionize various ML models to enhance the performance of our marketplace

    • Craft robust infrastructure foundations to support API-based data access, including Finatra microservices and AWS Lambda functions

    • Support, manage and handle operations on our MPP databases (Redshift, Presto)

    • Design change data capture from PostgreSQL databases to feed the data lake

    • Simplify data integration with Apache Gobblin

    • Enable dataset discovery, metadata exploration, and change notification

    • Unlock acceptance testing with Airflow, Spark, and Cucumber


    Flexible Hours
    Letter of Recommendation