Most of all, we are creators. From designing ground-breaking products to finding unique ways to solve technical challenges at an exceptional scale, our tech teams work with state-of-the-art methodologies to shape the future of advertising.
The Site Reliability teams keep one of the largest computing platforms in the AdTech world functioning like clockwork. They process, store and monitor data through large-scale compute and storage services (Hadoop, SQL & NoSQL), streaming (Kafka), platform as a service (Chef, Mesos), identity management (Kerberos) and analytics (Hive, Druid, Vertica).
The "Lake" team builds, maintains and improves the platform on which Criteo runs its Engine and its analytics (more than 300k jobs per day) and provides the means to tune these jobs. Our two Hadoop clusters comprise 3000 nodes and 600 TB of RAM each.
As an Engineer, you will:
> Be in charge of keeping a high SLA level for our YARN platform, so that it performs at its best at all times while running our 300k+ daily jobs and their 50M JVM containers
> Help Dev teams understand how to optimize and get the most out of their jobs
> Build, improve and operate our distributed job introspection platform:
  - Garmadon, our live container metrics collector service
  - Dr-Elephant and our extensions for smart job performance analysis
  - Oops, our live profiling service
  - Our historical Hadoop, Spark, Flink, … metrology service
> Contribute to open-source projects (working on Hadoop and Spark batch, Garmadon and Mumak)
> Mentor junior engineers
> Participate in and drive team-level and R&D-wide initiatives
You will work with the following technologies: Hadoop, Spark, Flink, MapReduce, Mesos, Prometheus, Consul, Chef, ...
Your profile:
- Very knowledgeable about JVM performance and Java/Scala development
- Good experience with Python/Ruby development
- You continuously grow your Systems and Software Engineering skills and use these for company-wide improvements
- You have in-depth knowledge of the Linux operating system and have hands-on operational experience in performance troubleshooting
- You have a strong interest in DevOps topics
- Great oral and written communication and presentation skills in English
- A “can do” attitude and the ability to work on problems by thinking positively and in a collaborative manner