As a Site Reliability Engineer, you will champion our SRE practice and own the reliability of our services and applications. You will work closely with our engineering teams to build mature, production-ready services and applications, and you will help define our standards for monitoring, alerting, scalability, and production readiness. You will monitor and report on the uptime of our systems and services, the performance of our applications, and the capacity of our platform.
The SRE team owns our incident response process. As an SRE, you will be a front-line responder to production incidents. We like writing runbooks to make operations and on-call easier. We get excited about things like runbook automation, autoscaling, and metrics.
To be successful, you’ll need:
- A proven track record of working in Linux-based environments
- Experience monitoring, operating, and tuning production applications (our teams write Java)
- Experience operating and scaling services in AWS using technologies such as EC2, ALB, VPC, RDS, and Aurora
- Experience with automation and infrastructure management tools such as Ansible, Terraform, and Docker
- Familiarity with popular monitoring tools such as New Relic, Datadog, Prometheus, CloudWatch, and the ELK stack
- Prior experience participating in an on-call rotation
- A passion for automation and optimization, and an unrelenting commitment to a good customer experience
What We Offer:
We are passionate about creating memorable experiences for our fans… and a best-in-class experience for our employees. Vivid Seats offers competitive compensation, individual and team-based bonus opportunities, a generous benefits package, and a Flex PTO policy, plus a variety of workplace perks. The most exciting one: we offer our employees $100 worth of credits each month to spend on Vivid Seats tickets, along with promotional discounts. At the heart of it, we are all fans of great live events, and we want to help you get there more often.