30 Sep
GSPANN Technologies
Secunderabad
Dynatrace, Splunk, Datadog, Grafana, New Relic, Dashboards, Azure, Python, Kubernetes, Docker, GitLab, Jenkins, Ansible, Terraform, DevOps, Troubleshooting, SLO/SLA Monitoring, Incident Response, Root Cause Analysis (RCA), E2E Implementation
Description
GSPANN is hiring a Site Reliability Engineer (SRE) for its Pune or Hyderabad location. This full-time role focuses on enhancing the reliability of global eCommerce platforms through automation, observability, and cloud-native tools like Azure, Kubernetes, and Terraform.
Location: Pune / Hyderabad
Role Type: Full Time
Published On: 2 June 2025
Experience: 3 - 8 Years
Role and Responsibilities
- Work hands-on with monitoring tools such as Dynatrace, Splunk, Datadog, Grafana, or New Relic.
- Demonstrate strong knowledge of observability tools, trends, and technologies.
- Identify gaps in SRE practices and implement scalable, effective solutions.
- Support cloud-based production environments, with a preference for Microsoft Azure.
- Write automation scripts proficiently, ideally using Python.
- Utilize cloud deployment tools like Ansible, Terraform, and Azure DevOps effectively.
- Work comfortably in containerized environments using Kubernetes and Docker.
- Apply configuration management tools such as Chef, Ansible, or AWS CodeDeploy.
- Troubleshoot complex issues independently and provide quick resolutions.
- Use and configure observability dashboards and manage end-to-end (E2E) monitoring requirements.
- Maintain expertise in cloud and automation tools (e.g., Azure, Python).
- Leverage Continuous Integration/Continuous Deployment (CI/CD) and Infrastructure as Code (IaC) tools like GitLab, Jenkins, Ansible, Terraform, and Azure DevOps.
- Exhibit soft skills including ownership, effective troubleshooting, and solid collaboration.
- Define and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Participate in incident response efforts and conduct Root Cause Analysis (RCA) after outages.
Skills And Experience
- Bachelor's degree in Computer Science, Information Science, Engineering, or a related field.
- 3-8 years of experience in a Site Reliability Engineering (SRE) or DevOps role.
- Monitor global e-commerce platforms to ensure optimal availability, performance, and efficiency while managing emergency responses.
- Promote observability best practices and drive operational excellence across systems.
- Build and maintain comprehensive observability dashboards with end-to-end monitoring.
- Design solutions and tools that enhance visibility for both internal teams and external stakeholders.
- Establish instrumentation standards and develop repeatable implementation patterns for engineering teams.
- Work closely with cross-functional teams to embed high-reliability practices into system design and operations.
- Apply SRE principles to improve overall system performance and reduce incidents.
- Automate incident response processes and coordinate outage preparedness across teams.
- Maintain error budgets, meet SLOs, and ensure consistent uptime of mission-critical services.