Jobs at KP Recruiting Group




Our IT Operations team is looking for an outstanding Site Reliability Engineer (SRE). The SRE will serve as a highly specialized, senior-level technical application lead focused on operational stability, driving IT operations readiness through continuous improvement of our products. This role involves working closely with development teams and business partners, coaching junior operations teams, and implementing enhanced monitoring and alerting capabilities for our distributed platforms. The SRE will also help develop automation to reduce MTTR and manual tasks. The ideal candidate will have experience driving large-scale development efforts in an agile environment, as well as a thorough understanding of DevOps practices with a focus on managing production environments. We are looking for a high-energy teammate with an innovative mindset who is interested in joining a group of IT professionals dedicated to enhancing IT operations. This position reports to a Director of IT Operations. A passion for technology and problem solving is a must.

The Work Itself

  • Collaborates with Agile squads, developers, the sustain team, and business partners, contributing significantly to specifications that resolve problems and address enhancement needs, with a focus on logging, monitoring, and metrics for operational readiness.
  • Uses technical knowledge, creativity, and company practices to reduce incidents through proactive alerting and monitoring.
  • Provide continuous feedback to development teams on system stability, defect analysis and system enhancements.
  • Serves as a mentor to junior developers and IT operations teams
  • Participate in technical discussions with the development team on deployments and code reviews
  • Drive knowledge transfer from development to the sustain team for each functional deployment
  • Work with IT business and development partners to gather input for new capabilities that display, monitor, and alert on key performance indicators (KPIs) by tracking business transactions (BTs) in real time
  • Partner with application owners to develop creative and effective solutions to mitigate risk and successfully remediate any audit issues
  • Lead root cause analysis (RCA) and SWAT investigations for the IT Operations team
  • Plan validation and verification of changes deployed by the infrastructure, development, and sustain teams
  • Facilitate day-to-day execution of real-time L2 technical support and troubleshooting
  • Attend Change Advisory Board (CAB) meetings and approve changes
  • Support business continuity and disaster recovery activities
  • Lead maintenance of master documents (i.e., the Runbook and Playbook) and help maintain accurate application documentation
  • Provides guidance in resolving performance related issues and designing solutions for any technical issues faced by the application
  • Review and accept the technical documentation

The Skills You Bring

  • Holds a BS (MS preferred) in Computer Science or a related field
  • 5 years of experience in a similar sustain role and extensive knowledge of associated processes
  • Shows deep knowledge and understanding of enterprise-scale platforms and architectures
  • Possesses strong analytical, problem-solving, and leadership skills
  • Experience coordinating with upstream applications to resolve incidents
  • Grasps new technologies and can adapt to rapid shifts in priorities
  • Experience implementing sustainable, audit-ready processes that support IT controls such as deployment execution, access management, audits, incident management, and change management
  • Hands-on experience with as many of the following as possible: Unix and Windows platforms, Java EE, JavaScript, Spring, Spring Boot, REST APIs/microservices, Jenkins, shell scripting, PL/SQL, and databases (specifically Oracle)
  • AWS/Cloud experience preferred
  • Experience in development of automation with tools such as Ansible
  • Experience with Splunk, AppDynamics or other similar monitoring tools preferred
  • Ability to correlate environment conditions and metrics with application events
  • Experience debugging problems in a distributed system
  • Experience with source control and build tools, including SVN

Very competitive salary and benefits!
Great company to work for!
Full relocation package available!
Don't miss out! Apply now, and we'll be in touch immediately with more specific details and salary information, and to answer any questions!

This position does NOT provide sponsorship, so please do NOT apply if you require sponsorship. Thank you.

KP Recruiting Group 
"Bringing Talent to the Marketplace"

KP Recruiting Group is a well-established and highly respected recruiting firm. We have built a strong reputation as a premier source of highly qualified candidates for our clients. We have extensive experience across many industries and serve a wide range of clients. We will serve as your advocate during your career search. Let us do the work for you! There is never a fee for our services!

