Service Reliability Engineer

Date Posted: 8 days ago

Job Description

With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it: we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine. Come build your future, while being the reason millions of people find a brighter financial future with Discover.

Discover is the fastest growing global payments network and we are looking to hire 100 skilled technology professionals to join our growing UK office. We are carrying out cutting-edge work in the areas of cloud, DevOps, agile and automation. Our digital-first mindset and belief in empowering talented people will provide you with a platform to make a difference. As part of our recruiting efforts, we are looking to hire a number of SRE / Cloud Operations Engineers into our UK Technology hub. We will also support remote working.

Cloud Operations Engineers are a hybrid of systems and software engineers who are responsible for scaling, automation, and production issue support for applications. SREs have an intense passion for finding and improving efficiencies with infrastructure, development and deployment automation. As a Cloud Operations Engineer, you will lead the efforts of application deployment, reliability, scalability, availability and performance alongside the engineering and infrastructure teams.

Cloud Operations Engineers will work closely with our Software Development & Engineering teams to build mature, production-ready services and applications. As part of the team, you will help define our standards for monitoring, alerting, scalability, and production-readiness. You will monitor and report on the uptime of our systems and services, the performance of our applications, and the capacity of our platform.

You will be empowered (yes, empowered) to apply engineering techniques and discipline to production operations and help us deliver the world’s greatest solutions. You will provide feedback into the architecture and application design for each next generation of Payment Services development. If you are the type of person who loves driving technology problem-solving sessions and has a tireless passion to increase the performance, resiliency and availability of IT solutions serving the greatest Customers and Partners in the World, we believe our Cloud Operations Engineer opportunity will allow you to be the superstar of all superstars!

What You’ll Do

  • Handle responsibilities for operational stability and performance of one or more critical business services used by Discover customers and employees.
  • Enhance and maintain complex software components and distributed systems.
  • Monitor, alert on, analyze and troubleshoot large-scale distributed systems.
  • Define and drive adoption of best-in-class monitoring frameworks to accomplish end-to-end application or service monitoring.
  • Work with clustering technologies: High Availability, Resiliency, Reliability and Scaling.
  • Monitor and report on SLA/SLO for a given application's services.
  • Develop and maintain business and operational dashboards (ELK) to establish key performance indicators and trends.
  • Understand the definition and execution of High Availability, Disaster Recovery, Sustained Resiliency and Chaos Engineering tests.
  • Lead and participate in non-functional testing (performance and resilience), identifying bottlenecks, opportunities for optimization and capacity demands.
  • Leverage DevOps skills and methodologies: create and manage continuous build, integration, test, and deployment systems.
  • Control application code deployment servers and code deployment methods.
  • Control application log collection and analysis; automate processes and systems configuration/deployment.
  • Partner with security engineers to develop plans and automation to aggressively and safely respond to new risks and vulnerabilities.
  • Design and architect operational solutions for managing applications and infrastructure, with the specific goal of increasing the automation, repeatability, and consistency of operational tasks.
  • Own Release & Change Management, including CAB representation and implementation of changes and software releases.
  • Partner with and train the L1 & L1.5 teams, including creation and/or enhancement of SOPs, Knowledge Articles, etc.
  • Leverage one or more general-purpose programming languages: Python, Go, shell scripting (Unix/Linux), Java.
  • Analyze and participate in periodic on-call duties to prevent, solve and automate the response to problems on mission-critical services.

How You’ll Do It

Operational stability and performance: Work with other members of your assigned value stream to ensure that in-scope applications/platforms are meeting performance and stability requirements. This includes managing major incidents to mitigation/resolution.

Problem management: Perform post-incident reviews of all major incidents and determine the action items required to avoid similar issues and minimize downtime in future incidents.

Monitors and metrics: Work with Application Development to ensure that assigned applications/platforms have appropriate monitoring and metrics in place to measure performance and stability.

Identify functional and non-functional improvements: Act as the Operations representative in value stream planning and prioritization sessions to ensure that the operational needs of assigned applications/platforms are addressed as needed. Hold quarterly operational performance reviews with value stream management.

Release planning and coordination: Work with other members of your assigned value stream to ensure that the production releases for your in-scope applications/platforms are properly planned and coordinated. This includes:

  • Hold Change/Release implementation reviews to ensure thorough and appropriate implementation plans.
  • Review and sign off on/approve change tickets for the assigned value stream.
  • Represent the value stream at Change Advisory Board meetings.
  • Participate in Program Increment Planning Sessions as a liaison for Operations and Infrastructure support.
  • Provide information regarding upcoming critical changes to the value stream.

Operational readiness: Ensure that applications/platforms in the value stream are operationally ready for production. This includes:

  • Review all SOPs/knowledge articles annually.
  • Review monitoring for any new feature launch or other significant change that may impact monitoring.
  • Review SOPs/knowledge articles for any new feature launch or other significant change that may impact support documentation.
  • Train Command Center and Application 1st-level Support on new SOPs, knowledge articles, and any other support-related needs.
  • Perform monthly capacity analysis of applications/platforms within the value stream.
  • Create and maintain operationally focused ELK dashboards for the value stream.

Qualifications You’ll Need

The Basics

  • Bachelor's degree in business, computer information systems, computer science, MIS, engineering, science, or related field
  • Experience in Site or Service Reliability Engineering, DevOps or similar within information technology, or related field

Bonus Points If You Have

  • 4+ years of experience in technology, or related field
  • 5+ years of coding experience using a strongly typed language (Java)
  • 3+ years of experience in SRE, DevOps, or similar role
  • 2+ years of experience with scripting languages like Python / Bash
  • 2+ years of experience working with Cloud technologies, preferably AWS
  • Experience with automation tools such as Chef, Puppet, Ansible
  • Experience developing monitoring tools and log analysis tools to manage operations
  • Familiarity with design principles of monitoring and alerting systems
  • Understanding of networking concepts and experience with the HTTP protocol

What are you waiting for? Apply today!

The same way we treat our employees is how we treat all applicants: with respect. Discover Financial Services is an equal opportunity employer (EEO is the law). We thrive on diversity & inclusion. You will be treated fairly throughout our recruiting process and without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status in consideration for a career at Discover.

Work Perks:

  • 24-Hour Nurse Hotline & Telehealth Services
  • 7 Paid Holidays
  • Adoption Assistance
  • Annual Flu Shots
  • Commuter Benefits
  • Employee Assistance Program
  • Flexible Work Environment
  • Group Auto, Home and Pet Insurance
  • Healthy Eating Program
  • Legal Assistance Plan
  • Mother’s Rooms
  • Onsite Emotional Health Counselors
  • Onsite Fitness Centers
  • Onsite Weight Watchers at Work
  • Paid Parental Leave
  • Professional and Leadership Development Programs
  • Recognition Program
  • Service Anniversary Awards
  • Tuition Reimbursement


  • Annual Health Evaluation and Health Coaching
  • Critical Illness Insurance
  • Health Savings Account, Health Reimbursement Account and Flexible Spending Accounts
  • Health, Vision and Dental Insurance
  • Life and Accident Insurance
  • Long-term and Short-term Disability Insurance
  • Onsite Health Services Center with Nurse Practitioner

Financial Wellness:

  • 401(k) Savings Plan with Fixed and Matching Contributions
  • Employee Stock Purchase Plan
  • Financial Engines
  • “Financial Wellness for You” Learning Programs
