In this role you will help build and operate a fast-paced, always-on SaaS application hosted in the Azure environment. You can use all of your skills to help advance a cutting-edge platform leveraging AI to solve meaningful and impactful problems in healthcare. The NEXTSTEP team’s primary responsibility is to ensure the product successfully delivers on Concord’s commitment to provide highly available services of exceptional quality to our clients and partners. We're looking for an SRE to help shape our application stack in the cloud by providing key insights related to the security, stability, and supportability of the product, and to assist in proposing, designing, and building out our SRE resources, practices, and systems.
What you will be doing
Scope & Complexity
As a Sr. DevOps Engineer/SRE, you will be responsible for solving complex problems in defining, designing, deploying, and troubleshooting Concord’s Cloud platform. You are an expert at articulating the technical characteristics of your services and the dependencies between services, and at guiding development teams to engineer and add features to Concord’s Cloud platform.
In this role you are the ultimate authority and are accountable for the end-to-end performance and operability of the services you own. You will understand the end-to-end design, configuration, technical dependencies, and overall behavior of the production services you own. In partnership with the development team, you will share the responsibility of ensuring services are designed and delivered as mission critical with a focus on security, resiliency, scale, and performance.
You will be called upon during major incidents 24/7 as a key subject matter expert when the source of a problem is unknown or unclear.
You will partner with the development and product management teams in defining and implementing improvements in service architecture, both current and future.
You will manage the platform with reliability, scalability, resilience, performance, and security at the forefront of your approach.
You will understand and be able to communicate the scale, capacity, security, and performance attributes and requirements of the services you own.
You are able to understand and communicate every characteristic of your service, including:
o degradation and behavior under load of the services and their dependencies
o end-to-end tuning needs, optimizing resource utilization, as load patterns fluctuate
o instrumentation and metrics that clearly describe the service behaviors
o scaling requirements and patterns
o resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained
You will have a clear understanding of automation and orchestration principles, and will be eager to automate wherever and whenever the possibility arises, while simultaneously eliminating technical debt.
After resolving an incident, you will investigate and document how to reach the root cause more quickly and solve the problem next time.
• Our business is entrepreneurial, established and growing fast. As a result, the successful candidate will demonstrate a high degree of adaptability and a passion for working in a less formally structured environment.
• Hands-on experience with cloud IaaS environments in Azure
• Hands-on implementation, maintenance, and troubleshooting of complex environments
• Linux/Windows system administration exposure
• Familiarity with optimizing and configuring cloud components in highly automated environments
• Development experience, preferably in .NET/C#, Java, Python, Node, and/or Angular
o Expertise with Angular a plus
• Strong background and understanding of application architectures, networking, security, reliability, resiliency, and scalability concepts
• Exposure to containerization tools such as Kubernetes and Docker
• Azure DevOps or similar experience in build and deployment automation
Qualifications and Abilities:
• Minimum 5 years of experience in SRE, DevOps, or related roles
• Proven collaboration skills
• Demonstrable ability to apply creative thinking and problem solving
The employee will be required to travel as business needs dictate, but will typically operate from an office environment.