SRE - Architect - [HT-812]

26 May



Microsoft Consulting Services’ (MCS) mission is to lead and serve Microsoft’s customers and partners as they realize their full potential through Microsoft software and services. MCS and its partners bring the deepest expertise of Microsoft solutions—from enterprise planning to deployment and support—that helps our customers innovate to strategic advantage in their respective industries.

Within MCS, the Modern Delivery and Consulting Product Management (CPM) practice is a group of Product Owners, Product Managers, UX Designers, Tech Leads, Product Line Architects, and full-stack engineers who design, build, and operate cloud-based products in collaboration with our customers. This team is outcome-focused, product-centric, outside-in user experience driven, and squad-based, meaning that squads and product units stay together across customer assignments. These high-performing product squads and consulting product units have an industry focus (for example, Finance-Underwriting), a technology focus (for example, medical imaging machine learning classification), or both.


Technical Knowledge and Domain-Specific Expertise

Demonstrates end-to-end expertise in distributed systems design, interactions between cloud technology layers and components, functions of physical network devices, and dependencies at scale. Drives efforts within an organization to identify and recommend optimal configurations of cloud technology solutions and develops or modifies the code base that defines infrastructures to improve the reliability and operability of supported products.

Develops end-to-end technical expertise in the architecture, code, features, and operations of specific products as required to implement improvements in product availability, reliability, efficiency, observability, and/or performance. Drives code/design reviews with the engineering teams that develop and/or manage those products and shares learnings and recommendations across engineering teams working on related products within their organization.

Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies; identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance. Drives the adoption of new solutions across engineering teams working with related products within an organization and provides guidance and coaching to others on relevant topics.

Contributions to Development and Design

Leverages technical expertise in the infrastructure of large-scale distributed systems and specific products, as well as objective insights drawn from analyses of production telemetry data, to advocate for, or directly contribute to, changes to the code base that improve the availability, reliability, efficiency, observability, and performance of related sets of products developed and supported by teams within an organization.

Develops, tests, and implements changes to optimize code and improve the observability, reliability and operability of platforms, systems, and products at scale. Reviews the effect of these changes to document and share development insights within their team.

Engages with product engineering teams within an organization by driving code/design reviews, hosting regular meetings, and participating in on-call rotations and incident responses throughout product development and operations cycles; leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention.

Driving Operational Excellence

Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization.

Leverages end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and data changes for related sets of platforms, systems, or products in production using code, tooling, and automation; identifies cases where teams lack the tools and/or capability to manage platforms, systems, or products using code and drives efforts within an organization to expand capabilities and/or tooling accordingly.

Leverages existing tools and automation to enable product engineering teams within their organization to increase the velocity at which they can reliably and safely implement changes in production; monitors the effects of changes across platforms or systems.

Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale. Contributes to the development of new tooling and/or predictive models to identify and test potential improvements in product development and/or operations, and monitors the impact of changes on operations metrics (e.g., Time-to-X) within an organization.

Identifies optimal uses for existing tools and/or models to identify contributing factors or points of failure that are affecting the availability, reliability, performance, and/or efficiency of systems, platforms, or products; proposes and implements solutions that resolve root cause(s) and prevent issues from occurring in related products by working with product engineering teams within an organization to test and deploy them to production.

Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting complex issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams, owners, and leadership to issues with major customer/business impact and escalates resolution of the highly complex, ambiguous, and impactful issues to include other engineering teams and/or subject matter experts as needed. Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.

Develops, maintains, and leverages capacity planning models and monitoring tools to forecast product capacity and resource demands; models the predicted effect of changes to capacity plans to optimize code bases to better manage resources in response to dynamic capacity demands. May contribute to the development of automated resource utilization tools or processes that can dynamically scale compute resources up or down to adjust to capacity demands.

Draws insights from performance and resource monitoring across products within their organization to identify whether there is a need to optimize code, infrastructure, or architecture - or if changes to compute resources are required; uses advanced models to forecast and verify the efficacy of changes at scale and proposes solutions that are aligned with customer/business needs.

Shares insights and best practices that can be applied to improve development and operations across related sets of systems, platforms, and/or products. Continues to develop their understanding of insights and best practices through interactions with more experienced SREs and members of product engineering teams. Mentors and coaches more junior engineers to help them identify and propose relevant solutions.


10+ years of experience in the IT industry.

Should be highly credible, with a demonstrable ability to solve business problems and to explain the benefits to the customer, ranging from business value to deep technical explanations if needed.

Passion for customers and learning, with a proven ability to be client-focused, results-focused, proactive, collaborative, and confident under pressure.

Experience in team leadership, working with project management and project stakeholders, and pre-sales activities.

Strong time, project, and priority management skills.

Demonstrable experience of working face to face at varying levels within organizations.

Demonstrated ability to adapt to new technologies and learn quickly.

Passion for mentoring, training peers, and sharing knowledge.

Great communication, presentation, and writing skills, along with customer/partner relationship and expectations management.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
