  • Slides: 47

Resources
  • aka.ms/OPS20repo
  • aka.ms/mymsignitethetour

TTR: Time to recover/remediate/restore

Time to recover by performance tier (2019 State of DevOps)
  1. Elite/high: less than one hour
  2. Medium: less than one day
  3. Low: between one week and one month
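The tiers above can be sketched as a small classifier. This is a minimal illustration, not part of the original deck: the function names are hypothetical, "one month" is assumed to mean 30 days, and recovery times that fall between the slide's bands (one day to one week) or beyond one month are reported as "Unclassified" because the slide does not cover them.

```python
from datetime import datetime, timedelta


def time_to_recover(detected_at: datetime, restored_at: datetime) -> timedelta:
    """Return the elapsed time between detection and restoration."""
    if restored_at < detected_at:
        raise ValueError("restored_at must not be earlier than detected_at")
    return restored_at - detected_at


def ttr_tier(ttr: timedelta) -> str:
    """Map a recovery time onto the bands listed on the slide."""
    if ttr < timedelta(hours=1):
        return "Elite/high"
    if ttr < timedelta(days=1):
        return "Medium"
    # Assumption: "one month" is treated as 30 days.
    if timedelta(weeks=1) <= ttr <= timedelta(days=30):
        return "Low"
    # The slide does not name a band for these durations.
    return "Unclassified"


if __name__ == "__main__":
    ttr = time_to_recover(datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 9, 45))
    print(ttr, ttr_tier(ttr))
```

Tracking this value per incident gives a team a baseline to compare against the reported tiers.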

Elite performers (2019 State of DevOps)
  • Recover from incidents 2,604x faster
  • Deploy to production 208x more often

Foundations: The Dickerson Hierarchy of Reliability (top to bottom)
  • UX
  • Dev
  • Capacity/scale
  • Testing/release
  • Post-incident review
  • Incident response
  • Monitoring

Responding to incidents

Incidents
  • Service disruptions
  • Feared/avoided
  • Subjective
  • Unplanned work (unplanned investments)

Incidents are the pulse of your systems

Lifecycle of an incident: Detection, Response, Remediation, Analysis, Readiness

Tailwind Traders challenges
  • Increased disruptions and no method to track and respond
  • Everything is ad hoc and reactionary
  • Information and status are difficult to find
  • Time to resolution is terrible and getting worse
  • Recurrence of problems and mistakes

Incident response foundations
  1. Rosters
  2. Roles
  3. Rotations

Rosters Roles Rotations

Rotations: scheduled shifts
  • Engineers take turns being “on-call” for their recurring rotation(s)
  • Types: 24x7, follow the sun, custom (weekends)
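A recurring rotation like the one described above can be sketched as a round-robin over a roster. This is an illustrative sketch only: the roster names, the one-week shift length, and the three follow-the-sun regions are assumptions that do not come from the deck.

```python
from datetime import date


def on_call_engineer(roster, rotation_start, day, shift_days=7):
    """Return who is on call on `day` for a fixed-length, recurring rotation."""
    if not roster:
        raise ValueError("roster must not be empty")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    # Count whole shifts elapsed since the rotation began, then wrap around.
    shifts_elapsed = (day - rotation_start).days // shift_days
    return roster[shifts_elapsed % len(roster)]


def follow_the_sun_region(utc_hour, regions=("APAC", "EMEA", "Americas")):
    """Pick the region whose block of the UTC day contains `utc_hour`."""
    if not 0 <= utc_hour <= 23:
        raise ValueError("utc_hour must be between 0 and 23")
    block = 24 // len(regions)  # hours per region, assuming equal blocks
    return regions[(utc_hour // block) % len(regions)]


if __name__ == "__main__":
    roster = ["ana", "ben", "chen"]  # hypothetical engineers
    start = date(2020, 1, 6)
    print(on_call_engineer(roster, start, date(2020, 1, 14)))
    print(follow_the_sun_region(9))
```

A custom rotation (for example, weekends only) would swap the fixed shift length for an explicit calendar of shifts.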

Key takeaway: Respond with urgency, rather than react

Incident tracking

Unique channel for communications
  • Conversation bridge
  • Incident-related only

Codeless automation with Logic Apps: response improvements
  • Create and track issues
  • Create and track efforts
  • Look up on-call and guides

Logic app connectors
  • Connectors: Azure Boards, Azure Storage, Microsoft Teams
  • Tasks: create and track issues, create and track efforts, look up on-call and guides

Azure Boards, Azure Storage, Microsoft Teams
  • Create issue
  • Assign engineer
  • Set state
  • Update details

Azure Boards, Azure Storage, Microsoft Teams
  • Look up on-call
  • Look up workbook

Azure Boards, Azure Storage, Microsoft Teams
  • Create channel
  • Post details
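The actions listed on the connector slides (create issue, assign engineer, set state, look up on-call and guide, create channel, post details) can be read as one sequence. The sketch below walks through that sequence with in-memory stand-ins. It is not the logic app from the talk and calls no Azure or Teams API: the `Incident` class, the lookup tables, the state names, and the channel naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    issue_id: int
    title: str
    state: str = "New"
    assigned_to: str = ""
    channel: str = ""
    posts: list = field(default_factory=list)


# Hypothetical lookup tables standing in for stored on-call and guide data.
ON_CALL = {"shopping-cart": "ana"}
GUIDES = {"shopping-cart": "guides/shopping-cart.md"}


def open_incident(issue_id: int, service: str, title: str) -> Incident:
    """Run the response steps in order and return the tracked incident."""
    incident = Incident(issue_id=issue_id, title=title)        # create issue
    incident.assigned_to = ON_CALL.get(service, "unassigned")  # look up on-call, assign
    incident.state = "Active"                                  # set state
    incident.channel = f"incident-{issue_id}"                  # create a unique channel
    guide = GUIDES.get(service, "no guide found")              # look up guide
    incident.posts.append(                                     # post details
        f"[{incident.channel}] {title} | on-call: {incident.assigned_to} | guide: {guide}"
    )
    return incident


if __name__ == "__main__":
    print(open_incident(42, "shopping-cart", "Checkout errors").posts[0])
```

Keeping every step in one place means each incident gets the same issue, owner, and channel without anyone having to remember the sequence under pressure.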

Creating an incident response plan

Key takeaway: Prioritize for clear communication

Lifecycle of an incident: Detection, Response, Remediation, Analysis, Readiness

Remediation improvements
  • Troubleshooting guides
  • Context and guidance
  • ChatOps
  • Update stakeholders

Context and guidance: troubleshooting guides

Update stakeholders

ChatOps: tools + chat

ChatOps
  • Collaboration
  • Sharing of domain knowledge
  • Visibility and awareness
  • Learning
  • Empathy

ChatOps
  • Microsoft Teams: outgoing webhook
  • Azure Function: Node.js
  • Azure Storage: static HTML
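The chain above (a chat message sent through an outgoing webhook, handled by a function, producing static HTML) can be sketched as follows. The deck's function is written in Node.js; this sketch uses Python instead and is not the talk's code. It assumes, and this should be checked against the Teams documentation, that the outgoing webhook sends an `Authorization: HMAC <base64 digest>` header computed with HMAC-SHA256 over the raw request body using the base64-decoded security token. The `status <service> <state>` command syntax and every function name here are hypothetical, and writing the HTML to storage is left out.

```python
import base64
import hashlib
import hmac
import html
import json


def is_valid_signature(body: bytes, auth_header: str, security_token_b64: str) -> bool:
    """Check the HMAC header against a digest of the raw request body."""
    key = base64.b64decode(security_token_b64)
    digest = hmac.new(key, body, hashlib.sha256).digest()
    expected = "HMAC " + base64.b64encode(digest).decode("ascii")
    return hmac.compare_digest(expected.encode("utf-8"), auth_header.encode("utf-8"))


def render_status_page(service: str, status: str) -> str:
    """Build a tiny static HTML status page, escaping user-supplied text."""
    return (
        "<html><body>"
        f"<h1>{html.escape(service)}</h1><p>{html.escape(status)}</p>"
        "</body></html>"
    )


def handle_message(body: bytes, auth_header: str, security_token_b64: str):
    """Return (chat reply, new status page HTML or None)."""
    if not is_valid_signature(body, auth_header, security_token_b64):
        return {"type": "message", "text": "Signature check failed."}, None
    words = json.loads(body).get("text", "").split()
    # Hypothetical command syntax: "... status <service> <state words>"
    if "status" in words:
        args = words[words.index("status") + 1:]
        if len(args) >= 2:
            service, status = args[0], " ".join(args[1:])
            page = render_status_page(service, status)
            return {"type": "message", "text": f"Status page updated: {service} is {status}."}, page
    return {"type": "message", "text": "Usage: status <service> <state>"}, None
```

Rejecting unsigned requests before parsing the message keeps the status page from being changed by anyone who merely discovers the function's URL.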

Troubleshooting guides and status page

Remediation of shopping cart

Key takeaway: Make information and resources accessible

Key takeaways
  • Respond with urgency, rather than react
  • Prioritize for clear communication
  • Make information and resources accessible

/Docs alert: aka.ms/OPS20LogicApps

/MS Learn alert: Complete interactive learning exercises, watch videos, and practice and apply your new skills. aka.ms/OPS20MSLearnCollection

Resources
  • aka.ms/OPS20repo
  • aka.ms/mymsignitethetour
