SOA Infrastructure Healthcheck

Smart SOA Requires IT Fitness Along the Way
SOA Health is Important for All Smart SOA Approaches: Foundational, Extend End-to-End, Transform, Adapt Dynamically

GTS offerings represent a set of Powerful SOA services

IBM SOA Offerings
• SOA Strategy
• SOA Diagnostic
• SOA Implementation Planning
• Business Process Management (BPM) Enabled by SOA
• Design, Development, and Integration Services
• SOA Management Services

GBS SOA Capabilities
• SOA Business Architecture / Component Bus. Modeling
• SOA Vision / Roadmap
• SOA Business Value
• SOA Governance
• SOA Change Mgt Strategy
• SOA Application & Services Healthcheck
• SOA Maturity Assessment
• SOA Business Process Review
• SOA Technical Review
• SOA Solution Implementation Planning
• Business Process Modeling & Simulation
• Business Process Governance
• Business Process Optimization
• Business Process Dashboard and Scorecard Design & Implementation
• Custom Business Services Design, Development & Integration
• Business Process Automation
• Composite Business Services Design, Development & Integration
• SOA Solution Management
• Event & Service Monitoring
• Security Management
• Service Support
• SOA Governance Life Cycle Management

GTS SOA Capabilities
• Infrastructure strategy & planning for SOA
• Service Management strategy and planning
• Infrastructure Healthcheck Services supporting SOA
• Infrastructure architecture & design for SOA
• Design and Implementation Services for WebSphere Process Server
• Connectivity & Reuse
• Web infrastructure optimization & virtualization
• Web application server
• Portal
• Infrastructure Testing Center of Excellence
• Infrastructure Management and Governance
• Service Management design
• Service Management implementation
• Business of IT Dashboard
• Access Management, Identity Management

Ready to Take the Next Step?
How is your SOA Health? Is your IT fit enough to handle your SOA needs?
Check Your SOA Health…
• Is everything working as well on the inside as it appears to be on the outside?
• Is your SOA ready to scale for enterprise-wide demands?
• Are you experiencing chronic, nagging issues?
… and Avoid SOA Rescue Missions

Assessing SOA Health Pays Off: Major Asian Government Organization

Challenge
• Determine if the existing SOA can handle an expansion to include hundreds of federation partners
  - Simplify the tax payment process for easier citizen use, faster processing, and fewer "silo" hand-offs
  - Avoid massive investments in point-to-point integration projects

Solution
• Engaged IBM Global Services to assess the "health" of the existing environment and lay out a plan for success, which led to:
  - Designing and extending the SOA solution using open-standards-based IBM technology
  - Deploying and testing the new SOA environment, which now automatically handles millions of transactions per day

Benefits
• Eased management of national tax collection and eliminated process delays
• Deployed a system that the customer can easily extend with flexible SOA technology
• Saved an estimated US$1 billion in IT costs with the SOA approach

Assessing SOA Health Pays Off: IBM CIO Office, "The Offering Information Management System"

Challenge
• Needed to determine whether a "nearly ready to deploy" SOA solution could meet expected performance and stability requirements before the application was exposed to vital users
  - Base analysis on a deployment to the Pre-Production site
  - Ensure user expectations are met on "Day One"
  - Ensure stated hardware requirements are reasonable

Solution
• Engaged IBM GTS to assess the "health" of the proposed SOA solution and infrastructure and ensure capacity would be adequate
  - Use PARA-medic to analyze the end-to-end transaction flow, execution time, server loads and user response
  - Correct code issues discovered during the Healthcheck, optimize cluster configuration, improve server configurations

Benefits
• Spotted difficult-to-find threading errors and server affinities limiting hardware effectiveness
• Suggested changes which eliminated a hardware upgrade
• Reduced response time and raised maximum throughput capacity, allowing a growth path

Assessing SOA Health Pays Off: A Major International Banking Firm

Challenge
• Assess the health of a vital new SOA solution and help ensure banking operations will not be impacted by the switch
  - Will the solution be stable and robust?
  - Does the existing environment have the capacity to deliver expected performance with the new middleware and service layers?
  - Base the analysis on pre-deployment pilot data

Solution
• Engaged IBM to perform a comprehensive SOA Infrastructure Healthcheck Assessment
  - Use PARA-medic to analyze end-to-end transaction flows, execution times, server loads and user response impacts, error analysis, cluster configurations and network performance
  - Use Sonoma to forecast future performance, user experience and hardware requirements
  - Fully leverage IBM methods and tools

Benefits
• Identified server/cluster configuration errors
• Exposed code issues and errors impacting end-to-end response
• Offered insight into the end-to-end transaction flows and identified bottlenecks
• More to come…

Infrastructure SOA Health
• Infrastructure Flexibility: Is your SOA meeting the demand spikes and configured to easily handle changes?
• Middleware: Is your SOA platform robust enough to handle the transaction volume with integrity?
• Service Management: Do you have the visibility and control to manage your services?

Infrastructure Flexibility: Symptoms of SOA Health Issues
SOA helps enable innovation and rapid change, but are you experiencing:
• Performance issues with demand peaks?
  - Transactions are delayed or lost
  - Infrastructure operations become overwhelmed
• Problems in tuning the system?
  - When we deployed the SOA everything seemed stable, but now we have to make frequent changes to keep up
• Issues that seem to come up whenever you have to make changes to the system?
  - Tracking the changes and their impact can be tedious and labor intensive, taxing our IT operations staff

Middleware: Symptoms of SOA Health Issues
SOA helps integrate processes across services and platforms, but what if:
• You need your SOA to grow?
  - Are you confident your current architecture and solution can scale to 5x or even 10x what it is today?
• You suffer from inconsistent communications?
  - Response time is supposed to be 1 to 2 seconds, but every now and then it is anywhere from 6 to 30 seconds
• You're asked to guarantee connectivity?
  - Each of the components is tested and works well, but you still suffer poor performance from end to end

Hidden Symptoms of SOA Health Issues
SOA helps enable end-to-end transaction flow, however, do your SOA solutions:
• Perform smoothly and efficiently?
  - Or do some parts of the transaction struggle to provide designed functionality and have intermittent or spotty performance?
• Handle periodic surges in user activity seamlessly?
  - Or do you constantly operate in a reactive mode just to keep things running smoothly?
• Provide a resilient infrastructure that maintains availability?
  - Or do you experience outages at the weirdest times, and you're not sure why?
  - Or do you have trouble meeting SLAs?

New from IBM Global Technology Services: IBM Infrastructure Healthcheck Services supporting SOA

Infrastructure Healthcheck Workshop for SOA

Key Specialized Diagnostics:
• Infrastructure Readiness Assessment for SOA
• Healthcheck Services for Middleware, Application and Service Layers
• Healthcheck Services for Web Portals
• Service Management Workshop for SOA
• Predictive Performance and Capacity Capability

IBM SOA Infrastructure Consulting Services: Infrastructure Healthcheck Workshop for SOA
Identifying opportunities to improve SOA infrastructure performance

Offers a workshop-based approach designed to help identify areas where known remedies can be applied or where further in-depth analysis is needed
• Leverages award-winning IBM Research tools to diagnose the health of your SOA infrastructure
• Conducts collaborative sessions to assess your infrastructure environment, performance & utilization
• Helps you understand how enhancement of your infrastructure environment and capabilities can improve service quality in your unique SOA environment

Benefits
• Recommends prioritized actions for a healthy SOA infrastructure environment
• Improves IT position as an enabler to drive the business benefits of SOA adoption
• Shows opportunities to optimize the cost of SOA service delivery while maintaining service levels
• Supports faster project execution

IBM SOA Infrastructure Consulting Services: Infrastructure Architecture Workshop for SOA
Identifying infrastructure requirements to ensure SOA performance

Offers a workshop-based approach designed to help identify areas where known remedies can be applied or where further in-depth analysis is needed
• Leverages award-winning IBM Research tools to predict the needs and performance of your SOA infrastructure
• Conducts collaborative sessions to explore options for establishing an infrastructure to meet your performance and utilization targets
• Helps you understand the impact of enhancements to your infrastructure environment on delivery of service levels within your SOA environment

Benefits
• Recommends a "blueprint" for a healthy SOA infrastructure and hosting environment
• Improves IT position as an enabler to drive the business benefits of SOA adoption
• Allows exploration of opportunities to optimize the cost of SOA service delivery while maintaining service levels
• Supports faster project execution

IBM SOA Infrastructure Consulting Services: Infrastructure Readiness for SOA
Evaluate your IT foundation for an SOA implementation

Assesses the capabilities of your organization's IT infrastructure, processes and technology to determine the right plan for supporting a service-oriented architecture (SOA)
• Performs interviews and workshops and reviews data and documents to evaluate your organization's readiness for change
• Develops a case for infrastructure change to support your SOA adoption
• Assists you in transforming your IT to be more responsive, flexible and service driven
• Provides an IT solution strategy report and transition recommendations

Benefits
• Starts you on a well-defined road toward infrastructure transition in order to support an SOA implementation
• Builds the case for infrastructure change
• Supports faster project execution
• Shows opportunities to optimize the cost of SOA service delivery while maintaining service levels

IBM SOA Infrastructure Consulting Services: Key SOA Healthcheck Tools

• PARA-medic Analysis Tool Set
  - Is non-invasive and does not require installation on the target site
    • Uses a variety of application/system logs
    • May require increases in logging levels
  - Allows an end-to-end view of transaction flows with timing and load details
  - Can spot difficult-to-see issues such as
    • Server configuration errors
    • Clustering failures and affinity issues
    • Excessive I/O or CPU activity
  - Can relate errors to activities within a transaction flow
  - Can be used for periodic checkups as changes occur

• Sonoma Performance/Capacity Tool
  - Is non-invasive and does not require installation on the target site
  - Allows modeling of application performance within a range of infrastructures and application attributes
  - Models IBM and non-IBM hardware
  - Allows use of IBM's library of standard workflow scenarios to get early indications of performance
  - Allows analysis based on test case/test system results
  - Allows use of production data to analyze changes or expansion plans

Conducting a Healthcheck exposed a potential bottleneck: Major Credit Management Firm

Challenge
• Unsure of the impact SOA would have on infrastructure and service management
• Had already encountered issues with early deployment of Web Services

Solution
• Workshop-based Healthcheck of current capabilities and plans to support SOA
• Identified gaps between current and required state
• Built a series of customized project roadmaps to transition their infrastructure

Benefits
• Resolved decision making over required infrastructure components
• Identified changes required to organization, governance, and processes
• Provided a first-time forum for SOA discussion across the IT organization

How do I pick the right path for my SOA Healthcheck?
[Decision flowchart, beginning at START. Decision points: Has an existing SOA project deployed? (Yes / No / Unsure); Existing SOA project healthy? (Yes / No); Planning increase in scope? (Yes / No); Know where problem is? (Yes / No). Outcomes: Infrastructure Healthcheck Workshop for SOA*; Infrastructure Architecture Workshop for SOA*; Specialized Diagnostics*.]
* New and/or enhanced service as part of the SOA Fall Launch

Recent Real World Results: Healthcheck details from a complex engagement

Project followed a Compressed Timetable
Timeline: Week 1, Week 2, Week 3, Week 4, Week 5

Activities:
• Project Initiation and Kick-Off
• Data Collection for Sonoma and PARA-medic tools
• 2nd Sonoma Workshop
• Analyze SOA Pilot Data using PARA-medic and Sonoma
• PARA-medic Analysis
• Infrastructure and Service / Application Discovery
• Conduct 1st Sonoma Workshop
• Define Data Collection Criteria and Timeframe
• Review Results of Sonoma Workshop
• Explore / Define open issues and project status
• Inspect Initial PARA-medic data collection effort
• Refine collection of data for PARA-medic tool
• Expand understanding via Continued Discovery
• Collect Pilot Data From Prod/Pilot
• Project Management
• 1st Draft of Recommendation and Findings
• Present Interim Status Report
• Collect Final Pilot Data
• Refine Sonoma Data with PARA-medic Results
• Complete Final Report
• Deliver Final Report with Feedback

Sampling of PARA-medic Findings
• Both IFE and Middleware appear to scale well and should have no issues handling rapidly growing usage as deployment continues to the planned objective, provided adequate server capacity is in place.
• Both the IFE JVM and the ESB JVM exhibit long response times for a small percentage (5%) of transactions. Some responses take as long as 3 minutes, indicating a timeout situation.
• Both IFE LPARs and Middleware LPARs have unmonitored background activities that consume a significant amount of resources (10% to 20% of CPU capacity). Because the source of this background activity is still unknown after investigation by the operations team, and it occurs frequently, this raises some concern.
• The HTTP++ Edge server unnecessarily adds further load balancing between the IFE and Middleware. The configuration brings no benefit, while it increases edge server overhead and slows response time.
• The IFE/Controller_Global_Id_Servlet is the most frequently executed component and may offer potential for optimization.
• The 'shared_service' component has a long service time (up to 2.5 minutes) and is very CPU intensive, impacting ESB processing time and throughput. We project this will be of significant concern as users are added.
• HTTP++ on both IFE and Middleware LPARs experiences "HTTP Internal Error" (HTTP status code 500). This is especially frequent on Middleware LPARs (occurring in 4.5% of HTTP requests).
• Middleware LPAR node 1 and node 2 have different committed numbers of CPUs, yet load balancing is configured for round-robin.
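The last finding above (round-robin across nodes with different CPU counts) can be illustrated with a small sketch. The node names and CPU counts here are hypothetical, not taken from the engagement, and the weighting scheme is only one simple way to show the effect; it is not a description of how the edge server or the plugin is actually configured.

```python
from itertools import cycle


def weighted_rotation(cpus_per_node):
    """Build a dispatch rotation in which each node appears once per CPU."""
    rotation = []
    for node, cpus in cpus_per_node.items():
        rotation.extend([node] * cpus)
    return rotation


def requests_per_cpu(rotation, cpus_per_node, total_requests):
    """Dispatch requests around the rotation and report the load per CPU on each node."""
    counts = {node: 0 for node in cpus_per_node}
    for _, node in zip(range(total_requests), cycle(rotation)):
        counts[node] += 1
    return {node: counts[node] / cpus_per_node[node] for node in counts}


# Hypothetical CPU counts, for illustration only.
cpus = {"node1": 4, "node2": 2}

plain = requests_per_cpu(list(cpus), cpus, 6000)                  # plain round-robin
weighted = requests_per_cpu(weighted_rotation(cpus), cpus, 6000)  # CPU-weighted rotation

print("plain round-robin, requests per CPU:", plain)
print("weighted rotation, requests per CPU:", weighted)
```

With plain round-robin the smaller node carries twice the load per CPU in this example, while the weighted rotation evens it out.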

Specific Capacity Findings for IFE: Assumptions
• Two components of the Interstage Frontend Application were analyzed
  - The IFE Web Frontend
    • Operating as a Web Service
    • Running WebSphere Application Server and IHS
  - The IFE Enterprise Service Bus
    • Performing Mediation and Connection Services
    • Layer running WebSphere ESB under WebSphere Process Server
• Analysis made use of IBM's Sonoma Capacity Analysis Tool, IBM Healthcheck Methodologies and PARA-medic inputs for CPU service time
  - Custom simulation models were created for the IFE application based on pilot results while running on future production hardware
  - Two base models were created
    • A model for IFE Frontend
    • A model for Middleware (WESB Mediations)
• Several iterations of the model were built and refined from validation tests and PARA-medic inputs, resulting in a "Gold" model configuration

Overview of Findings, IFE Frontend Service: Recommend Server Config #1 of 5
• Sonoma Analysis for IFE Frontend:
  - Server Configuration: P5-590 2100 (2.1 GHz CPUs)
  - 2 Service Nodes, with 6 CPUs Per Node

  Metric                         Base      Base Plus 10% Contingency
  Arrival Rate (user/sec)        43.6      47.9
  Response Time (sec)            0.390     0.365
  Concurrent Users               17.0      18.0
  Requests Per Sec               43.6      47.9
  Effective Teller/User Number   614       675
  Processor Utilization          70.96%    70.87%

• Supports full IFE rollout to 180 service locations
• This configuration will support projected heavy usage (Friday lunch time) during normal activities
• Will not support extraordinary usage periods such as major holidays without performance impacts. See Recommendation #2.
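As a rough cross-check of the figures above, Little's Law (requests in flight = arrival rate × response time) can be applied to the reported arrival rates and response times. This is only a sanity check under that assumption; it is not a claim about how Sonoma derives its "Concurrent Users" figure.

```python
def concurrent_requests(arrival_rate_per_sec, response_time_sec):
    """Little's Law: average number in the system = arrival rate * time in system."""
    return arrival_rate_per_sec * response_time_sec


# (arrival rate in requests/sec, response time in sec), as reported on the slide.
scenarios = {
    "43.6 req/sec": (43.6, 0.390),
    "47.9 req/sec": (47.9, 0.365),
}

for label, (rate, resp) in scenarios.items():
    in_flight = concurrent_requests(rate, resp)
    print(f"{label}: about {in_flight:.1f} requests in flight")
```

The first pair works out to about 17.0, matching the slide; the second works out to about 17.5, close to the reported 18.0.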

Status for Log-Driven Healthcheck Using PARA-medic
Healthcheck scope for IFE in the production environment, with end points for logging request, execution path, performance, and resource utilization data.
[Diagram: IFE client requests pass through the IBM Edge Server to IFE-LPAR_1 and IFE-LPAR_2 (each with HTTP++, IFE-JVM, WS) and on to MW-LPAR_1 and MW-LPAR_2 (each with HTTP++, ESB-JVM, SS-JVM, MQ, DB2, MS-SQL), spread across Site 1 and Site 2. Numbered markers 1 to 7 show the logging end points.]
Note: MW-LPAR (Middleware LPAR), SS-JVM (Shared Service JVM)

Logging Configuration for IFE & Middleware Servers
1. [XXXX] IFE log with "global transaction ids"
   • End point: each BFE-JVM per BFE-LPAR
   • BFE-Client log with global transaction ids
   • BFE-WS log with global transaction ids
2. [XXXX] JAX RPC log (in custom format) with XXXX "global transaction ids"
   • End point: each MW-JVM per MW-LPAR
   • XXXX "JAX RPC" log with global transaction ids ("accounting log")
3. [Wily] Invocation log for WS, SCA, MQ, SQL, DB2 invocations
   • End point: each MW-JVM per MW-LPAR
   • Invocation logs for WS, SCA components, MQ, databases (DB2 & MS SQL)
4. [AIX] VMSTAT & IOSTAT
   • End point: each IFE & MW LPAR
   • VMSTAT for CPU & memory (command: "vmstat -t 2 1800")
   • IOSTAT for disk (command: "iostat -D -T 2")
5. [Edge Server] HTTP invocation logs
   • End point: each Edge Server LPAR
   • Traffic logs for IFE and MW applications
   • VMSTAT for CPU & memory (command: "vmstat -t 2 1800")
6. [HTTP Traffic] IBM HTTP Server & Plugin log files
   • End point: each IFE & MW LPAR
   • HTTP Server log name: access_log
   • HTTP Server Plugin log name: http_plugin.log
7. [Shared Services] Shared invocation logs
   • End point: each MW LPAR
   • Shared Services accounting statistics log
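The reason for carrying "global transaction ids" in several of these logs is that records from different tiers can then be stitched into one end-to-end view. A minimal sketch of that idea follows; the record fields (txn_id, tier, start_ms, end_ms) and the sample values are hypothetical, since the actual log formats are not shown in the deck, and this is not PARA-medic's implementation.

```python
from collections import defaultdict


def stitch_by_transaction(records):
    """Group per-tier log records by global transaction id.

    Returns {txn_id: {tier: elapsed_ms}}.
    """
    flows = defaultdict(dict)
    for rec in records:
        flows[rec["txn_id"]][rec["tier"]] = rec["end_ms"] - rec["start_ms"]
    return dict(flows)


def time_outside_middleware(flow):
    """Elapsed time seen at the frontend JVM that was not spent in the ESB JVM."""
    return flow["IFE-JVM"] - flow.get("ESB-JVM", 0)


# Hypothetical records, already parsed from the frontend and middleware logs.
records = [
    {"txn_id": "T1", "tier": "IFE-JVM", "start_ms": 0, "end_ms": 400},
    {"txn_id": "T1", "tier": "ESB-JVM", "start_ms": 50, "end_ms": 300},
    {"txn_id": "T2", "tier": "IFE-JVM", "start_ms": 10, "end_ms": 6010},
    {"txn_id": "T2", "tier": "ESB-JVM", "start_ms": 60, "end_ms": 260},
]

flows = stitch_by_transaction(records)
for txn_id, flow in flows.items():
    print(txn_id, flow, "outside middleware:", time_outside_middleware(flow), "ms")
```

In this made-up sample, transaction T2 is slow even though its middleware leg is fast, which is the kind of distinction a per-transaction join makes visible.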

IFE LPAR 1 and IFE LPAR 2 CPU Utilization: 8 AM-5 PM
[Chart of CPU utilization; one region is annotated "abnormal".]

IFE LPAR 1 and IFE LPAR 2 CPU Utilization: 12:30 PM-12:50 PM
• IFE LPAR 1 has significant background activities between 12:30 PM and 12:50 PM
• IFE 1 had a much higher CPU utilization than IFE 2, while IFE 1 actually processed fewer transactions
• PARA-medic estimates that IFE 1's background CPU utilization is higher than 10% during this period of time
• Question: what actually happened in the background?
• Recommendation: use 'topas' to continuously monitor and log all running processes and their CPU consumption, which helps identify heavy-weight "background" processes when unusual things like this happen
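One simple way to reason about "background" CPU is to subtract the CPU that the observed transaction rate should account for from the measured utilization. The sketch below does that under stated assumptions; the function and all input numbers are hypothetical, and this is not presented as the method PARA-medic uses.

```python
def background_cpu_pct(measured_cpu_pct, txn_per_sec, cpu_sec_per_txn, cpu_count):
    """Estimate CPU utilization not explained by transaction processing.

    Assumes each transaction costs a fixed cpu_sec_per_txn and that the load
    is spread evenly across cpu_count CPUs.
    """
    txn_cpu_pct = 100.0 * txn_per_sec * cpu_sec_per_txn / cpu_count
    return max(0.0, measured_cpu_pct - txn_cpu_pct)


# Hypothetical inputs for two LPARs over the same interval.
lpar1 = background_cpu_pct(measured_cpu_pct=35.0, txn_per_sec=18.0,
                           cpu_sec_per_txn=0.05, cpu_count=6)
lpar2 = background_cpu_pct(measured_cpu_pct=20.0, txn_per_sec=21.0,
                           cpu_sec_per_txn=0.05, cpu_count=6)

print(f"LPAR 1 unexplained CPU: {lpar1:.1f}%")
print(f"LPAR 2 unexplained CPU: {lpar2:.1f}%")
```

A node that shows higher utilization while processing fewer transactions, as in the made-up inputs above, ends up with a large unexplained share, which is the pattern the slide describes.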

Response Time of Individual Transactions for IFE JVM on LPAR 1 between 9 AM-5 PM
[Chart of per-transaction response time; outliers are annotated "abnormal".]
• Extremely long response time for some transactions (>3 minutes)
  - The slowest transactions encountered exceptions in the code
  - Recommendation: change the code to avoid these exceptions; at high levels of utilization they will become a bottleneck with significant impact on performance

Cumulative Distribution of Transaction Response Time for IFE JVM on XXX between 9 AM-5 PM
• 91% of transactions finished within one second
• 1.6% of transactions took longer than 5 seconds to finish
• 0.3% of transactions took longer than 60 seconds to finish, which is not caused by a slow Middleware LPAR
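Figures like these come from reading points off the cumulative distribution of per-transaction response times. A minimal sketch of that calculation is shown here; the sample response times are invented for illustration and do not reproduce the engagement data.

```python
def fraction_within(response_times_sec, threshold_sec):
    """Share of transactions that finished within threshold_sec."""
    hits = sum(1 for t in response_times_sec if t <= threshold_sec)
    return hits / len(response_times_sec)


def fraction_longer_than(response_times_sec, threshold_sec):
    """Share of transactions that took longer than threshold_sec."""
    hits = sum(1 for t in response_times_sec if t > threshold_sec)
    return hits / len(response_times_sec)


# Invented sample of response times in seconds.
times = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 6.0, 75.0]

print(f"within 1 s:  {fraction_within(times, 1.0):.0%}")
print(f"over 5 s:    {fraction_longer_than(times, 5.0):.0%}")
print(f"over 60 s:   {fraction_longer_than(times, 60.0):.0%}")
```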

Questions?

Trust IBM to Help You Understand Your SOA Health
Lessons Learned Help IBM Understand Clients' SOA Health

Cross-IBM global deep dive analysis of 200 SOA deployment experiences:
• 750 Lessons Learned
• 650 Best Practices

Regardless of Where You Are in the SOA Continuum: Foundational, Extend End-to-End, Transform, Adapt Dynamically

Leverage IBM experience based on 5700 customers across Smart SOA*
* Number of customers using our SOA offerings

Thank you! More information: www.ibm.com/soa