http://aka.ms/E2013Calc
Web Service(Default Web Site)\Current Connections
MSExchange Active Manager(_total)\Database Mounted
http://aka.ms/ExOnlineLimits
Large Organization Configuration
• 36 cores / 450 GB RAM per server
• Higher mailbox density
• Exchange 2013 deployed in an all-in-one configuration
• Hardware NLB configured for ‘Least Connections’
What Happened?
• A policy change required removal of local storage of email
• Outlook now had to run in “Online Mode”
Impact
• Increased network traffic
• Users frequently disconnected during peak periods
• ~2 weeks to isolate the problem
• ~2 weeks to get remediation changes in place
Figure: Network load balancer spreading 40k users across the Exchange 2013 all-in-one servers behind a single virtual IP (Exchange.cohovineyard.com).
Figure: The same topology as more users connect; one server is flagged with "!".
Figure: Hardware NLB spreading 40k users across the Exchange 2013 all-in-one servers behind the virtual IP.
Figure: RPC over HTTP request path. /rpc on port 443 → http.sys → IIS RpcHttpProxy in MSExchangeRpcProxyFrontEndAppPool (w3wp), which looks up the active mailbox location → port 444 → http.sys → IIS RpcHttp in MSExchangeRpcProxyAppPool (w3wp) → port 6001 → RPC Client Access (Microsoft.Exchange.RpcClientAccess) → Store Worker (Microsoft.Exchange.Store.Worker) → MBX DB.
Figure: Front-end request handling in MSExchangeRpcProxyFrontEndAppPool (w3wp). Connection Manager (max 65,535 requests) → Request Router (/rpc:443) → w3wp queue → threads → System.Web buffers; requests continue to IIS on /rpc:444, which has its own w3wp queue. Managed Availability also uses /rpc:444.
inetpub\logs\LogFiles\W3SVC1\u_exXXXXXX.log
date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2014-07-21 07:59:44 192.168.1.1 RPC_IN_DATA /rpc/rpcproxy.dll 8416409b-081e-4fe892007e54d8874d7c@cohovineyard.com:6001&RequestId=fc60c1759c77-47d0-b435-ae3d04acea1b 443 COHOVINEYARD\SM_4f3083c2bd6a40d8b 192.168.1.5 MSRPC - 200 0 64 29513
inetpub\logs\LogFiles\W3SVC1\httperrXXXXX.log
date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 444 HTTP/1.1 RPC_IN_DATA /rpcproxy.dll?COHOEXCH.cohovineyard.com:6001 400 2 Connection_Dropped MSExchangeRpcProxyAppPool
2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 443 HTTP/1.1 RPC_IN_DATA /rpcproxy.dll?8416409b081e-4fe8-92007e54d8874d7c@COHOEXCH.cohovineyard.com:6001 - 1 Connection_Dropped_List_Full MSExchangeRpcProxyAppPool
IIS indicates that it cannot hand off the connection because the queue is full.
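To see how often IIS is dropping RPC over HTTP connections, the httperr entries can be tallied by reason and queue. The sketch below is an illustration, not a Microsoft tool: it assumes space-delimited lines, honors a `#Fields:` directive when the file has one, and otherwise falls back to the column order in the excerpt above. The function name `count_dropped_connections` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Column order copied from the httperr header in the excerpt above.
# A "#Fields:" directive in the file, if present, overrides it.
DEFAULT_FIELDS = [
    "date", "time", "c-ip", "c-port", "s-ip", "s-port", "cs-version",
    "cs-method", "cs-uri", "sc-status", "s-siteid", "s-reason", "s-queuename",
]


def count_dropped_connections(lines: Iterable[str]) -> Counter:
    """Tally entries whose s-reason starts with "Connection_Dropped",
    keyed by (s-reason, s-queuename)."""
    fields: List[str] = list(DEFAULT_FIELDS)
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        tokens = line.split()
        if len(tokens) != len(fields):
            continue  # skip lines that do not match the declared columns
        entry = dict(zip(fields, tokens))
        reason = entry.get("s-reason", "")
        if reason.startswith("Connection_Dropped"):
            counts[(reason, entry.get("s-queuename", "-"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 444 HTTP/1.1 RPC_IN_DATA "
        "/rpcproxy.dll?COHOEXCH.cohovineyard.com:6001 400 2 Connection_Dropped MSExchangeRpcProxyAppPool",
        "2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 443 HTTP/1.1 RPC_IN_DATA "
        "/rpcproxy.dll?COHOEXCH.cohovineyard.com:6001 - 1 Connection_Dropped_List_Full MSExchangeRpcProxyAppPool",
    ]
    for (reason, queue), n in count_dropped_connections(sample).items():
        print(reason, queue, n)
```

Because the function only iterates over lines, an open file handle (for example `open(path, encoding="utf-8", errors="replace")`) can be passed straight in.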
Log locations, file names, and Perfmon counters along the RPC over HTTP path (location; file names; counters):
• IIS (front end): inetpub\logs\LogFiles\W3SVC1; u_exXXXXXX.log, httperrXXXXX.log; Web Service(Default Web Site)\Current Connections
• RpcHttpProxy: Logging\RpcHttp\W3SVC1 and Logging\HttpProxy\RpcHttp; RpcHttpXXXXX.log, HttpProxyXXXXXX-X.log; RPC/HTTP Proxy\Current Number of Incoming RPC over HTTP Connections, MSExchange HttpProxy\Accepted Connection Count
• IIS (back end): inetpub\logs\LogFiles\W3SVC2; u_exXXXXXX.log, httperrXXXXX.log; Web Service(Exchange Back End)\Current Connections
• RpcHttp: Logging\RpcHttp\W3SVC2; RpcHttpXXXXX.log; RPC/HTTP Proxy\Current Number of Incoming RPC over HTTP Connections
• RPC Client Access: Logging\RPC Client Access; RCA_XXXXXX.log; MSExchange RpcClientAccess\Current Connections
Network, CPU, Memory, Storage
Network (Requests)
• Web Service(Default Web Site)\Current Connections
• MSExchangeIS Store(*)\RPC Average Latency: < 100 ms
• MSExchangeIS Client Type(*)\RPC Average Latency: < 100 ms
• MSExchangeIS Store(*)\RPC Operations/sec
• MSExchangeIS Client Type(*)\RPC Operations/sec
CAS Experience
• MoMT: MSExchange RpcClientAccess\RPC Averaged Latency, MSExchange RpcClientAccess\RPC Operations/sec
• EAS: MSExchange ActiveSync\Requests/sec, MSExchange ActiveSync\Current Requests
• EWS: MSExchangeWS\Average Response Time, MSExchangeWS\Requests/sec
• OWA: MSExchange OWA\Average Response Time, MSExchange OWA\Average Search Time, MSExchange OWA\Requests/sec
• POP: MSExchangePop3(*)\Average LDAP Latency, MSExchangePop3(*)\Average RPC Latency, MSExchangePop3(*)\Request Rate
• IMAP: MSExchangeImap4(*)\Average LDAP Latency, MSExchangeImap4(*)\Average RPC Latency, MSExchangeImap4(*)\Request Rate
Management / Background Ops
• PS: MSExchange Remote Powershell\Current Connection Sessions, MSExchange Remote Powershell\Current Connected Unique Users
Finding: overall RPC average latency is not impacted.
Memory (Exchange Process Usage)
• Memory\% Committed Bytes In Use: < 80%
• Memory\Available MBytes: > 5% of RAM
• .NET CLR Memory(*)\% Time in GC: should be below 10% on average
• .NET CLR Exceptions(*)\# of Exceps Thrown / sec: should be less than 5% of total requests per second (RPS), i.e. Web Service(_Total)\Connection Attempts/sec × 0.05
• .NET CLR Memory(*)\# Bytes in all Heaps
• .NET CLR Memory\Allocated Bytes/sec
Findings: Workstation GC to Server GC; Allocated Bytes/sec sustained > 50 MB; only 30% of bytes committed.
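The memory guidance above is a set of simple comparisons, the only arithmetic being the 5%-of-RAM floor and the 5%-of-RPS exception budget. A minimal sketch, assuming counter values have already been averaged over the capture window; `MemorySample` and `memory_warnings` are hypothetical names, not part of any Exchange tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemorySample:
    """Averaged values of the counters listed above (hypothetical container)."""
    committed_bytes_in_use_pct: float   # Memory\% Committed Bytes In Use
    available_mbytes: float             # Memory\Available MBytes
    installed_ram_mbytes: float         # physical RAM, needed for the 5% rule
    time_in_gc_pct: float               # .NET CLR Memory(*)\% Time in GC
    exceptions_per_sec: float           # .NET CLR Exceptions(*)\# of Exceps Thrown / sec
    connection_attempts_per_sec: float  # Web Service(_Total)\Connection Attempts/sec


def memory_warnings(s: MemorySample) -> List[str]:
    """Return one message per guideline that the sample misses."""
    warnings: List[str] = []
    if s.committed_bytes_in_use_pct >= 80:
        warnings.append("% Committed Bytes In Use is not below 80%")
    if s.available_mbytes <= 0.05 * s.installed_ram_mbytes:
        warnings.append("Available MBytes is not above 5% of RAM")
    if s.time_in_gc_pct >= 10:
        warnings.append("% Time in GC is not below 10%")
    # Exception budget = 5% of requests per second, approximated by
    # Connection Attempts/sec as in the guidance above.
    if s.exceptions_per_sec >= 0.05 * s.connection_attempts_per_sec:
        warnings.append("# of Exceps Thrown / sec is not below 5% of RPS")
    return warnings
```

For a server with 92 GB of RAM (94,208 MB) the Available MBytes floor works out to about 4,710 MB, and at 1,000 connection attempts/sec the exception budget is 50/sec.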
Storage (Exchange I/O)
• MSExchange Active Manager(_total)\Database Mounted: balanced across all MBX servers
• MSExchange Database ==> Instances(*)\I/O Database Reads (Attached) Average Latency: < 20 ms
• MSExchange Database ==> Instances(*)\I/O Database Writes (Attached) Average Latency: < 50 ms
• MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency: < 10 ms
• MSExchange Database ==> Instances(*)\I/O Database Reads (Recovery) Average Latency: < 200 ms
• MSExchange Database ==> Instances(*)\I/O Database Writes (Recovery) Average Latency: < read latency for the same instance
Finding: I/O is acceptable.
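The storage thresholds can be checked per database instance in the same way. This sketch assumes the averaged latencies are already in hand and interprets "read latency for the same instance" as the Reads (Recovery) value on the line above it; the short dictionary keys and both function names are hypothetical.

```python
from typing import Dict, List

# Upper bounds (ms) for MSExchange Database ==> Instances(*) latency counters,
# taken from the list above. The short keys are hypothetical labels.
LIMITS_MS = {
    "db_reads_attached": 20.0,    # I/O Database Reads (Attached) Average Latency
    "db_writes_attached": 50.0,   # I/O Database Writes (Attached) Average Latency
    "log_writes": 10.0,           # I/O Log Writes Average Latency
    "db_reads_recovery": 200.0,   # I/O Database Reads (Recovery) Average Latency
}


def storage_warnings(instance: str, latency_ms: Dict[str, float]) -> List[str]:
    """Flag counters at or above their limit for one database instance."""
    warnings: List[str] = []
    for key, limit in LIMITS_MS.items():
        value = latency_ms.get(key)
        if value is not None and value >= limit:
            warnings.append(f"{instance}: {key}={value} ms, expected < {limit} ms")
    # Recovery writes are compared with recovery reads on the same instance.
    reads = latency_ms.get("db_reads_recovery")
    writes = latency_ms.get("db_writes_recovery")
    if reads is not None and writes is not None and writes >= reads:
        warnings.append(f"{instance}: db_writes_recovery={writes} ms is not below reads ({reads} ms)")
    return warnings


def mounted_spread(mounted_per_server: Dict[str, int]) -> int:
    """Gap between the most and least loaded servers' mounted-database counts
    (0 means the Database Mounted counter is perfectly balanced)."""
    counts = list(mounted_per_server.values())
    return max(counts) - min(counts)
```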
CPU (Exchange Processes)
• Processor(_Total)\% Processor Time: should be less than 75% on average
• Processor(_Total)\% Privileged Time (kernel): should be less than 75% on average
• Processor(_Total)\% User Time: should be less than 75% on average
• Process(*)\% Processor Time: <specific process>
• System\Processor Queue Length (all instances): shouldn't be greater than 5 per processor
Finding: w3wp#3 is MSExchangeRpcProxyFrontEndAppPool, and w3wp#3 shows high CPU.
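Two of the CPU counters need scaling before they are compared: Process(*)\% Processor Time is summed across logical processors (so it can exceed 100), and System\Processor Queue Length is one system-wide queue, so "5 per processor" means 5 × N for the server. A minimal sketch under those points; the function names are hypothetical.

```python
from typing import List


def process_cpu_share(process_pct: float, logical_processors: int) -> float:
    """Process(*)\\% Processor Time is summed over logical processors, so it can
    read up to 100 x N. Dividing by N gives the share of the whole machine."""
    return process_pct / logical_processors


def cpu_warnings(total_pct: float, privileged_pct: float, user_pct: float,
                 queue_length: float, logical_processors: int) -> List[str]:
    """Compare averaged Processor(_Total) and System counters with the guidance above."""
    warnings: List[str] = []
    for name, value in (("% Processor Time", total_pct),
                        ("% Privileged Time", privileged_pct),
                        ("% User Time", user_pct)):
        if value >= 75:
            warnings.append(f"Processor(_Total)\\{name} averages {value}%, expected < 75%")
    # System\Processor Queue Length is a single queue for the whole server,
    # so the per-processor limit is multiplied by the processor count.
    if queue_length > 5 * logical_processors:
        warnings.append(f"Processor Queue Length {queue_length} exceeds {5 * logical_processors}")
    return warnings
```

For example, a w3wp instance reading 800 on a 16-core server is using about half of the machine.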
Most Recent Usage
• Provides a periodic snapshot of executing code
• Used by developers to track “hot” code paths
• Requires source code to interpret
Download: http://aka.ms/perfview
Start: http://channel9.msdn.com/Series/PerfView-Tutorial
Sampled stack: the thread is blocked on monitor (lock) contention inside System.Web.BufferAllocator.GetBuffer().
ntdll!ZwWaitForMultipleObjects
KERNELBASE!WaitForMultipleObjectsEx
clr!WaitForMultipleObjectsEx_SO_TOLERANT
clr!Thread::DoAppropriateAptStateWait
clr!Thread::DoAppropriateWaitWorker
clr!Thread::DoAppropriateWait
clr!CLREventBase::WaitEx
clr!AwareLock::EnterEpilogHelper
clr!AwareLock::EnterEpilog
clr!AwareLock::Contention
clr!JITutil_MonContention
System_Web_ni!System.Web.BufferAllocator.GetBuffer()
System_Web_ni!System.Web.Hosting.RecyclableArrayHelper.GetIntPtrArray(Int32)
System_Web_ni!System.Web.Hosting.IIS7WorkerRequest.FlushCachedResponse(Boolean)
System_Web_ni!System.Web.HttpResponse.UpdateNativeResponse(Boolean)
System_Web_ni!System.Web.HttpResponse.Flush(Boolean, Boolean)
System_Web_ni!System.Web.HttpWriter.WriteFromStream(Byte[], Int32)
mscorlib_ni!System.IO.Stream.<BeginWriteInternal>b__11(System.Object)
mscorlib_ni!System.Threading.Tasks.Task`1[[System.Boolean, mscorlib]].InnerInvoke()
mscorlib_ni!System.Threading.Tasks.Task.Execute()
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry(Boolean)
mscorlib_ni!System.Threading.ThreadPoolWorkQueue.Dispatch()
clr!CallDescrWorkerInternal
clr!CallDescrWorkerWithHandler
clr!MethodDescCallSite::CallTargetWorker
clr!MethodDescCallSite::Call_RetBool
clr!QueueUserWorkItemManagedCallback
clr!ManagedThreadBase_DispatchInner
clr!ManagedThreadBase_DispatchMiddle
clr!ManagedThreadBase_DispatchOuter
clr!ManagedThreadBase_DispatchInCorrectAD
clr!Thread::DoADCallBack
clr!ManagedThreadBase_DispatchInner
clr!ManagedThreadBase_DispatchMiddle
clr!ManagedThreadBase_DispatchOuter
clr!ManagedThreadBase_FullTransitionWithAD
clr!ManagedThreadBase::ThreadPool
clr!ManagedPerAppDomainTPCount::DispatchWorkItem
clr!ThreadpoolMgr::ExecuteWorkRequest
clr!ThreadpoolMgr::WorkerThreadStart
clr!Thread::intermediateThreadProc
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
Investigation (~4 weeks)
• Large number of connections to the server in a short timeframe
• Observed cycle: network load balancer adds server to rotation; MSExchangeRpcProxyFrontEndAppPool requests backlog; Managed Availability probe fails; network load balancer takes server out of rotation; Managed Availability restarts service
• Preferred architecture not followed
• Customer scaled beyond the tested configuration
• NLB algorithm not optimized for the Exchange load profile
Resolution
• Least Connections / Slow Start on the hardware LB
• Reduced cores to < 20
• Scalability improvements coming in .NET 4.6 (in preview)
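Why Slow Start matters here can be shown with a toy model: under plain least-connections, a server that has just been added back to rotation has the fewest connections, so it receives every new connection until it catches up. The sketch below is not how any particular hardware load balancer implements slow start; the linear ramp, the step-based cap, and all function names are assumptions for illustration.

```python
from typing import List


def pick_least(conns: List[int], candidates: List[int]) -> int:
    """Candidate with the fewest active connections (ties go to the lowest index)."""
    return min(candidates, key=lambda i: (conns[i], i))


def least_connections(conns: List[int], arrivals: int) -> List[int]:
    """Plain least-connections: every new connection goes to the emptiest server."""
    conns = list(conns)
    everyone = list(range(len(conns)))
    for _ in range(arrivals):
        conns[pick_least(conns, everyone)] += 1
    return conns


def least_connections_slow_start(conns: List[int], new_server: int,
                                 arrivals_per_step: int, steps: int,
                                 ramp_steps: int) -> List[int]:
    """Least-connections with a hypothetical linear slow start: during ramp step s
    the new server may take at most an even share scaled by (s + 1) / ramp_steps."""
    conns = list(conns)
    n = len(conns)
    everyone = list(range(n))
    others = [i for i in everyone if i != new_server]
    for step in range(steps):
        if step < ramp_steps:
            cap = int(arrivals_per_step / n * (step + 1) / ramp_steps)
        else:
            cap = arrivals_per_step  # ramp finished: back to plain least-connections
        taken = 0
        for _ in range(arrivals_per_step):
            target = pick_least(conns, everyone if taken < cap else others)
            conns[target] += 1
            if target == new_server:
                taken += 1
    return conns
```

With four servers at 8,000 connections and a fifth just restarted, a burst of 2,000 connections lands entirely on the fifth server under plain least-connections; with the four-step ramp above, the same burst gives it 250 while the other servers absorb the rest.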
Large Organization Configuration
• 16 cores / 92 GB RAM per server
• Exchange 2013 deployed in an all-in-one configuration
• NLB configured for ‘Round Robin’
What Happened?
• File writes failing, MA probe failures, MDB failovers
• Encountered a bug with anti-virus
• Recommended fixes were not deployed prior to migration, which exposed a new bug
Impact
• Users frequently disconnected during peak periods
• ~8 weeks to isolate the problem
• ~3 weeks to get the fix and configuration changes in place
Figure: IIS RpcHttpProxy → IIS RpcHttp → RPC Client Access → Store Worker → MBX DB.
• Stalled I/O was delaying client responses (a dump showed a 6-minute lock)
• I/O stack: I/O Manager → Anti-Virus Filter Driver ("Is valid file to scan?") → File System Driver → Device Driver → Mini-Port Driver
• Continued stalled I/O forces Managed Availability to move databases
Managed Availability
Goals
• Bring Office 365 capabilities on-premises
• Monitor based on end-user experience
• Focus on recovery-oriented computing
Components
• Probes test components and the user experience
• Monitors analyze probe results for pass/fail
• Responders take action based on monitor results: Restart Services, Reset AppPool, Offline, Failover MBX, BugCheck, Escalate
• Outlook probes: OutlookRpcCtpProbe, OutlookProxyTestProbe, OutlookRpcSelfTestProbe
• Data sources: performance counters, event logs
When troubleshooting
• Monitor failures are a signal of a problem
• Consistent failures can force a bluescreen
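The probe → monitor → responder chain can be pictured as a small state machine: repeated probe failures turn a monitor unhealthy, and each unhealthy period fires a stronger responder, ending in a bugcheck or escalation. This is a minimal sketch of that idea only; the ladder order, the consecutive-failure threshold, and the names `Monitor`, `RESPONDER_LADDER`, and `simulate` are hypothetical, not Exchange's actual Managed Availability definitions.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical escalation order built from the responder types listed above.
# Real Managed Availability responders, thresholds, and ordering are defined
# per monitor by Exchange; this only illustrates escalate-on-repeat.
RESPONDER_LADDER = ["Reset AppPool", "Restart Services", "Offline",
                    "Failover MBX", "BugCheck", "Escalate"]


@dataclass
class Monitor:
    name: str
    failure_threshold: int          # consecutive probe failures that make the monitor unhealthy
    consecutive_failures: int = 0
    level: int = 0                  # next rung of the ladder to try
    actions: List[str] = field(default_factory=list)

    def record_probe(self, passed: bool) -> None:
        if passed:
            # A passing probe returns the monitor to healthy and resets escalation.
            self.consecutive_failures = 0
            self.level = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            rung = min(self.level, len(RESPONDER_LADDER) - 1)
            self.actions.append(RESPONDER_LADDER[rung])
            self.level += 1
            self.consecutive_failures = 0


def simulate(probe_results: List[bool], failure_threshold: int) -> List[str]:
    """Feed probe results to one hypothetical monitor and return the responders fired."""
    monitor = Monitor("OutlookRpcSelfTest (hypothetical monitor)", failure_threshold)
    for passed in probe_results:
        monitor.record_probe(passed)
    return monitor.actions
```

Six straight failures with a threshold of 3 fire "Reset AppPool" and then "Restart Services"; a long enough run of failures reaches the last rung, which mirrors how consistent failures can end in a bluescreen.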
Storage: Some database I/O latency is elevated, but overall I/O is fairly healthy.
CPU: The server appears to be busy, but it is uncertain whether this is normal or a bug. Is w3wp#11 CPU utilization running hot?
Memory: Massive growth in the memory footprint of the w3wp#11 process throughout the day; Private Bytes reached 10 GB+ before restarting. w3wp process ID = 62192.
AppDomain
• Used to enable isolation within a process
• 3 AppDomains by default; a normal w3wp for Exchange has 3-4 AppDomains
• Here, additional AppDomains were created as a result of a config change: an Exchange leak in W3SVC/1 = MSExchangeRpcProxyFrontEndAppPool
Process Explorer: view AppDomains and other .NET stats for running processes.
Outlook Anywhere
• Servicelets are used by Exchange for minor tasks
• RPCHTTPServicelet runs every 15 minutes
• On every run, RPCHTTPServicelet was writing an update to Default Web Site/Rpc, changing it from “SSL” to “None”
What was causing this change to be written over and over?
Figure: Every 15 minutes, MSExchange Services Host sets SSLOffloading = true. Inside the MSExchangeRPCAppPool w3wp process, each config change leaves another Front-End AppDomain (~125 MB at startup) alongside the System and Default AppDomains; the path continues through RPC Client Access and the Store Worker instance to the MBX DB.
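A back-of-envelope check shows why a 15-minute config rewrite is enough to explain the memory growth. The sketch assumes each config change leaves behind one Front-End AppDomain costing its ~125 MB startup size; that per-AppDomain cost is an assumption, and the function names are hypothetical.

```python
import math

# Figures from the slides; treating every leaked Front-End AppDomain as costing
# its ~125 MB startup size is an assumption for this estimate.
APPDOMAIN_MB = 125        # Front-End AppDomain size at startup
RUN_INTERVAL_MIN = 15     # RPCHTTPServicelet period


def hours_until(target_gb: float, appdomain_mb: float = APPDOMAIN_MB,
                interval_min: float = RUN_INTERVAL_MIN):
    """Leaked AppDomains, and hours of servicelet runs, needed to add target_gb."""
    domains = math.ceil(target_gb * 1024 / appdomain_mb)
    return domains, domains * interval_min / 60


def growth_per_day_gb(appdomain_mb: float = APPDOMAIN_MB,
                      interval_min: float = RUN_INTERVAL_MIN) -> float:
    """Memory added per day if every run leaks one AppDomain."""
    runs_per_day = 24 * 60 / interval_min
    return runs_per_day * appdomain_mb / 1024
```

Under these assumptions, about 82 leaked AppDomains (roughly 20.5 hours of runs) would account for 10 GB of Private Bytes, and a full day of 96 runs would add about 11.7 GB, which is consistent with the w3wp#11 growth "throughout the day".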
Investigation
• ~10 weeks of investigation
• Many iterations of data collected and analyzed (Data Collection, Analysis)
Deployment Guidance Missteps
• NLB configuration set to Round Robin
• Most recent CU update + hotfixes not installed
Resolution
• NLB configuration changed to Slow Start
• Most recent CU update + hotfixes installed
• Interim configuration change until the KB 2925281 hotfix release
• Final fix in Exchange 2013 Service Pack 1
• Exchange Server 2013 Performance Recommendations
• Exchange 2013 Sizing and Configuration Recommendations
• Exchange 2013 Performance Counters for troubleshooting
• IIS Logs and Log Parser Studio Reports
• Exchange Performance Data Collection tool
• Exchange 2013 Performance Health Checker Script
• Windows Performance Toolkit (WPT)
• Performance Analysis of Logs (PAL) Tool
• Windows Sysinternals
• BRK3131: Exchange Design Concepts and Best Practices
• BRK3197: Exchange Server Preferred Architecture
• BRK3178: Exchange on IaaS: Concerns, Tradeoffs, and Best Practices
• BRK3173: Experts Unplugged: Exchange Server Deployment and Architecture
• BRK3158: Experts Unplugged: Exchange Top Issues
• BRK3129: Deploying Exchange Server 2016
• BRK3102: Experts Unplugged: Exchange Server High Availability and Site Resilience
http://myignite.microsoft.com