Network Driver Performance
Outline
- Software features for high-performance NICs. Top features include:
  - Scatter-Gather DMA
  - Automatic tuning of resources
  - Task Offloading support for IPv6
- Hardware features for high-performance NICs. Top features include:
  - Task Offloading support
  - Receive-Side Scaling (RSS) support
- Performance tools
  - NTttcp
  - Kernrate profiler
Goals
- Tune your network driver to work optimally with your hardware for the best networking performance
- Fine-tune your hardware features so they operate at their optimal performance
- Use NTttcp to isolate network performance problems
- Use Kernrate to identify bottlenecks on hot paths
- Note: Mentions of packets apply to NDIS 5.x drivers and translate to NetBuffers and NetBufferLists for NDIS 6.0 drivers on Windows codenamed “Longhorn”
Software Optimizations
Network Software Optimizations
- Scatter-Gather DMA
  - SG DMA yields optimum performance with the NDIS 6.0 model
  - Pre-allocate the buffer that hosts the SCATTER_GATHER_LIST as part of the Transmit Control Block during the initialization phase, and reuse it
  - Use the maximum buffer size for the MaximumPhysicalMapping parameter of the NdisMInitializeScatterGatherDma function to avoid buffer allocation and copying
- Using cached memory to allocate NIC receive buffers
  - x86, IA64, and x64 hardware guarantees DMA coherency, so there is no need to call IoFlushBuffer; it would be a no-op
  - Example:
    NdisMAllocateSharedMemory(
        Adapter->AdapterHandle,   // miniport adapter handle (variable name assumed)
        pMpRxbuf->AllocSize,      // length of the receive buffer
        TRUE,                     // CACHED
        &pMpRxbuf->AllocVa,       // receives the virtual address
        &pMpRxbuf->AllocPa);      // receives the physical address
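The pre-allocation advice above can be sketched in user-mode C. This is only an illustration of the pattern, not NDIS code: the `Adapter`, `Tcb`, `SgList`, and `SgElement` types, the function names, and both constants are hypothetical stand-ins, and `malloc` stands in for whatever allocator a real miniport would use during initialization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TCB_COUNT       4  /* hypothetical transmit ring depth */
#define MAX_SG_ELEMENTS 8  /* hypothetical worst-case fragments per packet */

/* Simplified user-mode stand-ins for the kernel scatter/gather types. */
typedef struct { unsigned long long address; unsigned length; } SgElement;
typedef struct { unsigned count; SgElement elements[MAX_SG_ELEMENTS]; } SgList;

typedef struct {
    int     in_use;
    SgList *sg;  /* allocated once at init, reused for every send */
} Tcb;

typedef struct {
    Tcb      tcbs[TCB_COUNT];
    unsigned allocations;  /* heap allocations made so far, for illustration */
} Adapter;

/* Release every scatter/gather buffer owned by the adapter. */
void adapter_free(Adapter *a)
{
    for (size_t i = 0; i < TCB_COUNT; i++) {
        free(a->tcbs[i].sg);
        a->tcbs[i].sg = NULL;
    }
}

/* Initialization phase: allocate all buffers up front. Returns 0 on success. */
int adapter_init(Adapter *a)
{
    a->allocations = 0;
    for (size_t i = 0; i < TCB_COUNT; i++) {
        a->tcbs[i].in_use = 0;
        a->tcbs[i].sg = NULL;
    }
    for (size_t i = 0; i < TCB_COUNT; i++) {
        a->tcbs[i].sg = malloc(sizeof(SgList));
        if (a->tcbs[i].sg == NULL) {
            adapter_free(a);
            return -1;
        }
        a->allocations++;
    }
    return 0;
}

/* Send path: hand out a free TCB; no allocation happens here. */
Tcb *tcb_acquire(Adapter *a)
{
    for (size_t i = 0; i < TCB_COUNT; i++) {
        if (!a->tcbs[i].in_use) {
            a->tcbs[i].in_use = 1;
            a->tcbs[i].sg->count = 0;  /* reset the reused list */
            return &a->tcbs[i];
        }
    }
    return NULL;  /* all TCBs are busy */
}

/* Send completion: return the TCB (and its buffer) for reuse. */
void tcb_release(Tcb *t)
{
    assert(t != NULL && t->in_use);
    t->in_use = 0;
}
```

After `adapter_init`, repeated `tcb_acquire`/`tcb_release` cycles leave `allocations` unchanged, which is the point of the slide: the send path never allocates.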
More Network Software Optimizations
- NDIS Safe APIs
  - Required for the NDIS 6.0 model
  - Have shown overall TCP/IP improvements of up to 7% in kernel-mode scenarios (e.g., IIS 6.0)
  - Eliminate the need to call into the kernel to probe and lock buffers
  - For NDIS 5.x drivers, set the NDIS_ATTRIBUTE_USES_SAFE_BUFFER_APIS flag in NdisMSetAttributesEx; NDIS 6.0 drivers do not need to set the flag
  - Example: When using NdisQueryBufferSafe, set the VirtualAddress parameter to NULL to avoid mapping buffers sent down by NDIS
- 64-bit DMA support
  - Avoid copies for addresses above the 4 GB range by setting Dma64BitAddresses to TRUE in NdisMInitializeScatterGatherDma
Locking Mechanisms Optimizations
- Locks are an expensive hit to system performance if not used properly
  - Measurements show approximately 160 cycles per lock acquire and 140 cycles per lock release
  - Use spinlocks to protect data, not code
- Locking at DPC level
  - When already at DPC level, avoid extra code by using NdisDprAcquireSpinLock and NdisDprReleaseSpinLock
- Read-Write Locks
  - To minimize the number of spinlock acquire and release operations, use the NDIS read-write lock functions for scalability: NdisInitializeReadWriteLock, NdisAcquireReadWriteLock, NdisReleaseReadWriteLock
  - Read-write locks allow multiple concurrent readers to use a single lock and limit write access to a single writer thread; no read access is allowed during a write
  - They still behave like a spinlock and raise the IRQL to DISPATCH_LEVEL when acquired
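The cycle counts quoted above make the cost of lock traffic easy to estimate. The sketch below is plain arithmetic on those two figures; the function names and the idea of covering a batch of packets with one acquire/release pair are illustrative assumptions, and a longer hold time has its own contention cost that this does not model.

```c
#include <assert.h>

/* Per-operation costs quoted on the slide, in CPU cycles. */
enum { LOCK_ACQUIRE_CYCLES = 160, LOCK_RELEASE_CYCLES = 140 };

/* Cycles spent purely on locking for a number of acquire/release pairs. */
unsigned long lock_overhead_cycles(unsigned long pairs)
{
    return pairs * (LOCK_ACQUIRE_CYCLES + LOCK_RELEASE_CYCLES);
}

/* Pairs needed when one lock hold covers `batch` packets (rounded up). */
unsigned long lock_pairs_for(unsigned long packets, unsigned long batch)
{
    assert(batch > 0);
    return (packets + batch - 1) / batch;
}
```

For 64 packets, one pair per packet costs 64 × 300 = 19,200 cycles, while one pair per 16 packets costs 4 × 300 = 1,200 cycles.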
Auto Tuning Network Drivers
- Static: Driver and NIC hardware parameters are based on the system configuration, such as whether the machine is a client or a server, its CPU and memory, and the capabilities of the NIC
- Dynamic: System conditions dictate what tuning is necessary for optimum performance. Resource utilization and network load are the metrics for determining the best operating points for the NIC and driver
- Primary auto-tuning parameters include:
  - Interrupt moderation
  - Receive buffer allocation
  - Small buffer coalescing
  - Packets processed per DPC
- Drivers can obtain current processor utilization by using the NdisGetCurrentProcessorCounts function
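One way to picture dynamic tuning is a small policy function that maps the two metrics named above, processor utilization and network load, to an interrupt moderation setting. Everything in this sketch is hypothetical: the thresholds, the three rate ceilings, and the function name are invented for illustration, and a real driver would derive its inputs from sources such as NdisGetCurrentProcessorCounts and its own packet counters.

```c
#include <assert.h>

/* Hypothetical interrupt-rate ceilings, in interrupts per second. */
enum {
    RATE_LOW_LATENCY = 20000,  /* light load: interrupt promptly */
    RATE_BALANCED    = 8000,   /* moderate coalescing */
    RATE_LOW_CPU     = 2000    /* heavy coalescing to save CPU */
};

/* Pick a moderation ceiling from current CPU use and receive load. */
unsigned pick_interrupt_rate(unsigned cpu_percent, unsigned long packets_per_sec)
{
    assert(cpu_percent <= 100);
    if (cpu_percent >= 80)           /* CPU-bound: coalesce aggressively */
        return RATE_LOW_CPU;
    if (packets_per_sec < 10000)     /* lightly loaded link: favor latency */
        return RATE_LOW_LATENCY;
    return RATE_BALANCED;            /* busy link with CPU headroom */
}
```

The same shape applies to the other parameters on the slide, such as receive buffer counts or packets processed per DPC: sample the metrics periodically and move the operating point in small steps.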
Hardware Optimizations
Task Offload Support
- Checksum Offload
  - Has been shown to improve overall TCP/IP performance by up to 20%
  - Improves caching effect and eliminates churning: 8% increase
  - Reduces code path length: 12% improvement
- TCP Segmentation Offload
  - Has been shown to improve overall TCP/IP performance by up to 11%
  - Reduces sender cycles-per-byte cost by 2x (it goes below 1.5)
  - NDIS 6.0 supports its successor, Giant Send Offload (> 64K)
  - NDIS 6.0 has IPv6 support for TCP Segmentation Offload
- NDIS 6.0 offers support for IPSec Offload
Message Signaled Interrupts (MSI)
- MSI has the following attributes:
  - No acknowledgment is necessary for the message
  - No sharing is usually necessary
  - Many interrupts per PCI function are supported
  - Caveat: It works only on P4 and later chipsets
- Advantages of MSI
  - With no sharing in place, only a single ISR runs, so latency is lower
  - Bus utilization goes down because some read operations from the device are eliminated
  - The device can target interrupts at designated processors (e.g., RSS)
  - Data buffer coherency is guaranteed because the message follows DMA traffic on the bus
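To make concrete what checksum offload takes off the host CPU, here is a minimal software version of the 16-bit one's-complement Internet checksum (RFC 1071) used by IP, TCP, and UDP. It is a sketch of the per-byte work involved, not the code path of any particular stack; the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words, folded and inverted.
 * A 32-bit accumulator cannot overflow for buffers up to 64 KB.
 */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[0] << 8;

    while (sum >> 16)                  /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Every byte of every packet passes through this loop when the host computes the checksum, which is consistent with the slide's point that offloading it shortens the code path and reduces cache churn.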
Receive Side Scaling (RSS)
- The existing stack limits receive processing to one CPU
  - Restricts the scalability of a Web server to the number of short-lived connections a single CPU can process (per NIC)
  - Limits transaction throughput to the packet receive processing rate of one CPU
  - Example: A four-processor machine hosting a single NIC cannot use more than 25% of its overall CPU cycles for receive processing
- RSS helps both long-lived and short-lived connections
  - RSS improves performance at times when CPU processing is dominated by connection setup
  - Connection setup tasks map well to a general-purpose CPU
  - RSS provides parallel receive processing = parallel DPCs
- Planned availability in the Windows Server 2003 Scalable Networking Pack add-on and Longhorn
Receive Side Scaling
[Diagram: a NIC with parallel receive packet queues delivering to NDIS on CPU 0 (ISR and DPC), CPU 1 (DPC), and CPU 2 (DPC). Labels: "Today", "One processor per NIC", "Multiple processors per NIC".]
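The parallel receive processing described above depends on steering each connection's packets to one CPU consistently. A minimal sketch of that steering follows, under loud assumptions: the `Flow` type, the table size, and all function names are hypothetical, and the FNV-1a hash is only a stand-in for whatever hash the NIC hardware computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDIRECTION_ENTRIES 128  /* hypothetical table size */

typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
} Flow;

/* Mix the low `bytes` bytes of `value` into an FNV-1a hash state. */
static uint32_t fnv1a_step(uint32_t h, uint32_t value, int bytes)
{
    for (int i = 0; i < bytes; i++) {
        h ^= (value >> (8 * i)) & 0xFFu;
        h *= 16777619u;
    }
    return h;
}

/* Stand-in flow hash; real RSS hardware uses its own hash function. */
uint32_t flow_hash(const Flow *f)
{
    uint32_t h = 2166136261u;
    h = fnv1a_step(h, f->src_ip, 4);
    h = fnv1a_step(h, f->dst_ip, 4);
    h = fnv1a_step(h, f->src_port, 2);
    h = fnv1a_step(h, f->dst_port, 2);
    return h;
}

/* Spread table entries round-robin across the available CPUs. */
void build_indirection_table(uint8_t table[INDIRECTION_ENTRIES], unsigned cpus)
{
    assert(cpus > 0 && cpus <= 256);
    for (size_t i = 0; i < INDIRECTION_ENTRIES; i++)
        table[i] = (uint8_t)(i % cpus);
}

/* Every packet of a flow hashes to the same entry, hence the same CPU. */
unsigned cpu_for_flow(const uint8_t table[INDIRECTION_ENTRIES], const Flow *f)
{
    return table[flow_hash(f) % INDIRECTION_ENTRIES];
}
```

Keeping a flow on one CPU preserves in-order delivery within a connection while different connections run their DPCs in parallel on different processors.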
Network Performance Tools
- NTttcp benchmark
  - Uses publicly available Winsock 2.x APIs
  - Uses an overlapped I/O and multithreading model
  - Transfers random data from memory to memory
  - Reports throughput, CPU utilization, and interrupt rate
  - Reports the cycles-per-byte metric, which is key for measuring performance and catching regressions
  - Reports the packet-to-ACK ratio to detect link condition
  - Reports the number of segment retransmits and errors
  - Supports all Windows hardware architectures
NTttcp Output for a Single Thread
NTttcp Output for Multiple Threads
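The cycles-per-byte metric can be approximated by hand from the figures a throughput run reports. The formula below is an assumption about how such a metric is commonly derived (busy cycles across all processors divided by bytes moved), not a statement of NTttcp's exact calculation; the function and parameter names are illustrative.

```c
#include <assert.h>

/*
 * Estimated CPU cycles spent per byte transferred.
 * cpu_utilization is a fraction in [0, 1] averaged over all processors.
 */
double cycles_per_byte(double cpu_utilization, unsigned cpus,
                       double clock_hz, double seconds, double bytes)
{
    assert(cpu_utilization >= 0.0 && cpu_utilization <= 1.0);
    assert(bytes > 0.0);
    return (cpu_utilization * cpus * clock_hz * seconds) / bytes;
}
```

For example, a 10-second run on two 2 GHz processors at 50% average utilization that moves 10^10 bytes works out to 2.0 cycles per byte. Tracking this one number across driver builds makes regressions visible even when raw throughput is capped by the link.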
More Network Performance Tools
- Kernrate profiling tool
  - General-purpose profiler for tracking CPU utilization
  - Samples periodically (at a programmable interval) to see what is executing
  - Adjustable granularity
  - Reports per-processor, per-process, and total figures
  - Supports all Windows hardware architectures
  - Supports Windows 2000 and beyond
  - Highly customizable (numerous options)
  - The profiling tool and its viewer (KrView) can be downloaded from http://www.microsoft.com/whdc/system/sysperf/krview.mspx
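The idea behind a sampling profiler with adjustable granularity can be sketched as a histogram of sampled instruction addresses. This is a conceptual illustration only and does not reflect how Kernrate is implemented; the `Profile` type, the bucket limit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUCKETS 64  /* hypothetical upper bound on histogram size */

typedef struct {
    uintptr_t base;          /* start of the profiled address range */
    size_t    bucket_size;   /* granularity: bytes of code per bucket */
    size_t    bucket_count;
    unsigned  hits[MAX_BUCKETS];
} Profile;

/* Set up an empty histogram over [base, base + bucket_size * bucket_count). */
void profile_init(Profile *p, uintptr_t base, size_t bucket_size, size_t bucket_count)
{
    assert(bucket_size > 0 && bucket_count > 0 && bucket_count <= MAX_BUCKETS);
    p->base = base;
    p->bucket_size = bucket_size;
    p->bucket_count = bucket_count;
    for (size_t i = 0; i < MAX_BUCKETS; i++)
        p->hits[i] = 0;
}

/* Record one sampled instruction address. Returns 1 if it fell in range. */
int profile_sample(Profile *p, uintptr_t ip)
{
    if (ip < p->base)
        return 0;
    size_t index = (size_t)(ip - p->base) / p->bucket_size;
    if (index >= p->bucket_count)
        return 0;
    p->hits[index]++;
    return 1;
}

/* Index of the bucket with the most samples (lowest index wins ties). */
size_t profile_hottest(const Profile *p)
{
    size_t best = 0;
    for (size_t i = 1; i < p->bucket_count; i++)
        if (p->hits[i] > p->hits[best])
            best = i;
    return best;
}
```

A smaller `bucket_size` narrows a hot spot down to a few instructions at the cost of a larger table, which is the trade-off that adjustable granularity exposes.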
Call To Action
- NDIS 6.0 driver developers need to implement Task Offloading support for IPv6
- Fine-tune your hardware so it operates at its optimal performance point
- Fine-tune your network driver to work optimally with your hardware for best performance
- For questions, please e-mail ndis6fb@microsoft.com and include your name, company name, and phone number
Additional Resources
- Email: ndis6fb@microsoft.com
- Web resources:
  - Analyzing Driver Performance: http://www.microsoft.com/whdc/driver/perform/drvperf.mspx
  - High Performing Adapters and Drivers whitepaper: http://www.microsoft.com/whdc/device/network/NetAdapters-Drvs.mspx
  - Kernrate download: http://www.microsoft.com/whdc/system/sysperf/krview.mspx
© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.