How To Write A Device Driver That Doesn't Break On Partitioned Systems

Mike Tricker, Program Manager, Windows Kernel, Microsoft Corporation
Bruce Sherwin, Software Design Engineer, Windows Kernel, Microsoft Corporation

Agenda
- Introduction
- Motivation
- Dynamic Partitioning (DP) on Windows Server codename "Longhorn"
- Kernel driver reminders
- Driver considerations on partitionable systems
- Call to action

Dynamic Partitioning
[Diagram: example hardware-partitionable server -- sockets with cores, caches and memory, I/O bridges, a Service Processor, and a Partition Manager connected via PCI Express]
- Windows Server Longhorn dynamic hardware partitioning features are focused on improving server RAS
  1. The Partition Manager provides the UI for partition creation and management
  2. The Service Processor controls the inter-processor and I/O connections
  3. Platforms are partitionable to the socket level; virtualization is used for sub-socket partitioning
  4. Support for dynamic partitioning and socket replacement

Motivation For DP
- This is being driven in part by the availability of multi-core processors
  - In a year or two a 4-processor system will have the performance of a 32-processor system today, and will require enterprise-class RAS features
- In Windows Server Longhorn, DP is focused primarily on Reliability, Availability and Serviceability (RAS)
  - Minimizing unplanned downtime due to failing hardware
  - Replacing hardware that is showing signs of impending failure
- Capacity on demand
  - Adding system resources as needed

What We Are Doing
- Windows Server Longhorn 64-bit will add support for Hot Add of processors
  - Together with PCI Express native hot plug of devices
  - And it continues to support Hot Add memory
- It will also support Hot Replace of memory and processors
  - Transparent to applications
  - Drivers receive power IRPs
- Windows Server Longhorn is planned to support Hot Add of I/O host bridges
  - Supporting devices using line-based interrupts
- DP will not be available on all Server SKUs

Kernel Driver Reminders
- All 64-bit x64 kernel drivers must be signed or they will not load on Windows Server Longhorn or Windows Vista
- Drivers that manipulate critical kernel data structures will cause the system to bugcheck, for example:
  - System service tables
  - IDT
  - GDT
  - Kernel stacks not allocated by the kernel
  - Patching any part of the kernel

Taking Advantage Of DP In Your Device Driver

DP-Aware Drivers: Receiving Notifications
- Use IoRegisterPlugPlayNotification()
  - EventCategory is EventCategoryDeviceInterfaceChange
- The GUID used in the DEVICE_INTERFACE_CHANGE_NOTIFICATION structure will be:
  - For Hot Add memory: GUID_DEVICE_MEMORY
  - For Hot Add processor: GUID_DEVICE_PROCESSOR
- Include WDM.H and POCLASS.H

DP-Aware Drivers: Memory Changes
- Note: Hot Add memory does not affect either paged or non-paged pool sizes in Windows Server Longhorn
- Drivers may attempt to allocate more physical memory after receiving the Hot Add memory notification
- Since memory usage is fundamentally competitive, they should handle the case of an allocation failure as they (should) do already

DP-Aware Drivers: Processor Changes
- Drivers that care about the number of active processors should call KeQueryActiveProcessorCount to obtain their count
  - Do not use KeNumberProcessors; it is undocumented and is not static
- If they receive a Hot Add processor notification they must call this DDI again to see what changed
  - It returns both the updated count and the affinity mask, so they can update their internal count

Affinity Mask Manipulation
- Avoid affinity manipulation "by hand"; use the appropriate RTL APIs for affinity manipulation when it is absolutely required
- As mentioned on the previous slide, KeQueryActiveProcessorCount should do exactly what you need
- As well as returning the count of active processors, it also returns the affinity mask:

    Count = KeQueryActiveProcessorCount(&AffinityMask);

- This will provide you with a current mask that you can use as you do today

Processor Scaling
If a
driver uses per-processor worker threads, create new ones for each newly added processor
- Load-balancing algorithms should be Hot Add processor-aware to ensure scalability
  - As processors are added, workloads may need to rebalance between threads
- Miniport developers should ask the class driver owners about possible behavior changes on Hot Add, and how the miniport could benefit from the additional resources

Per-Processor Data
- Using arrays (or "slotted" data structures) with one entry per processor works well
- Either allocate enough memory when creating the data structure to handle the maximum number of processors possible for that architecture, and accept the overhead when the additional processors are not present
- Or use a data structure that can be grown dynamically as processors are added, so long as your driver gets notified when processors are Hot Added
- Provide wrapper functions that hide the complexity of accessing the entries using the processor number as the index, e.g., GetData(Processor) and SetData(Processor)

Using Per-Processor Data

    // Register for PnP notification
    status = IoRegisterPlugPlayNotification(
                 EventCategoryDeviceInterfaceChange,
                 0,
                 &GUID_DEVICE_PROCESSOR,
                 DriverObject,
                 CpuDeviceArrivalNotificationCallback,
                 (PVOID) PerProcessorData,
                 &NotificationEntry);

    NTSTATUS
    CpuDeviceArrivalNotificationCallback(
        IN PVOID NotificationStructure,
        IN PVOID Context
        )
    {
        MY_DATA_STRUCTURE *PerProcessorData = (MY_DATA_STRUCTURE *)Context;

        // Get the current processor count and affinity mask
        newNumberOfProcessors =
            KeQueryActiveProcessorCount(&newActiveProcessorMask);

        if (newNumberOfProcessors > oldNumberOfProcessors) {

            // Expand existing per-processor data structures to be aware
            // of the new processors
            InitializeMyDataStructureWithNewProcessorCount(
                PerProcessorData, newNumberOfProcessors);

            // Create a thread that runs with the new processor affinity
            PsCreateSystemThread(&thread, THREAD_ALL_ACCESS, NULL, NULL, NULL,
                                 (PKSTART_ROUTINE) MyWorkerThread,
                                 (PVOID) &newActiveProcessorMask);
            ...
        }
    }

    VOID
    MyWorkerThread(IN OUT PVOID Context)
    {
        PKAFFINITY affinityMask = (PKAFFINITY)Context;

        // Set the thread affinity
        KeSetSystemAffinityThread(*affinityMask);
        ...
    }

Interrupt Targeting To A New Processor
- The system triggers a tree-wide rebalance
  - This allows interrupt-consuming device drivers to connect their interrupts to newly added processors
- DPCs queued from an ISR will run on the same processor by default
- Please don't fail QUERY_STOP, or you will prevent the system from distributing interrupts to new processors

DP-Aware Drivers: Resource Rebalance
[Diagram: device state machine -- Started state, Stop-Pending state, Stopped state; transitions: (1) IRP_MN_QUERY_STOP_DEVICE, (2) IRP_MN_STOP_DEVICE, (3) IRP_MN_START_DEVICE, (4) IRP_MN_CANCEL_STOP_DEVICE, (5) failed restart]

Connecting Interrupts
- Windows will include the new processor in the affinity for the interrupt resource assigned during rebalance
  - The new resources are passed in IRP_MN_START_DEVICE
- The driver should call IoConnectInterruptEx() using this new affinity
  - This will connect the device interrupt to the new processor

Application Implications
- Running processes will not change affinity by default, for compatibility
- They can receive notifications and change their affinity if desired:
  - RegisterDeviceNotification() with GUID_DEVICE_MEMORY or GUID_DEVICE_PROCESSOR
  - SetProcessAffinityMask()
  - SetThreadAffinityMask()
  - GlobalMemoryStatusEx()
- New processes will take advantage of the new processor(s)
- The System process affinity will be changed to include the newly added processors
  - Worker threads run in the System process

User Mode IOCTLs
- Applications may issue custom IOCTLs to a partner device driver
  - ~40% of "tier 1" applications we test today install drivers to handle specific tasks
- In this case both need to register for Hot Add processor notifications, otherwise they may get out of sync
  - For example, the application may issue IOCTLs on a thread running on a processor the driver doesn't know about
- If the driver uses per-processor data, it needs to register for the Hot Add processor notification and update its data structures

Hot
Replace Flow
System state is migrated in stages:
  1. Paged memory is copied
  2. The system is quiesced
     - DMA is stopped
     - Interrupts are stopped
     - Devices are moved to D3
     - Quiesce duration is ~1 second
  3. Non-paged memory is copied and processor state is migrated
  4. The system is resumed
     - DMA and interrupts are resumed
     - Drivers return to D0

Hot Replace Implications For Drivers
- Hot Replace uses a pseudo-S4 state in which we temporarily "hibernate" the system to quiesce devices (stopping DMA operations and interrupts), but without actually using a hiberfile
- Your driver therefore needs to handle power S-IRP and D-IRP requests:
  - IRP_MN_SET_POWER, S4
  - IRP_MN_SET_POWER, D3
- Note: this includes IA-64

System Implications For Hot Replace
- Applications do not see a system change when a Hot Replace operation occurs
  - The amount of physical memory is unchanged
  - The number of logical processors is unchanged
  - This is deliberate, to avoid application compatibility issues
- Hot Replace is treated as an atomic operation

Hot Add Processor
Bruce Sherwin, Software Design Engineer, Windows Kernel

Related Issues: NUMA Behavior
- NUMA is becoming widespread and should be transparent in most cases
- If the platform supports the _PXM ACPI method, you will automatically use NUMA-local memory at AddDevice and StartDevice time
  - This ensures your device extensions and common buffers are allocated out of NUMA-local memory

Testing
- Today's DP-capable systems are typically large, restricting access for driver developers
  - We expect that to change in the next few years
- In the meantime, if your device needs to work with DP we expect you to work with the OEM on whose system you'll ship to ensure a great customer experience
- Also work with Microsoft, since we're also testing these systems
- If you're already in-box and can run on these systems, you should be covered

Future Technologies: Detecting If DP Is Available
- We've been asked how to determine whether a system actually supports DP, since OEMs may ship very similar systems with DP as a high-profile RAS feature, and thus not enabled on lower-cost systems
- A firmware mechanism will report platform capabilities
- A kernel- and user-mode API will query this capability
- The exact mechanism is TBD

Supporting Hot Remove
- Windows Server Longhorn will not support Hot Remove of memory or processors
- When we do, we're concerned about drivers and applications:
  - Sparse affinity masks, and drivers parsing the mask only until they find a missing entry
  - Memory that has been pinned for DMA
  - Drivers using KeNumberProcessors
- We'll have to be very careful about which drivers we load on such a platform, and consider what may need to be blocked

Call To Action
- If you want your 64-bit x64 drivers to load on Windows Server Longhorn (or Windows Vista), ensure they're signed
- Ensure that your 64-bit x64 drivers avoid manipulating undocumented private kernel data structures, as doing so will bugcheck the system
- Please follow the guidelines for good behavior outlined in this presentation
- Realize that DP won't be "just a high-end feature" for much longer; that will change during the life of Windows Server Longhorn
- Ensure that any driver targeting future Windows Server releases is DP- (and NUMA-) aware

Additional Resources
Web resources:
- Dynamic Partitioning home page: http://www.microsoft.com/whdc/system/platform/server/dhp.mspx
- White papers: http://www.microsoft.com/whdc/system/vista/kernel-en.mspx
- WinHEC 2005 sessions: http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3088782200fe7/TWAR05016_WinHEC05.ppt
- Kernel patching FAQ: http://www.microsoft.com/whdc/driver/kernel/64bitpatch_FAQ.mspx
Related sessions:
- Kernel Enhancements for Windows Server Longhorn
- How to Use the WDK to Develop, Sign and Test Drivers
For feedback on all things pertaining to Dynamic Partitioning, please send mail to: dpfb@microsoft.com

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
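Appendix: the "slotted" per-processor data pattern recommended above (a dynamically growable array with GetData/SetData wrapper functions) can be modeled in portable C. This is a user-mode sketch only, not kernel code: a real driver would allocate from non-paged pool and synchronize growth with the Hot Add notification callback, and all names here (PER_PROC_DATA, GrowPerProcData) are illustrative, not Windows APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One slot per processor; the array grows when processors are hot-added. */
typedef struct {
    unsigned long *slots;   /* one entry per known processor */
    unsigned count;         /* number of slots currently allocated */
} PER_PROC_DATA;

/* Grow the slot array to cover a new, larger processor count, zeroing the
 * new entries. Returns 0 on success, -1 on allocation failure (in which
 * case the old array is left intact, mirroring the "handle allocation
 * failure as you do already" guidance). */
static int GrowPerProcData(PER_PROC_DATA *d, unsigned newCount)
{
    unsigned long *p;
    if (newCount <= d->count)
        return 0;                              /* nothing to do */
    p = realloc(d->slots, newCount * sizeof(*p));
    if (p == NULL)
        return -1;                             /* keep the old array */
    memset(p + d->count, 0, (newCount - d->count) * sizeof(*p));
    d->slots = p;
    d->count = newCount;
    return 0;
}

/* Wrappers hide the indexing so callers never touch the array directly;
 * an out-of-range processor number reads as zero rather than faulting. */
static unsigned long GetData(const PER_PROC_DATA *d, unsigned cpu)
{
    return (cpu < d->count) ? d->slots[cpu] : 0;
}

static void SetData(PER_PROC_DATA *d, unsigned cpu, unsigned long value)
{
    if (cpu < d->count)
        d->slots[cpu] = value;
}
```

In a driver, GrowPerProcData would be called from the Hot Add processor notification callback with the count returned by KeQueryActiveProcessorCount, so existing entries survive the grow and new processors start from zeroed slots.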
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.