Reconstruct x86 Interrupt Architecture with Hierarchy IRQ Domain
Jiang Liu <[email protected]>
Rui Wang <[email protected]>

About Myself – Jiang Liu (Gerry Liu)
•An OS hacker
  • Fan of the Linux kernel since 1998
  • Contributing to the Linux community since 2012
  • Enabled x86 CPU/memory hotplug on OpenSolaris
•To be a farmer once retired from the IT industry

Agenda
•X86 Hardware Interrupt Architecture
•X86 Software Interrupt Architecture
•Hierarchy IRQ Domain
•Reconstruct x86 Interrupt Architecture
•Q/A

Why Do We Care About X86 Interrupt Architecture?
--To Support Physical Processor Hotplug
•Physical processor hotplug for
  • Dynamic capacity adjustment
  • Device hot-replacement
•The north bridge has been built into the physical processor
  • Memory controllers
  • PCIe host bridge (PCIe root ports)
  • IOMMU devices
  • IOAPIC devices
  • IOAT devices
[Figure: physical processor package with CPU cores, L3 cache, memory controllers (MC), QPI, and a PCI root complex containing DMA, IOMMU, IOAPIC and root port]

System Device Hotplug Development Status
--More Than 10 Years of Effort
•System devices include CPUs, memory devices, PCI host bridges etc.
•The community has worked on system device hotplug for more than 10 years, starting from 2002.
•Current development status
  • ACPI based hotplug framework is ready
  • Logical CPU hotplug is ready
  • PCIe host bridge hotplug is ready
  • Memory hotplug is ready, but still needs improvements
  • IOMMU hotplug is under development
  • IOAPIC hotplug is under development
  • IOAT hot-addition is ready, but hot-removal still doesn't work
Hopefully we will have fully functional system device hotplug soon

X86 Interrupt Hardware Architecture
--Legacy Programmable Interrupt Controller
•Pin-based interrupt controller
  • Limited number of interrupt pins
  • Multiple devices may share the same pin
•Designed for UP
•IRQ number configured by hardcoding or jumpers
•Most famous example: the Intel 8259 interrupt controller

X86 Interrupt Hardware Architecture
--Message Based Interrupt Controller
•Advanced Programmable Interrupt Controller
  • Designed for SMP
  • Local APIC and I/O APIC
  • Converts pin-based requests into messages
•PCI MSI
•HPET IRQ
•DMAR IRQ
•HyperTransport IRQ

X86 Interrupt Hardware Architecture
--Interrupt Remapping
•Interrupt remapping for
  • Interrupt isolation (virtualization)
  • Atomic interrupt migration
  • x2APIC support
•Remaps all message based interrupts except
  • IPI
  • DMAR IRQ

X86 Interrupt Hardware Architecture
--The Whole Picture – A Hierarchy Topology
[Figure: hierarchy of interrupt controllers, with blocks labelled CPU Core, Local APIC, VT-d/IR, 8259, HTIRQ, DMAR, MSI, HPET and IOAPIC]

Agenda
•X86 Hardware Interrupt Architecture
•X86 Software Interrupt Architecture
•Hierarchy IRQ Domain
•Reconstruct x86 Interrupt Architecture
•Q/A

Linux Interrupt Management Architecture
•The interrupt management core is platform independent and provides:
  • Management of all interrupt activities
  • Client interfaces for device drivers
  • Interfaces for interrupt controller drivers
  • irq_desc data structure management
  • IRQ domain helper
•Interrupt source enumerator, delivery and controller drivers are platform dependent
[Figure: layered block diagram: device drivers; interrupt management core (device driver interfaces, IRQ domain manager, IRQ descriptor, misc helpers, interrupt controller interfaces); interrupt source enumerator; interrupt delivery; interrupt controller driver]

Interfaces for Device Drivers
•The interfaces for device drivers are relatively simple: just allocate/free/enable/disable an IRQ
•Device drivers need to provide:
  • Interrupt identifier: unsigned int irq
    But where does this magic interrupt identifier come from?
  • Interrupt handler: irq_handler_t handler
  • Callback argument: void *dev
  • Flags to control interrupt behavior: unsigned long flags

    int request_threaded_irq(unsigned int irq, irq_handler_t handler,
                             irq_handler_t thread_fn, unsigned long flags,
                             const char *name, void *dev);
    int request_irq(unsigned int irq, irq_handler_t handler,
                    unsigned long flags, const char *name, void *dev);
    int request_any_context_irq(unsigned int irq, irq_handler_t handler,
                                unsigned long flags, const char *name,
                                void *dev_id);
    void free_irq(unsigned int, void *);
    void disable_irq(unsigned int irq);
    void enable_irq(unsigned int irq);

Interrupt Source Enumerator
--Who creates the magic interrupt identifier
•It is the interrupt source enumerator's responsibility to:
  • Associate the interrupt source with a device
  • Allocate an IRQ number for the interrupt source
  • Configure the flow handler and irq_chip for the IRQ
•Statically enumerated interrupt sources
  • By firmware table: MP table, ACPI table, FDT
  • For IOAPIC, HPET, DMAR
•Dynamically enumerated interrupt sources
  • By hardware specifications
  • For PCI MSI/MSI-X, HT_IRQ etc.
    int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec);
    int pci_enable_msi_exact(struct pci_dev *dev, int nvec);
    int __ht_create_irq(struct pci_dev *dev, int idx, ht_irq_update_t *update);

IRQ Domain
--Helper for IRQ Source Enumerator
•A framework to dynamically associate/disassociate IRQ numbers with interrupt sources
  • An IRQ domain manages a group of interrupt sources
  • It allocates/frees IRQ numbers for interrupt sources on demand
  • It configures/deconfigures interrupt controller hardware on demand

    unsigned int irq_create_mapping(struct irq_domain *host, irq_hw_number_t hwirq);
    void irq_dispose_mapping(unsigned int virq);

    struct irq_domain {
        struct list_head link;
        const char *name;
        const struct irq_domain_ops *ops;
        …
    };

    struct irq_desc {
        struct irq_data irq_data;
        unsigned int __percpu *kstat_irqs;
        irq_flow_handler_t handle_irq;
        …
    };

    struct irq_data {
        u32 mask;
        unsigned int irq;
        unsigned long hwirq;
        struct irq_domain *domain;
        …
    };

Interfaces for Interrupt Controller
•Flow handler
  • Controls the overall interrupt handling flow
•struct irq_chip
  • Callbacks to configure and control the interrupt controller
•struct irq_data
  • irq chip data passed down to chip functions

    struct irq_chip {
        const char *name;
        unsigned int (*irq_startup)(struct irq_data *data);
        void (*irq_shutdown)(struct irq_data *data);
        void (*irq_enable)(struct irq_data *data);
        void (*irq_disable)(struct irq_data *data);
        void (*irq_ack)(struct irq_data *data);
        void (*irq_mask)(struct irq_data *data);
        void (*irq_mask_ack)(struct irq_data *data);
        void (*irq_unmask)(struct irq_data *data);
        void (*irq_eoi)(struct irq_data *data);
        int (*irq_set_affinity)(struct irq_data *data, co…
        int (*irq_retrigger)(struct irq_data *data);
        int (*irq_set_type)(struct irq_data *data, unsign…
        int (*irq_set_wake)(struct irq_data *data, unsign…
        void (*irq_bus_lock)(struct irq_data *data);
        void (*irq_bus_sync_unlock)(struct irq_data *data);
        void (*irq_cpu_online)(struct irq_data *data);
        void (*irq_cpu_offline)(struct irq_data *data);
        void (*irq_suspend)(struct irq_data *data);
        void (*irq_resume)(struct irq_data *data);
        void (*irq_pm_shutdown)(struct irq_data *data);
        void (*irq_calc_mask)(struct irq_data *data);
        void (*irq_print_chip)(struct irq_data *data, stru…
        int (*irq_request_resources)(struct irq_data *dat…
        void (*irq_release_resources)(struct irq_data *dat…
        unsigned long flags;
    };

X86 Interrupt Management Architecture
•Flattened topology
  • Vector management, interrupt remapping and the interrupt controller are flattened into one irq_chip
[Figure: flattened software topology with blocks labelled 8259 enumerator & default_legacy_pic, ioapic_chip, ht_irq_chip, msi_chip, lapic_chip, hpet_msi_type, dmar_msi_type, interrupt remapping drivers and CPU vector management; enumerators for IOAPIC, HPET, HTIRQ and DMAR; arrows labelled "Call" and "Overwrite"]

X86 Interrupt Implementation Example
Quotation from Thomas:
  There are a few other things to consider. Right now the irq remapping code modifies the x86_ops to redirect the ioapic and the msi functions to the remap implementations and at irq setup time it also modifies the irq chip callbacks. That's a completely unreadable and undocumented maze of indirections and substitutions. All in all design from hell hacked into submission ...
Call sequence, in slide order (the nesting shown on the slide is lost):
    arch_setup_msi_irqs()
    irq_remapping_setup_msi_irqs()
    do_setup_msix_irqs()
    irq_alloc_hwirq()
    irq = __irq_alloc_descs()
    arch_setup_hwirq(irq)
    cfg = alloc_irq_cfg(irq)
    __assign_irq_vector(irq, cfg)
    irq_set_chip_data(irq, cfg)
    msi_alloc_remapped_irq(irq)
    intel_msi_alloc_irq(irq)
    alloc_irte(irq)
    setup_msi_irq(irq)
    msi_compose_msg(irq)
    assign_irq_vector()          <- Second invocation !?!?!
    x86_msi.compose_msi_msg()
    compose_remapped_msi_msg()
    intel_compose_msi_msg()
    fiddle_with_irte()
    write_msi_msg()
    setup_remapped_irq()
    irq_remap_modify_chip_defaults()
Files involved (right-hand column of the slide, in order): x86_init.c, irq_remapping.c, irq core code, io_apic.c, irq_remapping.c, intel_irq_remapping.c, io_apic.c, irq_remapping.c, intel_irq_remapping.c, io_apic.c, msi.c, irq_remapping.c

X86 Hardware vs Software Architecture
[Figure: the hardware hierarchy (CPU Core, Local APIC, VT-d/IR, 8259, HTIRQ, DMAR, MSI, HPET, IOAPIC) next to the flattened software topology (8259 enumerator & default_legacy_pic, ioapic_chip, ht_irq_chip, msi_chip, lapic_chip, hpet_msi_type, dmar_msi_type, interrupt remapping drivers, CPU vector management; arrows labelled "Call" and "Overwrite")]

X86 Interrupt Management Limitations
•Hard to reserve IRQ numbers for IOAPIC hotplug
  • IRQ numbers for IOAPICs are statically allocated at the low end during boot
•Software architecture doesn't conform to the hardware architecture
  • Hard to understand and maintain
•Unnecessary dependency on IOAPIC for MSI/HPET/DMAR/HTIRQ
  • They really depend on the Local APIC instead of the IOAPIC
So improvements are needed here!
•Dynamically allocate IRQ numbers for IOAPICs on demand
•Make the software architecture conform to the hardware architecture for easy maintenance

Agenda
•X86 Hardware Interrupt Architecture
•X86 Software Interrupt Architecture
•Reconstruct with Hierarchy IRQ Domain
•Q/A

Background: Details about Legacy PIC and IOAPIC
•X86 is not a perfect world, and is even a dirty one, because of backward compatibility
  • Always challenging to deal with legacy ISA IRQs
  • PIC mode, APIC mode, Virtual Wire mode

First Step: Use IRQ Domain to Manage IOAPIC
•A straightforward step: use IRQ domains to manage IOAPICs
  • One IRQ domain for each IOAPIC controller
  • Each IOAPIC pin is an interrupt source
  • IRQ domain callbacks program the IOAPIC entries
•Dynamic IRQ management
  • Allocate an IRQ number on demand
  • Free the IRQ number when it is no longer needed
  • RISK: some drivers assume IRQ == IOAPIC pin
•Three IRQ number allocation policies
  • LEGACY: ensure IRQ == pin number, to support IOAPICs managing legacy ISA IRQs
  • STRICT: ensure IRQ == pin number, to support platforms without an 8259 whose software still assumes IRQ == pin number
  • DYNAMIC: dynamically allocate IRQ numbers for IOAPICs not managing legacy ISA IRQs

Second Step: How about This Hierarchy IRQ Domain?
--Match Software Architecture with Hardware Architecture
•From the hardware point of view, x86 has a hierarchical interrupt topology with multiple interrupt controllers
•So build a hierarchical software architecture to match the hardware architecture
•Use one irqdomain to match each interrupt controller
•Each interrupt controller driver only talks to its corresponding hardware and its parent irqdomain
[Figure: hardware blocks (CPU Core, Local APIC, VT-d/IR, 8259, HTIRQ, DMAR, MSI, HPET, IOAPIC) matched with software blocks (8259 enumerator & default_legacy_pic, CPU vector domain, remapping domain, HTIRQ domain, DMAR domain, IOAPIC domain, HPET domain, MSI domain)]

How to Implement Hierarchy IRQ Domain?
•Design rules: keep existing IRQ domain interfaces as is and add new interfaces to support hierarchy IRQ domain
•Enhancements to existing data structures
  • Add "parent" to struct irq_domain
  • Add "parent_data" to struct irq_data
  • Add new callbacks to struct irq_domain_ops
•New hierarchy IRQ domain interfaces
  • irq_domain_alloc_irqs()
  • irq_domain_free_irqs()
  • irq_domain_activate_irq()
  • irq_domain_deactivate_irq()
[Figure: chain of irq_data structures, one per level: the irq_data embedded in irq_desc (returned by irq_get_irq_data(irq)) belongs to IRQ domain1, the next irq_data to IRQ domain2, and the last irq_data to the root IRQ domain]

How Hierarchy IRQ Domain Works?
•Interrupt source enumerator: create an irqdomain for each interrupt controller
  • Enumerate interrupt controllers
  • Provide callbacks to manage those interrupt controllers
•Interrupt source enumerator: call irq_domain_alloc_irqs(hwirq) to create IRQs for interrupt sources
  • Allocate an IRQ number and the resources to manage the interrupt source
  • Set the flow handler and irq_chip for new IRQs
•Interrupt core: call irq_domain_activate_irq() to program interrupt controllers when irq_startup() is called
  • Program the interrupt controllers to correctly deliver the interrupt
•Device drivers: no changes needed

Step Three: Stacked IRQ Chip --What's the Problem?
•Bad encapsulation
•Hard to maintain

    static struct irq_chip ioapic_chip __read_mostly = {
        .name             = "IO-APIC",
        .irq_startup      = startup_ioapic_irq,
        .irq_mask         = mask_ioapic_irq,
        .irq_unmask       = unmask_ioapic_irq,
        .irq_ack          = ack_apic_edge,
        .irq_eoi          = ack_apic_level,
        .irq_set_affinity = native_ioapic_set_affinity,
        .irq_retrigger    = ioapic_retrigger_irq,
    };

Callouts on the slide, pointing at groups of callbacks:
  • OK, they belong to the IOAPIC
  • Bad, it belongs to the Local APIC
  • Worse, they belong to the Local APIC, and may be overridden by the IRQ remapping driver

Step Three: Stacked IRQ Chip
--The Solution
•We already have one struct irq_data for each interrupt controller driver, so we can set a dedicated struct irq_chip in the struct irq_data for each interrupt controller
•Each interrupt controller driver (irq_chip) only manages its own hardware, and asks its parent controller for help when needed
[Figure: stack of irq_data structures, starting with the one embedded in irq_desc; each irq_data points to its own chip (irq_chip1, irq_chip2, irq_chip3), and each chip calls the next one]

    static struct irq_chip ioapic_chip __read_mostly = {
        .name             = "IO-APIC",
        .irq_startup      = startup_ioapic_irq,
        .irq_mask         = mask_ioapic_irq,
        .irq_unmask       = unmask_ioapic_irq,
        .irq_ack          = irq_chip_ack_parent,
        .irq_eoi          = ioapic_ack_level,
        .irq_set_affinity = ioapic_set_affinity,
        .irq_print_chip   = irq_remapping_print_chip,
    };

Effects of Reconstruction
•All communication between interrupt controllers goes through public IRQ domain interfaces
•Each interrupt controller driver is almost independent of the other interrupt controller drivers
  • There are still some exceptions, especially for interrupt remapping
•Minimized information shared among interrupt controller drivers
•Refined interrupt remapping interfaces
•MSI is now independent of IOAPIC

struct irq_cfg before:

    struct irq_cfg {
        struct irq_pin_list *irq_2_pin;
        cpumask_var_t domain;
        cpumask_var_t old_domain;
        u8 vector;
        u8 move_in_progress : 1;
    #ifdef CONFIG_IRQ_REMAP
        u8 remapped : 1;
        union {
            struct irq_2_iommu irq_2_iommu;
            struct irq_2_irte irq_2_irte;
        };
    #endif
    };

struct irq_cfg after:

    struct irq_cfg {
        unsigned int dest_apicid;
        u8 vector;
    };

Agenda
•X86 Hardware Interrupt Architecture
•X86 Software Interrupt Architecture
•Reconstruct with Hierarchy IRQ Domain
•Q/A

Current Development Status
--We are targeting the v3.19 merge window
•The first part, using IRQ domain to manage IOAPIC, has been merged into v3.17.
•The second part, enabling IOAPIC hotplug, is targeting v3.19
  • https://lkml.org/lkml/2014/9/24/206
•The third part, preparing for enabling hierarchy irqdomain, is targeting v3.19
  • https://lkml.org/lkml/2014/9/11/101
•The fourth part, converting local APIC, remapping drivers, MSI, DMAR and HPET to hierarchy irqdomain, is targeting v3.19
  • https://lkml.org/lkml/2014/9/26/538
•The fifth part, converting IOAPIC to hierarchy irqdomain and cleaning up code, is targeting v3.19 too
  • https://lkml.org/lkml/2014/9/26/611

Apply to Other Architectures
•ARM MediaTek is trying to adopt the new hierarchy IRQ domain mechanism
  • http://www.spinics.net/lists/arm-kernel/msg368808.html
•We expect it to be used by other architectures/interrupt controllers too

Long-term Goal – No IRQ Number Anymore
--Treat Each Interrupt Source as an Object
•The IRQ number is a legacy concept that globally numbers all interrupt sources within the system
  • Good for flat pin-based IRQ controllers
•The future: one interrupt object for each interrupt source
  • Just pass the interrupt object handle instead of an "unsigned int irq"
  • The IRQ domain is the object for an interrupt controller
  • The IRQ descriptor is the object for an interrupt source
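Appendix sketch: the stacked irq_chip idea from Step Three can be modelled in plain user-space C. This is only an illustration under stated assumptions, not kernel code: every `model_` name is hypothetical, the hwirq values are made up, and only the ack path is modelled. Each level has its own irq_data and chip, linked through parent_data, and the outer chips forward the ack to their parent in the spirit of the `irq_chip_ack_parent` entry shown in the new `ioapic_chip`.

```c
/* User-space model only: these are NOT the kernel's definitions. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_irq_data;

/* Per-controller callbacks: a tiny, hypothetical subset of struct irq_chip. */
struct model_irq_chip {
    const char *name;
    void (*irq_ack)(struct model_irq_data *d);
};

/* One irq_data per hierarchy level; parent_data links towards the root. */
struct model_irq_data {
    unsigned int irq;
    unsigned long hwirq;
    const struct model_irq_chip *chip;
    struct model_irq_data *parent_data;
};

/* Records chip names in the order their irq_ack callbacks run. */
static char model_trace[64];

static void model_note(const char *name)
{
    if (model_trace[0] != '\0')
        strncat(model_trace, ">", sizeof(model_trace) - strlen(model_trace) - 1);
    strncat(model_trace, name, sizeof(model_trace) - strlen(model_trace) - 1);
}

/* Outer levels do no real work here: they hand the ack to their parent. */
static void model_ack_parent(struct model_irq_data *d)
{
    assert(d != NULL && d->chip != NULL);
    model_note(d->chip->name);
    if (d->parent_data != NULL && d->parent_data->chip->irq_ack != NULL)
        d->parent_data->chip->irq_ack(d->parent_data);
}

/* Root level (Local APIC in this model): the ack stops here. */
static void model_lapic_ack(struct model_irq_data *d)
{
    model_note(d->chip->name);
}

static const struct model_irq_chip model_lapic_chip  = { "LAPIC",  model_lapic_ack };
static const struct model_irq_chip model_remap_chip  = { "IR",     model_ack_parent };
static const struct model_irq_chip model_ioapic_chip = { "IOAPIC", model_ack_parent };

/* Build IOAPIC -> IR -> LAPIC for one IRQ and ack it from the outermost level. */
static const char *model_ack_chain(unsigned int irq)
{
    /* The hwirq values are arbitrary placeholders for vector, IRTE index and pin. */
    struct model_irq_data lapic  = { irq, 0x31, &model_lapic_chip,  NULL };
    struct model_irq_data remap  = { irq, 5,    &model_remap_chip,  &lapic };
    struct model_irq_data ioapic = { irq, 2,    &model_ioapic_chip, &remap };

    model_trace[0] = '\0';
    ioapic.chip->irq_ack(&ioapic);
    return model_trace;
}
```

Calling `model_ack_chain(16)` walks the three levels from the IOAPIC chip down to the root and returns the visiting order as a string. The point of the design is that the IOAPIC level never names the Local APIC or remapping code directly: it only knows its parent_data, so a level can be inserted or removed without touching the others.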