Reconstruct x86 Interrupt Architecture with Hierarchy IRQ Domain
Jiang Liu
Rui Wang
About Myself – Jiang Liu (Gerry Liu)
• An OS hacker
  • Fan of the Linux kernel since 1998
  • Contributing to the Linux community since 2012
  • Enabled x86 CPU/memory hotplug on OpenSolaris
• To be a farmer once retired from the IT industry
Agenda
X86 Hardware Interrupt Architecture
X86 Software Interrupt Architecture
Hierarchy IRQ Domain
Reconstruct x86 Interrupt Architecture
Q/A
Why Do We Care About X86 Interrupt Architecture?
--To Support Physical Processor Hotplug
• Physical processor hotplug for
  • Dynamic capacity adjustment
  • Device hot-replacement
• The north bridge has been built into the physical processor
  • Memory controllers
  • PCIe host bridge (PCIe root ports)
  • IOMMU devices
  • IOAPIC devices
  • IOAT devices
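[Figure: physical processor package with CPU cores, a shared L3 cache, memory controllers (MC), QPI links, and an integrated PCI root complex containing DMA, IOMMU, IOAPIC and root port blocks]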
System Device Hotplug Development Status
--More Than 10 Years of Effort
• System devices include CPUs, memory devices, PCI host bridges, etc.
• The community has worked on system device hotplug for more than 10 years, starting in 2002.
• Current development status
  • ACPI-based hotplug framework is ready
  • Logical CPU hotplug is ready
  • PCIe host bridge hotplug is ready
  • Memory hotplug is ready, but still needs improvement
  • IOMMU hotplug is under development
  • IOAPIC hotplug is under development
  • IOAT hot-addition is ready, but hot-removal still doesn’t work
Hopefully we will have fully functional system device hotplug soon.
X86 Interrupt Hardware Architecture
--Legacy Programmable Interrupt Controller
• Pin-based interrupt controller
  • Limited interrupt pins
  • Multiple devices may share the same pin
• Designed for UP (uniprocessor) systems
• IRQ numbers are hardcoded or configured by jumpers
• The most famous one is the Intel 8259 interrupt controller
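To make the "limited pins" point concrete: a PC/AT-compatible system cascades two 8259s, with the slave's output wired to pin 2 of the master, so the 16 ISA IRQ numbers land on only 15 usable lines. The sketch below models just that wiring (not the chip's registers); `pic_line`, `isa_irq_to_pic_line()` and `usable_isa_lines()` are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the classic PC/AT wiring: two cascaded 8259 PICs. */
enum pic_chip { PIC_MASTER, PIC_SLAVE };

#define PIC_PINS        8   /* each 8259 has 8 interrupt request pins */
#define PIC_CASCADE_PIN 2   /* the slave's output drives master pin 2 */

struct pic_line {
	enum pic_chip chip;
	unsigned int pin;
};

/*
 * Map an ISA IRQ number (0-15) to the 8259 pin that receives it.
 * Returns false for IRQ 2, whose master pin is consumed by the cascade,
 * and for numbers outside the ISA range.
 */
bool isa_irq_to_pic_line(unsigned int irq, struct pic_line *out)
{
	if (irq >= 2 * PIC_PINS || irq == PIC_CASCADE_PIN)
		return false;
	if (irq < PIC_PINS) {
		out->chip = PIC_MASTER;
		out->pin = irq;
	} else {
		out->chip = PIC_SLAVE;
		out->pin = irq - PIC_PINS;
	}
	return true;
}

/* Count the lines devices can actually use: 16 pins minus the cascade. */
unsigned int usable_isa_lines(void)
{
	struct pic_line line;
	unsigned int n = 0;

	for (unsigned int irq = 0; irq < 2 * PIC_PINS; irq++)
		if (isa_irq_to_pic_line(irq, &line))
			n++;
	return n;
}
```

With only 15 lines for every ISA device in the machine, sharing pins and fixing IRQ numbers by jumper were unavoidable.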
X86 Interrupt Hardware Architecture
--Message Based Interrupt Controller
• Advanced Programmable Interrupt Controller (APIC)
  • Designed for SMP
  • Local APIC and I/O APIC
  • The I/O APIC converts pin-based requests into messages
• PCI MSI
  • HPET IRQ
  • DMAR IRQ
• HyperTransport IRQ
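With message-based delivery, an interrupt is just a memory write: the address selects the target local APIC and the data selects the vector. The sketch below composes such a message in the x86 compatibility (non-remapped) format, assuming physical destination mode, fixed delivery mode, edge trigger and an 8-bit xAPIC ID. `msi_msg_sketch` and `compose_msi_msg_sketch()` are hypothetical names, not the kernel's helpers.

```c
#include <stdint.h>

/* Simplified x86 MSI message (compatibility format, no interrupt remapping). */
struct msi_msg_sketch {
	uint32_t address_lo;
	uint32_t data;
};

#define MSI_ADDR_BASE        0xFEE00000u  /* bits 31:20 are fixed at 0xFEE */
#define MSI_ADDR_DEST_SHIFT  12           /* bits 19:12: destination APIC ID */
#define MSI_DATA_VECTOR_MASK 0xFFu        /* bits 7:0: interrupt vector */

/*
 * Compose a message that asks the local APIC with ID dest_apicid to raise
 * vector. Physical destination mode, fixed delivery mode and edge trigger
 * are all encoded as zero bits, so nothing else needs to be set.
 */
struct msi_msg_sketch compose_msi_msg_sketch(uint8_t dest_apicid, uint8_t vector)
{
	struct msi_msg_sketch msg;

	msg.address_lo = MSI_ADDR_BASE |
			 ((uint32_t)dest_apicid << MSI_ADDR_DEST_SHIFT);
	msg.data = vector & MSI_DATA_VECTOR_MASK;
	return msg;
}
```

For example, `compose_msi_msg_sketch(1, 0x41)` yields address `0xFEE01000` and data `0x41`. Note that the destination field holds only 8 bits, which matters for the x2APIC point on the next slide.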
X86 Interrupt Hardware Architecture
--Interrupt Remapping
• Interrupt remapping for
  • Interrupt isolation (virtualization)
  • Atomic interrupt migration
  • x2APIC support
• Remaps all message-based interrupts except
  • IPI
  • DMAR IRQ
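Interrupt remapping adds one level of indirection: the device's message carries only a handle, and the remapping unit looks that handle up in a table holding the real destination and vector. The toy model below illustrates the idea; the 64-entry table, `irte_sketch` and the `remap_*()` functions are hypothetical simplifications, far from the real 128-bit VT-d IRTE layout. It shows why the listed benefits follow: unprogrammed handles are blocked (isolation), retargeting is a single table update that never touches the device (migration), and an entry can hold a 32-bit destination ID that the 8-bit destination field of a compatibility-format MSI cannot carry (x2APIC).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified remapping table entry. Only the
 * routing information is modelled.
 */
struct irte_sketch {
	bool present;
	uint32_t dest_apicid;   /* wide enough for x2APIC IDs */
	uint8_t vector;
};

#define IRTE_COUNT 64

static struct irte_sketch ir_table[IRTE_COUNT];

/* What the remapping unit does when a device sends a remappable message. */
bool remap_lookup(uint16_t handle, uint32_t *dest, uint8_t *vector)
{
	if (handle >= IRTE_COUNT || !ir_table[handle].present)
		return false;   /* blocked: the handle was never programmed */
	*dest = ir_table[handle].dest_apicid;
	*vector = ir_table[handle].vector;
	return true;
}

/*
 * Program or retarget an entry. The device only stores the handle, so
 * moving the interrupt to another CPU rewrites this entry and leaves the
 * device's MSI address/data untouched. (This sketch writes the fields one
 * by one; it makes no claim about how hardware updates are serialized.)
 */
bool remap_set(uint16_t handle, uint32_t dest_apicid, uint8_t vector)
{
	if (handle >= IRTE_COUNT)
		return false;
	ir_table[handle].dest_apicid = dest_apicid;
	ir_table[handle].vector = vector;
	ir_table[handle].present = true;
	return true;
}
```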
X86 Interrupt Hardware Architecture
--The Whole Picture – A Hierarchy Topology
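[Figure: hierarchical interrupt topology with the CPU core and its local APIC at the root, VT-d interrupt remapping (VT-d/IR) below it, and the 8259, IOAPIC, HPET, MSI, DMAR, HTIRQ and other local APICs as interrupt controllers/sources]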
X86 Hardware Interrupt Architecture
X86 Software Interrupt Architecture
Hierarchy IRQ Domain
Reconstruct x86 Interrupt Architecture
Q/A
Linux Interrupt Management Architecture
• The interrupt management core is platform independent and provides:
  • Management of all interrupt activities
  • Client interfaces for device drivers
  • Interfaces for interrupt controller drivers
  • irq_desc data structure management
  • IRQ domain helpers
• Interrupt source enumerators, interrupt delivery and interrupt controller drivers are platform dependent
[Figure: device drivers sit on top of the Interrupt Management Core (Device Driver Interfaces, IRQ Domain, IRQ Descriptor Manager, Misc Helpers, Interrupt Controller Interfaces), which sits on top of the platform-dependent Interrupt Source Enumerator, Interrupt Delivery and Interrupt Controller Driver]
Interfaces for Device Drivers
• The interfaces for device drivers are relatively simple: just allocate/free/enable/disable an IRQ
• Device drivers need to provide:
  • Interrupt identifier: unsigned int irq
    But where does this magic interrupt identifier come from?
  • Interrupt handler: irq_handler_t handler
  • Callback argument: void *dev
  • Flags to control interrupt behavior: unsigned long flags
int request_threaded_irq(unsigned int irq, irq_handler_t handler,
			 irq_handler_t thread_fn, unsigned long flags,
			 const char *name, void *dev);
int request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
		const char *name, void *dev);
int request_any_context_irq(unsigned int irq, irq_handler_t handler,
			    unsigned long flags, const char *name, void *dev_id);
void free_irq(unsigned int, void *);
void disable_irq(unsigned int irq);
void enable_irq(unsigned int irq);
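The role of the `dev` cookie is easiest to see on a shared line: every handler registered on the same `irq` is called with its own `dev`, must report whether its device actually raised the interrupt, and is later removed by matching that same cookie. The following userspace model imitates that contract. `mock_request_irq()`, `mock_free_irq()`, `mock_dispatch()` and the `MOCK_IRQ_*` codes are hypothetical stand-ins, not kernel code (the kernel's `IRQ_NONE`/`IRQ_HANDLED` play the same role, and its error codes are `-E…` values rather than `-1`).

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's irqreturn_t values. */
enum mock_irqreturn { MOCK_IRQ_NONE = 0, MOCK_IRQ_HANDLED = 1 };

typedef enum mock_irqreturn (*mock_handler_t)(unsigned int irq, void *dev);

#define MOCK_NR_IRQS    16
#define MOCK_MAX_SHARED 4

struct mock_action {
	mock_handler_t handler;
	void *dev;   /* per-driver cookie, also the key used when freeing */
};

static struct mock_action actions[MOCK_NR_IRQS][MOCK_MAX_SHARED];

/* Register handler/dev on irq; returns 0, or -1 if invalid or the line is full. */
int mock_request_irq(unsigned int irq, mock_handler_t handler, void *dev)
{
	if (irq >= MOCK_NR_IRQS || !handler)
		return -1;
	for (int i = 0; i < MOCK_MAX_SHARED; i++) {
		if (!actions[irq][i].handler) {
			actions[irq][i].handler = handler;
			actions[irq][i].dev = dev;
			return 0;
		}
	}
	return -1;
}

/* Remove the action whose cookie matches dev, as free_irq(irq, dev) does. */
void mock_free_irq(unsigned int irq, void *dev)
{
	if (irq >= MOCK_NR_IRQS)
		return;
	for (int i = 0; i < MOCK_MAX_SHARED; i++) {
		if (actions[irq][i].handler && actions[irq][i].dev == dev) {
			actions[irq][i].handler = NULL;
			actions[irq][i].dev = NULL;
			return;
		}
	}
}

/* An interrupt arrives: call every handler on the line, count who claimed it. */
int mock_dispatch(unsigned int irq)
{
	int handled = 0;

	if (irq >= MOCK_NR_IRQS)
		return 0;
	for (int i = 0; i < MOCK_MAX_SHARED; i++)
		if (actions[irq][i].handler &&
		    actions[irq][i].handler(irq, actions[irq][i].dev) == MOCK_IRQ_HANDLED)
			handled++;
	return handled;
}

/* Example driver side: a device that knows whether it raised the line. */
struct mock_dev {
	int pending;
	int count;
};

enum mock_irqreturn mock_dev_handler(unsigned int irq, void *dev)
{
	struct mock_dev *d = dev;

	(void)irq;
	if (!d->pending)
		return MOCK_IRQ_NONE;   /* not ours: another device on the shared line */
	d->pending = 0;
	d->count++;
	return MOCK_IRQ_HANDLED;
}
```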
Interrupt Source Enumerator
--Who Creates the Magic Interrupt Identifier?
• It’s the interrupt source enumerator’s responsibility to:
  • Associate interrupt sources with devices
  • Allocate IRQ numbers for interrupt sources
  • Configure the flow handler and irq_chip for each IRQ
• Statically enumerates interrupt sources
  • By firmware tables: MP table, ACPI tables, FDT
  • For IOAPIC, HPET, DMAR
• Dynamically enumerates interrupt sources
  • By hardware specifications
  • For PCI MSI/MSI-X, HT_IRQ, etc.
int pci_enable_msix(struct pci_dev *dev,
		    struct msix_entry *entries, int nvec);
int pci_enable_msi_exact(struct pci_dev *dev, int nvec);
int __ht_create_irq(struct pci_dev *dev, int idx,
		    ht_irq_update_t *update);
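The answer to "where does the magic identifier come from" can be sketched with the MSI-X path: the driver fills in which table entries it wants, and the enumerator writes a newly allocated IRQ number into each entry's `vector` field, which the driver then passes to request_irq(). The struct below mirrors the two fields of the kernel's `struct msix_entry`. `sketch_enable_msix()` and its starting IRQ number of 24 are hypothetical, and the real pci_enable_msix() also programs the device and sets up the irq_chip and flow handler.

```c
#include <stdint.h>

/* Same two fields as the kernel's struct msix_entry. */
struct msix_entry_sketch {
	uint32_t vector;   /* filled in by the enumerator: the Linux IRQ number */
	uint16_t entry;    /* filled in by the driver: MSI-X table index */
};

/* Hypothetical allocator state: first IRQ number handed out to MSI-X. */
static unsigned int next_free_irq = 24;

/*
 * Hypothetical enumerator step: give every requested MSI-X table entry an
 * IRQ number the driver can later hand to request_irq().
 */
int sketch_enable_msix(struct msix_entry_sketch *entries, int nvec)
{
	if (nvec <= 0)
		return -1;
	for (int i = 0; i < nvec; i++)
		entries[i].vector = next_free_irq++;
	/* a real enumerator would also set the flow handler and irq_chip here */
	return 0;
}
```

A driver asking for table entries 0, 1 and 5 would then call request_irq() on `entries[0].vector`, `entries[1].vector` and `entries[2].vector`, never on the table indices themselves.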
IRQ Domain
-- Helper for IRQ Source Enumerator
• A framework to dynamically associate/disassociate IRQ numbers with interrupt sources
  • An IRQ domain manages a group of interrupt sources
  • It allocates/frees IRQ numbers for interrupt sources on demand
  • It configures/deconfigures interrupt controller hardware on demand
unsigned int irq_create_mapping(struct irq_domain *host,
				irq_hw_number_t hwirq);
void irq_dispose_mapping(unsigned int virq);
struct irq_domain {
	struct list_head link;
	const char *name;
	const struct irq_domain_ops *ops;
	…
};
struct irq_desc {
	struct irq_data		irq_data;
	unsigned int __percpu	*kstat_irqs;
	irq_flow_handler_t	handle_irq;
	…
};
struct irq_data {
	u32			mask;
	unsigned int		irq;
	unsigned long		hwirq;
	struct irq_domain	*domain;
	…
};
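A userspace sketch of what irq_create_mapping()/irq_dispose_mapping() do for a simple linear domain: look up the hwirq in the domain's reverse map, and only on a miss allocate a free global IRQ number and record the association. All names are hypothetical; the real functions also allocate an `irq_desc` and invoke the domain's callbacks, and irq_dispose_mapping() finds the domain from the IRQ itself, whereas this sketch passes the domain explicitly.

```c
#include <stdbool.h>

#define SKETCH_NR_IRQS   32   /* size of the global IRQ number space */
#define SKETCH_DOMAIN_SZ 8    /* hwirqs per domain (linear reverse map) */

static bool irq_in_use[SKETCH_NR_IRQS];   /* global IRQ number allocator */

struct sketch_domain {
	const char *name;
	unsigned int revmap[SKETCH_DOMAIN_SZ];   /* hwirq -> irq, 0 = unmapped */
};

/* IRQ 0 is never handed out, so 0 can mean "no mapping". */
static unsigned int sketch_alloc_irq(void)
{
	for (unsigned int irq = 1; irq < SKETCH_NR_IRQS; irq++) {
		if (!irq_in_use[irq]) {
			irq_in_use[irq] = true;
			return irq;
		}
	}
	return 0;
}

/* Return the IRQ for hwirq, creating the association on first use. */
unsigned int sketch_create_mapping(struct sketch_domain *d, unsigned int hwirq)
{
	if (hwirq >= SKETCH_DOMAIN_SZ)
		return 0;
	if (!d->revmap[hwirq])
		d->revmap[hwirq] = sketch_alloc_irq();   /* still 0 if exhausted */
	return d->revmap[hwirq];
}

/* Drop the association and return the IRQ number to the global pool. */
void sketch_dispose_mapping(struct sketch_domain *d, unsigned int irq)
{
	if (!irq)
		return;
	for (unsigned int hw = 0; hw < SKETCH_DOMAIN_SZ; hw++) {
		if (d->revmap[hw] == irq) {
			d->revmap[hw] = 0;
			irq_in_use[irq] = false;
			return;
		}
	}
}
```

Because numbers come from one global pool, two domains (say, two IOAPICs) can both use hwirq 3 without colliding, and a freed number is reused by the next mapping.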
Interfaces for Interrupt Controller
• Flow handler
  • Controls the overall interrupt handling flow
• struct irq_chip
  • Callbacks to configure and control the interrupt controller
• struct irq_data
  • irq chip data passed down to chip functions
struct irq_chip {
	const char	*name;
	unsigned int	(*irq_startup)(struct irq_data *data);
	void		(*irq_shutdown)(struct irq_data *data);
	void		(*irq_enable)(struct irq_data *data);
	void		(*irq_disable)(struct irq_data *data);
	void		(*irq_ack)(struct irq_data *data);
	void		(*irq_mask)(struct irq_data *data);
	void		(*irq_mask_ack)(struct irq_data *data);
	void		(*irq_unmask)(struct irq_data *data);
	void		(*irq_eoi)(struct irq_data *data);
	int		(*irq_set_affinity)(struct irq_data *data, co…
	int		(*irq_retrigger)(struct irq_data *data);
	int		(*irq_set_type)(struct irq_data *data, unsign…
	int		(*irq_set_wake)(struct irq_data *data, unsign…
	void		(*irq_bus_lock)(struct irq_data *data);
	void		(*irq_bus_sync_unlock)(struct irq_data *data)…
	void		(*irq_cpu_online)(struct irq_data *data);
	void		(*irq_cpu_offline)(struct irq_data *data);
	void		(*irq_suspend)(struct irq_data *data);
	void		(*irq_resume)(struct irq_data *data);
	void		(*irq_pm_shutdown)(struct irq_data *data);
	void		(*irq_calc_mask)(struct irq_data *data);
	void		(*irq_print_chip)(struct irq_data *data, stru…
	int		(*irq_request_resources)(struct irq_data *dat…
	void		(*irq_release_resources)(struct irq_data *dat…
	unsigned long	flags;
};
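How a flow handler "controls the overall interrupt handling flow" is easiest to see by logging which irq_chip callbacks it calls, and in what order. This userspace sketch is loosely modelled on the kernel's handle_edge_irq(), handle_level_irq() and handle_fasteoi_irq(); the `sketch_*` names and the recording chip are hypothetical, and locking, nesting and spurious-interrupt handling are left out.

```c
#include <string.h>

/* Hypothetical userspace model of a flow handler driving chip callbacks. */
struct sketch_irq_data;

struct sketch_irq_chip {
	const char *name;
	void (*irq_ack)(struct sketch_irq_data *d);
	void (*irq_mask)(struct sketch_irq_data *d);
	void (*irq_unmask)(struct sketch_irq_data *d);
	void (*irq_eoi)(struct sketch_irq_data *d);
};

struct sketch_irq_data {
	unsigned int irq;
	struct sketch_irq_chip *chip;
	char log[64];   /* records the callback order for inspection */
};

static void log_step(struct sketch_irq_data *d, const char *step)
{
	strncat(d->log, step, sizeof(d->log) - strlen(d->log) - 1);
}

static void rec_ack(struct sketch_irq_data *d)    { log_step(d, "ack "); }
static void rec_mask(struct sketch_irq_data *d)   { log_step(d, "mask "); }
static void rec_unmask(struct sketch_irq_data *d) { log_step(d, "unmask "); }
static void rec_eoi(struct sketch_irq_data *d)    { log_step(d, "eoi "); }

struct sketch_irq_chip recording_chip = {
	.name = "recording",
	.irq_ack = rec_ack,
	.irq_mask = rec_mask,
	.irq_unmask = rec_unmask,
	.irq_eoi = rec_eoi,
};

/* Stands in for running the handlers drivers registered with request_irq(). */
static void run_actions(struct sketch_irq_data *d) { log_step(d, "handler "); }

/* Edge flow: ack first, so a new edge arriving during handling is latched. */
void sketch_handle_edge_irq(struct sketch_irq_data *d)
{
	d->chip->irq_ack(d);
	run_actions(d);
}

/* Level flow: keep the line masked while handling, or it fires again at once. */
void sketch_handle_level_irq(struct sketch_irq_data *d)
{
	d->chip->irq_mask(d);
	d->chip->irq_ack(d);
	run_actions(d);
	d->chip->irq_unmask(d);
}

/* Fast-EOI flow: handle first, then signal end-of-interrupt to the chip. */
void sketch_handle_fasteoi_irq(struct sketch_irq_data *d)
{
	run_actions(d);
	d->chip->irq_eoi(d);
}
```

The same chip serves all three flows; only the flow handler decides which callbacks run and when, which is exactly the split between "flow handler" and "struct irq_chip" on the slide.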
X86 Interrupt Management Architecture
• Flattened topology
  • Vector management, interrupt remapping and interrupt controller handling are flattened into one irq_chip
[Figure: the IOAPIC, HPET, HTIRQ and DMAR enumerators and the 8259 enumerator & default_legacy_pic each use a flat irq_chip (ioapic_chip, ht_irq_chip, msi_chip, lapic_chip, hpet_msi_type, dmar_msi_type); these chips call CPU vector management directly, and the interrupt remapping drivers overwrite their callbacks]
X86 Interrupt Implementation Example
Quotation from Thomas:
There are a few other things to consider. Right now the irq remapping code modifies the x86_ops to redirect the ioapic and the msi functions to the remap implementations and at irq setup time it also modifies the irq chip callbacks. That's a completely unreadable and undocumented maze of indirections and substitutions. All in all design from hell hacked into submission ...
Call chain when setting up MSI interrupts with interrupt remapping enabled:
arch_setup_msi_irqs()
irq_remapping_setup_msi_irqs()
do_setup_msix_irqs()
irq_alloc_hwirq()
irq = __irq_alloc_descs()
arch_setup_hwirq(irq)
cfg = alloc_irq_cfg(irq)
__assign_irq_vector(irq, cfg)
irq_set_chip_data(irq, cfg)
msi_alloc_remapped_irq(irq)
intel_msi_alloc_irq(irq)
alloc_irte(irq)
setup_msi_irq(irq)
msi_compose_msg(irq)
assign_irq_vector()	<- Second invocation !?!?!
x86_msi.compose_msi_msg()
compose_remapped_msi_msg()
intel_compose_msi_msg()
fiddle_with_irte()
write_msi_msg()
setup_remapped_irq()
irq_remap_modify_chip_defaults()
Files involved, in call order: x86_init.c, irq_remapping.c, irq core code, io_apic.c, irq_remapping.c, intel_irq_remapping.c, io_apic.c, irq_remapping.c, intel_irq_remapping.c, io_apic.c, msi.c, irq_remapping.c
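X86 Hardware vs Software Architecture
[Figure: side-by-side comparison of the hierarchical hardware topology (CPU core, local APIC, VT-d/IR, and the 8259, IOAPIC, HPET, MSI, DMAR and HTIRQ sources) with the flat software structure (8259 enumerator & default_legacy_pic, ioapic_chip, ht_irq_chip, msi_chip, lapic_chip, hpet_msi_type and dmar_msi_type calling CPU vector management, with callbacks overwritten by the interrupt remapping drivers)]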
X86 Interrupt Management Limitations
• Hard to reserve IRQ numbers for IOAPIC hotplug
  • IRQ numbers for IOAPICs are statically allocated at the low end during boot
• The software architecture doesn’t conform to the hardware architecture
  • Hard to understand and maintain
• Unnecessary dependency on IOAPIC for MSI/HPET/DMAR/HTIRQ
  • They really depend on the local APIC instead of the IOAPIC
So improvements are needed here!
• Dynamically allocate IRQ numbers for IOAPICs on demand
• Make the software architecture conform to the hardware architecture for easy maintenance
Agenda
X86 Hardware Interrupt Architecture
X86 Software Interrupt Architecture
Reconstruct with Hierarchy IRQ Domain
Q/A
Background: Details about Legacy PIC and IOAPIC
• x86 is not a perfect world; because of backward compatibility, it is even a dirty one
  • Dealing with legacy ISA IRQs is always challenging
  • PIC mode, APIC mode, Virtual Wire mode
First Step: Use IRQ Domain to Manage IOAPIC
• A straightforward step: use IRQ domains to manage IOAPICs
  • One IRQ domain for each IOAPIC controller
  • Each IOAPIC pin is an interrupt source
  • IRQ domain callbacks program the IOAPIC entries
• Dynamic IRQ management
  • Allocate IRQ numbers on demand
  • Free IRQ numbers when they are no longer needed
  • RISK: some drivers assume IRQ == IOAPIC pin
• Three IRQ number allocation policies
  • LEGACY: ensure IRQ == pin number to support IOAPICs managing legacy ISA IRQs
  • STRICT: ensure IRQ == pin number to support platforms without an 8259 that still assume IRQ == pin number
  • DYNAMIC: dynamically allocate IRQ numbers for IOAPICs not managing legacy ISA IRQs
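The three policies differ only in when the IRQ number must equal the pin number. The sketch below makes that concrete under two assumptions made for illustration: pins are counted system-wide (an IOAPIC's first pin is numbered `gsi_base`), and under LEGACY only the ISA range 0-15 is identity-mapped while higher pins are allocated dynamically. `ioapic_pick_irq()` and the `POLICY_*` names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the three policies on the slide. */
enum ioapic_irq_policy { POLICY_LEGACY, POLICY_STRICT, POLICY_DYNAMIC };

#define SKETCH_NR_IRQS        64
#define SKETCH_NR_LEGACY_IRQS 16   /* ISA IRQs 0-15 */

static bool irq_used[SKETCH_NR_IRQS];

/* Dynamic allocation: lowest free number above the legacy ISA range. */
static int alloc_dynamic_irq(void)
{
	for (int irq = SKETCH_NR_LEGACY_IRQS; irq < SKETCH_NR_IRQS; irq++) {
		if (!irq_used[irq]) {
			irq_used[irq] = true;
			return irq;
		}
	}
	return -1;
}

/* Claim one specific number, failing if it is out of range or taken. */
static int alloc_fixed_irq(int irq)
{
	if (irq < 0 || irq >= SKETCH_NR_IRQS || irq_used[irq])
		return -1;
	irq_used[irq] = true;
	return irq;
}

/*
 * Pick an IRQ number for pin `pin` of an IOAPIC whose first pin is
 * numbered gsi_base (assumption: system-wide pin numbering).
 */
int ioapic_pick_irq(enum ioapic_irq_policy policy, int gsi_base, int pin)
{
	int gsi = gsi_base + pin;

	switch (policy) {
	case POLICY_LEGACY:   /* assumption: identity only for the ISA range */
		return gsi < SKETCH_NR_LEGACY_IRQS ? alloc_fixed_irq(gsi)
						   : alloc_dynamic_irq();
	case POLICY_STRICT:   /* identity for every pin */
		return alloc_fixed_irq(gsi);
	case POLICY_DYNAMIC:  /* never tied to the pin number */
	default:
		return alloc_dynamic_irq();
	}
}
```

The STRICT case shows the hotplug problem in miniature: if a number is already taken, identity mapping simply fails, whereas DYNAMIC can always hand out the next free number.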
Second Step: How About This Hierarchy IRQ Domain?
Match Software Architecture with Hardware Architecture
• From a hardware point of view, x86 has a hierarchical interrupt topology with multiple interrupt controllers
• So build a hierarchical software architecture to match the hardware architecture
  • Use an irqdomain to represent each interrupt controller
  • Each interrupt controller driver only talks to its corresponding hardware and its parent irqdomain
[Figure: the hardware topology (CPU core, local APIC, VT-d/IR, and the IOAPIC, HPET, MSI, DMAR, HTIRQ and 8259 sources) mapped onto a domain hierarchy with the CPU vector domain at the root, a remapping domain below it, and IOAPIC, HPET, MSI, DMAR and HTIRQ domains for the individual controllers; the 8259 stays with the 8259 enumerator & default_legacy_pic]
How to Implement Hierarchy IRQ Domain?
• Design rule: keep the existing IRQ domain interfaces as they are and add new interfaces to support hierarchy IRQ domains
• Enhancements to existing data structures
  • Add “parent” to struct irq_domain
  • Add “parent_data” to struct irq_data
  • Add new callbacks to struct irq_domain_ops
• New hierarchy IRQ domain interfaces
  • irq_domain_alloc_irqs()
  • irq_domain_free_irqs()
  • irq_domain_activate_irq()
  • irq_domain_deactivate_irq()
[Figure: one irq_data per IRQ domain: the irq_data for IRQ domain1 is embedded in irq_desc and returned by irq_get_irq_data(irq); it links to the irq_data for IRQ domain2, which links to the irq_data for the root IRQ domain]
How Does the Hierarchy IRQ Domain Work?
• Interrupt source enumerator: create an irqdomain for each interrupt controller
  • Enumerate interrupt controllers
  • Provide callbacks to manage those interrupt controllers
• Interrupt source enumerator: call irq_domain_alloc_irqs(hwirq) to create IRQs for interrupt sources
  • Allocate an IRQ number and the resources to manage the interrupt source
  • Set the flow handler and irq_chip for new IRQs
• Interrupt core: call irq_domain_activate_irq() to program interrupt controllers when irq_startup() is called
  • Program the interrupt controllers to deliver the interrupt correctly
• Device drivers: no changes needed
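The two phases can be modelled in a few lines. Allocation starts at the leaf domain and walks towards the root, giving every level its own irq_data (linked through `parent_data`) and its own resource; activation, assumed here to recurse to the parent first, programs the level closest to the CPU before its children, so that, for example, a CPU vector exists before an MSI message or IOAPIC entry pointing at it is written. This is a userspace sketch with hypothetical names (`hdomain`, `hier_alloc_irq()`, `hier_activate()`); the kernel's irq_domain_alloc_irqs() and irq_domain_activate_irq() additionally handle IRQ descriptors, IRQ ranges and error rollback.

```c
#include <stddef.h>

/* Hypothetical userspace model of a domain hierarchy (not kernel code). */
struct hdomain {
	const char *name;
	struct hdomain *parent;     /* NULL for the root (vector) domain */
	unsigned long next_hwirq;   /* trivial per-domain resource allocator */
};

struct hirq_data {
	struct hdomain *domain;
	struct hirq_data *parent_data;   /* same IRQ, one level closer to the CPU */
	unsigned long hwirq;
	int activated;
};

#define HIER_MAX_DEPTH 4

/* One irq_data per level; levels[0] plays the role of the embedded one. */
struct hirq {
	struct hirq_data levels[HIER_MAX_DEPTH];
	int depth;
};

/* Allocate from the leaf: every domain up to the root gets its own level. */
int hier_alloc_irq(struct hirq *irq, struct hdomain *leaf)
{
	int depth = 0;

	for (struct hdomain *d = leaf; d; d = d->parent)
		depth++;
	if (depth == 0 || depth > HIER_MAX_DEPTH)
		return -1;

	struct hdomain *d = leaf;
	for (int i = 0; i < depth; i++, d = d->parent) {
		irq->levels[i].domain = d;
		irq->levels[i].hwirq = d->next_hwirq++;   /* per-domain resource */
		irq->levels[i].activated = 0;
		irq->levels[i].parent_data =
			(i + 1 < depth) ? &irq->levels[i + 1] : NULL;
	}
	irq->depth = depth;
	return 0;
}

/* Activation order log, so the parent-first order can be checked. */
static const char *activation_log[HIER_MAX_DEPTH];
static int activation_count;

/* Program the parent (closer to the CPU) before the child, recursively. */
void hier_activate(struct hirq_data *data)
{
	if (!data)
		return;
	hier_activate(data->parent_data);
	data->activated = 1;
	if (activation_count < HIER_MAX_DEPTH)
		activation_log[activation_count++] = data->domain->name;
}
```

For an MSI interrupt behind interrupt remapping, a three-level chain (msi, then ir, then vector) would be activated in the order vector, ir, msi.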
Step Three: Stacked IRQ Chip
--What’s the Problem?
• Bad encapsulation
• Hard to maintain
static struct irq_chip ioapic_chip __read_mostly = {
	/* OK, these belong to the IOAPIC */
	.name			= "IO-APIC",
	.irq_startup		= startup_ioapic_irq,
	.irq_mask		= mask_ioapic_irq,
	.irq_unmask		= unmask_ioapic_irq,
	/* Worse, these belong to the local APIC and may be overridden by the IRQ remapping driver */
	.irq_ack		= ack_apic_edge,
	.irq_eoi		= ack_apic_level,
	.irq_set_affinity	= native_ioapic_set_affinity,
	/* Bad, this belongs to the local APIC */
	.irq_retrigger		= ioapic_retrigger_irq,
};
Step Three: Stacked IRQ Chip
--The Solution
• We already have one struct irq_data for each interrupt controller driver, so we can set a dedicated struct irq_chip into the struct irq_data of each interrupt controller
• Each interrupt controller driver (irq_chip) only manages its own hardware, and asks its parent controller for help when needed
[Figure: a chain of irq_data structures, the first embedded in irq_desc, each carrying its own irq_chip (irq_chip1, irq_chip2, irq_chip3); a chip calls the chip of its parent when it needs help]
static struct irq_chip ioapic_chip __read_mostly = {
	.name			= "IO-APIC",
	.irq_startup		= startup_ioapic_irq,
	.irq_mask		= mask_ioapic_irq,
	.irq_unmask		= unmask_ioapic_irq,
	.irq_ack		= irq_chip_ack_parent,
	.irq_eoi		= ioapic_ack_level,
	.irq_set_affinity	= ioapic_set_affinity,
	.irq_print_chip		= irq_remapping_print_chip,
};
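The stacked-chip idea reduces to one helper: a chip that cannot do something itself forwards the call to the irq_data one level up, which is the role irq_chip_ack_parent plays in the new ioapic_chip. The userspace sketch below chains an IOAPIC-like chip, a remapping-like chip and a local-APIC-like chip; all names are hypothetical and each callback only sets a flag instead of touching hardware.

```c
#include <stddef.h>

/* Hypothetical userspace model of stacked irq_chips (not kernel code). */
struct schip_data;

struct schip {
	const char *name;
	void (*irq_ack)(struct schip_data *d);
	void (*irq_mask)(struct schip_data *d);
};

struct schip_data {
	struct schip *chip;
	struct schip_data *parent_data;   /* next level towards the CPU */
	int acked, masked;
};

/* Same idea as irq_chip_ack_parent(): forward the call one level up. */
void schip_ack_parent(struct schip_data *d)
{
	d = d->parent_data;
	if (d && d->chip->irq_ack)
		d->chip->irq_ack(d);
}

/* Local APIC level: the ack is handled here (an EOI on real hardware). */
static void lapic_ack(struct schip_data *d) { d->acked = 1; }

/* IOAPIC level: masking is IOAPIC hardware, so it stays local. */
static void ioapic_mask(struct schip_data *d) { d->masked = 1; }

struct schip lapic_schip  = { "LAPIC",   lapic_ack,        NULL };
struct schip ir_schip     = { "IR",      schip_ack_parent, NULL };
struct schip ioapic_schip = { "IO-APIC", schip_ack_parent, ioapic_mask };
```

An ack issued on the IOAPIC level passes through the remapping level and is handled by the local APIC level, while a mask never leaves the IOAPIC: each chip only implements what its own hardware does.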
Effects of Reconstruction
• All communication between interrupt controllers goes through public IRQ domain interfaces
• Each interrupt controller driver is almost independent of the other interrupt controller drivers
  • There are still some exceptions, especially for interrupt remapping
• Minimized information shared among interrupt controller drivers
• Refined interrupt remapping interfaces
• MSI is now independent of IOAPIC
Before reconstruction:
struct irq_cfg {
	struct irq_pin_list	*irq_2_pin;
	cpumask_var_t		domain;
	cpumask_var_t		old_domain;
	u8			vector;
	u8			move_in_progress : 1;
#ifdef CONFIG_IRQ_REMAP
	u8			remapped : 1;
	union {
		struct irq_2_iommu irq_2_iommu;
		struct irq_2_irte  irq_2_irte;
	};
#endif
};
After reconstruction:
struct irq_cfg {
	unsigned int		dest_apicid;
	u8			vector;
};
Agenda
X86 Hardware Interrupt Architecture
X86 Software Interrupt Architecture
Reconstruct with Hierarchy IRQ Domain
Q/A
Current Development Status
--We Are Targeting the v3.19 Merge Window
• The first part, using IRQ domains to manage IOAPICs, has been merged into v3.17.
• The second part, enabling IOAPIC hotplug, is targeting v3.19
  • https://lkml.org/lkml/2014/9/24/206
• The third part, preparing for hierarchy irqdomain, is targeting v3.19
  • https://lkml.org/lkml/2014/9/11/101
• The fourth part, converting the local APIC, remapping drivers, MSI, DMAR and HPET to hierarchy irqdomain, is targeting v3.19
  • https://lkml.org/lkml/2014/9/26/538
• The fifth part, converting IOAPIC to hierarchy irqdomain and cleaning up the code, is targeting v3.19 too
  • https://lkml.org/lkml/2014/9/26/611
Apply to Other Architectures
• ARM MediaTek is trying to adopt the new hierarchy IRQ domain mechanism
  • http://www.spinics.net/lists/arm-kernel/msg368808.html
• Surely it will be used by other architectures/interrupt controllers too
Long-term Goal – No IRQ Number Anymore
--Treat Each Interrupt Source as an Object
• The IRQ number is a legacy concept that globally numbers all interrupt sources within the system
  • Good for flat, pin-based IRQ controllers
• The future: one interrupt object for each interrupt source
  • Just pass the interrupt object handle instead of an “unsigned int irq”
[Figure: the Linux interrupt management architecture diagram, annotated: the IRQ domain is the object for an interrupt controller, and the IRQ descriptor is the object for an interrupt source]