So lets dtrace! :)
# mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 0 419 218 191 0 0 24 0 25 0 0 0 100
1 0 0 0 2 0 0 0 0 49 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 10 0 0 649 309 347 0 0 25 0 174 0 2 0 98
1 0 0 0 2 0 0 0 0 50 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2 0 0 651 313 366 11 0 22 0 193 8 2 0 90
1 0 0 0 2 0 0 0 0 50 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 870 453 563 0 0 26 0 174 0 2 0 98
1 0 0 0 2 0 0 0 0 49 0 0 0 100 0 0
^C
#
All we can see on cpu 1 is that there are some interrupts but no csw/icsw/syscl/migr at all.
It looks like it's stuck entirely in kernel.
To confirm that nothing from user space is actually calling some system calls:
# dtrace -n sched:::on-cpu'/cpu == 1/{self->t=timestamp;}' \
-n sched:::off-cpu'/self->t/{@[execname,pid]=sum(timestamp-self->t);self->t=0;}' \
-n tick-2s'{printa(@);}'
dtrace: description 'sched:::on-cpu' matched 3 probes
dtrace: description 'sched:::off-cpu' matched 3 probes
dtrace: description 'tick-2s' matched 1 probe
CPU ID FUNCTION:NAME
0 42146 :tick-2s
0 42146 :tick-2s
^C
#
So nothing was put by scheduler on cpu 1.
Lets check what's running in kernel on cpu 0 first (for comparison with cpu 1 later):
# dtrace -n profile-101'/cpu == 0/{@[stack()]=count();}' -n tick-5s'{printa(@);exit(0);}'
dtrace: description 'profile-101' matched 1 probe
dtrace: description 'tick-5s' matched 1 probe
CPU ID FUNCTION:NAME
0 42148 :tick-5s
unix`ddi_get16+0x10
genunix`callout_execute+0xd6
genunix`taskq_thread+0xbc
unix`thread_start+0x8
1
unix`ddi_io_put16+0x10
genunix`callout_execute+0xd6
genunix`taskq_thread+0xbc
unix`thread_start+0x8
1
unix`_sys_sysenter_post_swapgs+0xab
1
2
unix`cpu_halt+0xb2
unix`idle+0x89
unix`thread_start+0x8
496
#
So in 99% it's being IDLE.
Now lets do the same for CPU 1.
# dtrace -n profile-101'/cpu == 1/{@[stack()]=count();}' -n tick-5s'{printa(@);exit(0);}'
dtrace: description 'profile-101' matched 1 probe
dtrace: description 'tick-5s' matched 1 probe
CPU ID FUNCTION:NAME
0 42148 :tick-5s
[...]
unix`bcopy+0x14
usba`usba_vlog+0xc7
usba`usb_dprintf2+0xa4
uhci`uhci_intr+0x14c
unix`av_dispatch_autovect+0x78
unix`intr_thread+0x50
7
unix`ddi_get16+0x10
uhci`uhci_isoc_update_sw_frame_number+0x9
uhci`uhci_intr+0xce
unix`av_dispatch_autovect+0x78
unix`intr_thread+0x50
78
unix`ddi_io_put16+0x10
unix`av_dispatch_autovect+0x78
unix`intr_thread+0x50
127
unix`ddi_get16+0x10
unix`av_dispatch_autovect+0x78
unix`intr_thread+0x50
190
#
Above output confirms we're stuck in some interrupt, probably related to usb.
[checking cvs.opensolaris.org for usba_vlog function]
# dtrace -n fbt:usba:usba_vlog:entry'{@[stringof arg3]=count();}' -n tick-5s'{exit(0);}'
dtrace: description 'fbt:usba:usba_vlog:entry' matched 1 probe
dtrace: description 'tick-5s' matched 1 probe
CPU ID FUNCTION:NAME
0 42148 :tick-5s
uhci_intr: Controller halted 720725
#
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/usb/hcd/uhci/uhci.c#893
887 /*
888 * This should not occur. It occurs only if a HC controller
889 * experiences internal problem.
890 */
891 if (intr_status & USBSTS_REG_HC_HALTED) {
892 USB_DPRINTF_L2(PRINT_MASK_INTR, uhcip->uhci_log_hdl,
893 "uhci_intr: Controller halted");
894 cmd_reg = Get_OpReg16(USBCMD);
895 Set_OpReg16(USBCMD, (cmd_reg | USBCMD_REG_HC_RUN));
896 }
I'm not sure how bad it is but it doesn't look good.
Lets look around for USB.
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
usb2/1 unknown empty unconfigured ok
usb2/2 unknown empty unconfigured ok
usb3/1 unknown empty unconfigured ok
usb3/2 unknown empty unconfigured ok
usb4/1 unknown empty unconfigured ok
usb4/2 unknown empty unconfigured ok
usb4/3 unknown empty unconfigured ok
usb4/4 unknown empty unconfigured ok
usb4/5 unknown empty unconfigured ok
usb4/6 unknown empty unconfigured ok
usb4/7 unknown empty unconfigured ok
usb4/8 unknown empty unconfigured ok
# mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 0 419 218 191 0 0 24 0 25 0 0 0 100
1 0 0 0 2 0 0 0 0 49 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 10 0 0 666 311 363 3 0 17 0 176 0 2 0 98
1 0 0 0 2 0 0 0 0 50 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 678 315 360 0 0 23 0 159 0 0 0 100
1 0 0 0 2 0 0 0 0 50 0 0 0 100 0 0
^C
Didn't help.
# modinfo |grep -i usb
83 fffffffff01c1000 14000 56 1 ehci (USB EHCI Driver 1.14)
84 fffffffff01d5000 28dd8 - 1 usba (USBA: USB Architecture 2.0 1.59)
85 fffffffff01fd000 c6d8 58 1 uhci (USB UHCI Controller Driver 1.47)
# devfsadm -vC
# mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 0 419 218 191 0 0 24 0 25 0 0 0 100
1 0 0 0 2 0 0 0 0 49 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 10 0 0 664 315 363 0 0 23 0 181 0 0 0 100
1 0 0 0 2 0 0 0 0 50 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 947 501 694 1 0 26 0 279 0 2 0 98
1 0 0 0 2 0 0 0 0 51 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 650 307 356 0 0 25 0 175 1 1 0 98
1 0 0 0 2 0 0 0 0 50 0 0 0 100 0 0
^C
Still didn't help.
# modunload -i 84
can't unload the module: Device busy
# modunload -i 85
can't unload the module: Device busy
# modunload -i 83
can't unload the module: Device busy
# modunload -i 84
can't unload the module: Device busy
# modunload -i 85
can't unload the module: Device busy
# cfgadm -al
cfgadm: Configuration administration not supported
# modinfo |grep -i usb
83 fffffffff01c1000 14000 56 1 ehci (USB EHCI Driver 1.14)
84 fffffffff01d5000 28dd8 - 1 usba (USBA: USB Architecture 2.0 1.59)
85 fffffffff01fd000 c6d8 58 1 uhci (USB UHCI Controller Driver 1.47)
#
# mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 0 419 218 191 0 0 24 0 25 0 0 0 100
1 0 0 0 2 0 0 0 0 49 0 0 0 100 0 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 737 453 337 1 40 5 0 187 0 1 0 99
1 10 0 0 164 0 323 1 36 6 0 110 0 1 0 99
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 526 310 221 0 26 0 0 102 0 0 0 100
1 0 0 0 92 0 149 0 27 0 0 64 0 0 0 100
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 1 520 308 203 0 21 0 0 93 1 0 0 99
1 0 0 0 100 0 168 1 29 3 0 90 0 1 0 99
^C
# dtrace -n fbt:usba:usba_vlog:entry'{@[stringof arg3]=count();}' -n tick-5s'{exit(0);}'
dtrace: description 'fbt:usba:usba_vlog:entry' matched 1 probe
dtrace: description 'tick-5s' matched 1 probe
CPU ID FUNCTION:NAME
1 42148 :tick-5s
#
That's much better - both CPUs are IDLE now.
Looks like some problem with USB device/driver.
ps. I wonder what intrstat(1M) would show - "unfortunately" the problem gone...
No comments:
Post a Comment