Discussion:
[lm-sensors] w83793 related question
Carsten Aulbert
2009-04-14 09:27:46 UTC
Hi all,

On Supermicro PDMSL-LN2+ boards there is a w83793 chip, which seems to
work fine. However, we are hitting quite a few error messages
and I don't know how to get rid of them cleanly.

Typical messages are:

[1953397.341265] w83793 0-002f: set bank to 2 failed, fall back to bank 0, read reg 0x215 error
[1975470.881264] w83793 0-002f: set bank to 2 failed, fall back to bank 0, read reg 0x210 error
[1996226.542514] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x8c error
[2007778.861264] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x88 error
[2030598.800018] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x78 error
[2047548.600033] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x78 error
[2061616.870018] i801_smbus 0000:00:1f.3: Transaction timeout
[2061616.890025] i801_smbus 0000:00:1f.3: Failed terminating the transaction
[2061616.890233] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0xb2 error
[2061616.890600] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.890812] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.891014] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0xc3 error
[2061616.891384] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.891585] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0xc4 error
[2061616.891948] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.892146] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0xc5 error
[2061616.892515] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.892716] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x52 error
[2061616.893082] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.893281] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x53 error
[2061616.893653] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.893858] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x54 error
[2061616.894228] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.894427] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x55 error
[2061616.894790] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.894988] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x56 error
[2061616.895356] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
[2061616.895555] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x57 error
[2091449.930030] w83793 0-002f: set bank to 0 failed, fall back to bank 2, read reg 0x8c error
[2122602.861265] w83793 0-002f: set bank to 2 failed, fall back to bank 0, read reg 0x210 error

Google only shows me lots of references to the Linux kernel, but that
does not really help me much right now.

The system is running a fairly recent 2.6.27 kernel and lm-sensors 3.0.2.

Any hints on how I can get rid of these?

Cheers and thanks a lot in advance

Carsten
Rudolf Marek
2009-04-19 09:11:40 UTC
Hi,

It seems the w83793 chip itself is fine. However, someone else wants to talk to your
I2C controller. Try unloading the ACPI thermal module. Also, please run the
following and send the result here:
cat /proc/acpi/dsdt > dsdt.bin
iasl -d dsdt.bin

The result should be dsdt.dsl (or .asl).
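When reading the disassembly, one thing worth checking is whether ACPI code declares an I/O port region that overlaps the SMBus controller. The sketch below is a hypothetical helper (`list_io_regions` is not part of iasl or lm-sensors); it assumes iasl's usual `OperationRegion (NAME, SystemIO, base, length)` syntax and only lists those declarations, so they can be compared by hand with the controller's range in /proc/ioports.

```sh
#!/bin/sh
# Hypothetical helper: list SystemIO operation regions declared in a
# disassembled DSDT, with line numbers, for manual comparison against
# the SMBus controller's I/O range.
# Assumes iasl's "OperationRegion (NAME, SystemIO, base, length)" form.
list_io_regions() {
    dsl=${1:-dsdt.dsl}
    grep -n 'OperationRegion' "$dsl" | grep 'SystemIO'
}
```

Usage: `list_io_regions dsdt.dsl`, then compare the bases and lengths with the output of `grep i801 /proc/ioports` (assuming the i2c-i801 driver registers its range under a name containing "i801", as its log prefix suggests). Regions whose base is a named object rather than a literal still need to be resolved by reading the surrounding ASL.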

Thanks,
Rudolf
Carsten Aulbert
2009-04-20 10:21:32 UTC
Hi Rudolf,
Post by Rudolf Marek
It seems the w83793 chip itself is fine. However, someone else wants to talk
to your I2C controller. Try unloading the ACPI thermal module. Also, please
run the following and send the result here:
The module is not loaded (nor compiled into the kernel).
Post by Rudolf Marek
cat /proc/acpi/dsdt > dsdt.bin
iasl -d dsdt.bin
The result should be dsdt.dsl (or .asl).
I've attached the output.

Cheers

Carsten
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dsdt.dsl.gz
Type: application/gzip
Size: 11703 bytes
Desc: not available
URL: <http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20090420/cf899985/attachment.bin>
Rudolf Marek
2009-04-20 19:41:15 UTC
Hi
The problem is elsewhere; the ACPI code is OK. Maybe SMM (System Management Mode)
comes into play, or it might be an I2C hardware problem, too.

Can you recompile the kernel with I2C debugging enabled (for the i2c-i801 module)?

Do the errors occur right after boot, or do they appear all of a sudden?

If you unload the w83793 driver but leave the i2c-i801 driver loaded, and run
a script like:

while true ; do
i2cdump -y 0 0x2f
sleep 5
done

Do you get errors too?
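A slightly extended version of the loop above, as a sketch: it assumes i2c-dev is loaded, bus 0 is the i801 adapter and the w83793 driver is unloaded, and it relies on i2cdump printing `XX` for bytes it could not read (as in the d4 cell of the dump later in this thread). `poll_chip` and the `DUMP_CMD` override are hypothetical names introduced here so the loop can be bounded and tested. It prints a timestamp only when a dump contains failed reads, which makes it easier to line failures up with the kernel log.

```sh
#!/bin/sh
# Sketch: poll the chip at 0x2f on bus 0 and log dumps that contain
# failed reads. DUMP_CMD and poll_chip are hypothetical names.
DUMP_CMD=${DUMP_CMD:-"i2cdump -y 0 0x2f"}

poll_chip() {
    iterations=$1   # number of dumps to take; 0 means forever
    interval=$2     # seconds to sleep between dumps
    n=0
    while [ "$iterations" -eq 0 ] || [ "$n" -lt "$iterations" ]; do
        # i2cdump prints "XX" for each byte it failed to read; count the
        # rows containing at least one marker ("|| true" keeps set -e happy
        # when grep finds nothing and prints 0).
        bad_rows=$($DUMP_CMD 2>&1 | grep -c ' XX' || true)
        if [ "$bad_rows" -gt 0 ]; then
            echo "$(date '+%F %T') $bad_rows row(s) with failed reads"
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}
```

Calling `poll_chip 0 5` behaves like the original loop (one dump every 5 seconds, forever) but stays silent while the reads succeed.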

Jean, maybe we should move this to the i2c mailing list?

Rudolf
Carsten Aulbert
2009-04-20 19:44:15 UTC
Hi,
Post by Rudolf Marek
Can you recompile the kernel with I2C debugging enabled (for the i2c-i801 module)?
Will do that on a machine.
Post by Rudolf Marek
Do the errors occur right after boot, or do they appear all of a sudden?
Good question. I have 1680 of these boxes here and only get the
stripped-down log files ;)
Post by Rudolf Marek
If you unload the w83793 driver but leave the i2c-i801 driver loaded, and run
while true ; do
i2cdump -y 0 0x2f
sleep 5
done
Do you get errors too?
I'll have a look.
Post by Rudolf Marek
Jean, maybe we should move this to the i2c mailing list?
Tell me where to move it to :)

Cheers

Carsten
Carsten Aulbert
2009-04-20 20:40:51 UTC
Post by Rudolf Marek
Hi
The problem is elsewhere; the ACPI code is OK. Maybe SMM (System Management
Mode) comes into play, or it might be an I2C hardware problem, too.
Can you recompile the kernel with I2C debugging enabled (for the i2c-i801 module)?
Do the errors occur right after boot, or do they appear all of a sudden?
If you unload the w83793 driver but leave the i2c-i801 driver loaded, and run
while true ; do
i2cdump -y 0 0x2f
sleep 5
done
Small problem:

I've enabled I2C debugging in the kernel:
n0550:~# zgrep I2C /proc/config.gz
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_HELPER_AUTO=y
# I2C Hardware Bus support
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
CONFIG_I2C_I801=m
CONFIG_I2C_ISCH=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set
# I2C system bus drivers (mostly embedded / system-on-chip)
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set
# External I2C/SMBus adapter drivers
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set
# Graphics adapter I2C/DDC channel drivers
# CONFIG_I2C_VOODOO3 is not set
# Other I2C/SMBus bus drivers
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_STUB is not set
# Miscellaneous I2C Chip support
CONFIG_I2C_DEBUG_CORE=y
CONFIG_I2C_DEBUG_ALGO=y
CONFIG_I2C_DEBUG_BUS=y
CONFIG_I2C_DEBUG_CHIP=y
# I2C RTC drivers

I also installed i2cdump and created the device node.

When I do a single call of
i2cdump -y 0 0x2f

the output is:
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 00 00 00 bd bb b1 bb 64 64 5e c8 a3 7b 13    .....????dd^??{?
10: a0 b9 97 ff 20 70 d1 7a cf cc c8 3e 1c 90 80 80    ???. p?z???>????
20: 20 1d 0b 00 77 00 73 00 77 0f ff 0f ff 0f ff 0f     ??.w.s.w?.?.?.?
30: ff 0f ff 0f ff 0f ff 00 00 00 00 ff ff ff ff ff    .?.?.?..........
40: 09 00 00 00 40 00 f7 ef e3 7f 3f 00 00 e0 0f 00    ?...@.?????..??.
50: 06 1e 01 00 00 00 00 00 28 01 00 00 1f 00 55 03    ???.....(?..?.U?
60: b9 73 c5 b2 9d 8e ff ff 23 20 2b 1c 7c 65 d9 c4    ?s????..# +?|e??
70: 83 77 db c6 db c6 e5 bb 50 4b 55 50 3c 37 af af    ?w??????PKUP<7??
80: 3c 37 55 50 3c 37 55 50 50 4b 55 50 32 2d 55 50    <7UP<7UPPKUP2-UP
90: 07 68 07 68 07 68 07 68 07 68 07 68 07 68 07 68    ?h?h?h?h?h?h?h?h
a0: 07 68 07 68 ff ff ff ff f7 f6 f6 f6 00 00 ff ff    ?h?h....????....
b0: 00 00 3f 3f 3f 3f 3f 3f 3f 3f 3f 89 89 89 89 89    ..??????????????
c0: 89 89 89 02 03 7f ff 00 00 ff ff ff ff ff ff ff    ??????..........
d0: 00 46 46 46 XX 00 f0 ff 80 01 80 01 80 01 80 01    .FFFX.?.????????
e0: bb c0 82 ff 80 2a fb 13 00 00 88 00 ff ff ff ff    ???.?*??..?.....
f0: 00 00 00 00 00 00 60 80 1b 00 ff 00 00 10 00 00    ......`??....?..

More interestingly, the syslog gets flooded with messages:
Apr 20 22:39:13 n0550 kernel: [ 617.540836] i2c-adapter i2c-0: ioctl, cmd=0x705, arg=0x7fffcbf214d8
Apr 20 22:39:13 n0550 kernel: [ 617.540873] i2c-adapter i2c-0: ioctl, cmd=0x703, arg=0x2f
Apr 20 22:39:13 n0550 kernel: [ 617.540912] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:13 n0550 kernel: [ 617.563768] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:13 n0550 kernel: [ 617.583989] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:13 n0550 kernel: [ 617.603769] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:13 n0550 kernel: [ 617.623768] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
[...]
Apr 20 22:39:17 n0550 kernel: [ 621.743765] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:17 n0550 kernel: [ 621.763766] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:17 n0550 kernel: [ 621.783765] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:17 n0550 kernel: [ 621.803757] i801_smbus 0000:00:1f.3: Lost arbitration
Apr 20 22:39:17 n0550 kernel: [ 621.803794] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:18 n0550 kernel: [ 622.803767] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:18 n0550 kernel: [ 622.823765] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:18 n0550 kernel: [ 622.843768] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
[...]
Apr 20 22:39:19 n0550 kernel: [ 623.563766] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:19 n0550 kernel: [ 623.583765] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:19 n0550 kernel: [ 623.603765] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
Apr 20 22:39:19 n0550 kernel: [ 623.623765] i2c-adapter i2c-0: ioctl, cmd=0x720, arg=0x7fffcbf21480
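The ioctl numbers in this debug output are i2c-dev request codes from <linux/i2c-dev.h>: 0x703 is I2C_SLAVE, 0x705 is I2C_FUNCS and 0x720 is I2C_SMBUS. So the flood is i2cdump issuing one I2C_SMBUS call per register in byte-data mode, roughly 20 ms apart in this excerpt. A small sketch to tally them by name; `decode_i2c_ioctl` and `summarize_ioctls` are hypothetical helpers, and the table covers only the common i2c-dev requests.

```sh
#!/bin/sh
# Hypothetical helpers: map i2c-dev ioctl request codes (as printed by
# the i2c-core debug messages) to their names and count them.
decode_i2c_ioctl() {
    case "$1" in
        0x701) echo I2C_RETRIES ;;
        0x702) echo I2C_TIMEOUT ;;
        0x703) echo I2C_SLAVE ;;
        0x704) echo I2C_TENBIT ;;
        0x705) echo I2C_FUNCS ;;
        0x706) echo I2C_SLAVE_FORCE ;;
        0x707) echo I2C_RDWR ;;
        0x708) echo I2C_PEC ;;
        0x720) echo I2C_SMBUS ;;
        *)     echo "unknown($1)" ;;
    esac
}

# Reads syslog text on stdin; prints "NAME COUNT" per request code.
summarize_ioctls() {
    grep -o 'cmd=0x[0-9a-f]*' | cut -d= -f2 | sort | uniq -c |
    while read -r count cmd; do
        printf '%s %s\n' "$(decode_i2c_ioctl "$cmd")" "$count"
    done
}
```

For example, `grep 'i2c-0: ioctl' /var/log/syslog | summarize_ioctls` (log path assumed) shows at a glance how many SMBus transfers a single i2cdump run produced.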


Does this help a bit?

Cheers

Carsten
Rudolf Marek
2009-04-22 20:16:19 UTC
Carsten Aulbert
2009-04-22 20:27:06 UTC
Hi
Post by Rudolf Marek
1) the board is equipped with some kind of remote management controller,
which is also doing transactions on I2C
Is there one?
Yes, there is an IPMI card on these boards. I'm not sure whether it uses I2C,
but I think that would make sense. We are not actively querying values
via the IPMI card; however, it queries them by itself.
Post by Rudolf Marek
I would still need to know how often this problem actually happens.
It seems bursty: many messages within a second, then silence for some
time. For example, one node has this distribution:

n0047:~# dmesg | egrep '(i801_smbus|w83793)' | cut -d. -f 1 | \
tr -d '[' | sort -n | uniq -c
1 8
1 25241
1 26501
1 53262
1 123160
1 132928
196 137025
1 144900
1 156875
1 198766
1 229545
1 266396
1 280566
1 285294
1 286873
1 298844
1 312390
1 318375
1 337578
12 367501
1 380728
1 422294
1 433634
1 444343

The first column is the number of messages; the second column shows seconds since boot.
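To condense a distribution like this into a few numbers per node (which might also make a cross-node comparison easier), the `uniq -c` output can be piped into a small awk summary. `burst_summary` is a hypothetical helper; it assumes the two-column `count seconds` format shown above.

```sh
#!/bin/sh
# Hypothetical helper: summarize "count seconds" lines (the output of the
# dmesg | ... | uniq -c pipeline) into total messages, number of distinct
# seconds with messages, and the largest burst with its timestamp.
burst_summary() {
    awk '
        { total += $1; seconds++
          if ($1 > max) { max = $1; at = $2 } }
        END { printf "messages=%d seconds=%d max_burst=%d at=%ds\n",
                     total, seconds, max, at }'
}
```

Fed the 24 lines above, it prints `messages=230 seconds=24 max_burst=196 at=137025s`; one such line per node is easy to collect and sort.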

Does this help, or shall I do it on every node and think about a way to
show this in 2D?

Cheers

Carsten
Jean Delvare
2009-04-22 20:57:03 UTC
Post by Rudolf Marek
Hi,
I have been thinking about what it could possibly be. It seems like there might be two
1) the board is equipped with some kind of remote management controller, which is
also doing transactions on I2C
Is there one?
2) SMM code is doing its own I2C transactions, interfering with the driver.
The problem is that SMM is transparent to the OS. The only possible remedies are
a BIOS change, or adding some code to acquire the ACPI global lock. That
would require modifying the driver to call those functions.
What makes you think there is a relation between SMM and ACPI?
Post by Rudolf Marek
3) Make the driver more bulletproof. Maybe it would be possible to give
the driver longer timeouts, more retries, etc.
This is pointless. If another entity is accessing the chip without
proper locking, no amount of timeouts or retries will help. In this
scenario, reported errors are in fact the best thing that can happen.
The worst case is silent misbehavior.
Post by Rudolf Marek
I would still need to know how often this problem actually happens.
--
Jean Delvare
Rudolf Marek
2009-04-23 06:57:22 UTC
Post by Jean Delvare
What makes you think there is a relation between SMM and ACPI?
There should be. See the ACPI specification, Global Lock chapter. It is meant to
be a lock between ACPI and SMM code that touch the same hardware. Maybe we are
lucky and something like this is actually implemented.

But since 1) is true, I suspect we are simply seeing transactions from some other device.
Post by Jean Delvare
Post by Rudolf Marek
3) Make the driver more bulletproof. Maybe it would be possible to give
the driver longer timeouts, more retries, etc.
This is pointless. If another entity is accessing the chip without
proper locking, no amount of timeouts or retries will help. In this
scenario, reported errors are in fact the best thing that can happen.
The worst case is silent misbehavior.
It is not pointless if this other entity only shows up as bus traffic. I mean,
if it does not use the I2C controller like our driver does, but drives the bus itself.
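Rudolf's point is that retries can help when another master only occupies the bus for a while; Jean's objection still holds if that master also switches the w83793's register bank behind the driver's back. For experimenting from user space with the w83793 driver unloaded, a retry-with-backoff wrapper could look like this. `retry_read` is a hypothetical helper, not part of i2c-tools, and it assumes the wrapped command exits non-zero when the read fails.

```sh
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, doubling the delay
# (whole seconds, starting at $2) between attempts. Returns 0 on the
# first success, 1 if every attempt failed.
retry_read() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            echo "giving up after $attempts attempt(s): $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        i=$((i + 1))
    done
}
```

Usage, for example: `retry_read 3 1 i2cget -y 0 0x2f 0x00`. If reads succeed on a retry, that points to transient bus contention; if they keep failing, or return values that look wrong, Jean's scenario is more likely.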

Thanks,
Rudolf
