sata_via


* sata_via
@ 2008-04-04 20:51 Rich West
  2008-04-04 22:04 ` sata_via Jeff Garzik
  0 siblings, 1 reply; 13+ messages in thread
From: Rich West @ 2008-04-04 20:51 UTC
  To: linux-kernel

On my mythtv backend system, the recordings volume tends to get pounded
rather hard (up to 5 recordings (some HD) at once with multiple frontend
systems reading from that same volume).  I recently (4 months ago)
upgraded the system to a motherboard that happened to have the VIA
chipset on it.

Since that time, I have had some bizarre problems with that volume. 
After a seemingly random amount of time, the kernel would report an
error with the volume and put it in read-only mode.  However, it would
not really be in read-only mode, but it would be completely
inaccessible.  Unmounting the volume would be successful, but
re-mounting the volume would fail.

I've replaced the drive (with an identical one), tested memory, changed
filesystems (it was LVM + ext3, then just ext3) and the problem persists.

Running 2.6.24.4-64 (Fedora 8).

A larger snippet from the messages log is (dmesg gets cleared after reboot):
Apr  3 16:47:27 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Apr  3 16:47:27 mythtv1 kernel: ata4.00: cmd c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
Apr  3 16:47:27 mythtv1 kernel:          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 16:47:27 mythtv1 kernel: ata4.00: status: { DRDY }
Apr  3 16:47:27 mythtv1 kernel: ata4: soft resetting link
Apr  3 16:47:57 mythtv1 kernel: ata4.00: qc timeout (cmd 0x27)
Apr  3 16:47:57 mythtv1 kernel: ata4.00: failed to read native max address (err_mask=0x4)
Apr  3 16:47:57 mythtv1 kernel: ata4.00: HPA support seems broken, will skip HPA handling
Apr  3 16:47:57 mythtv1 kernel: ata4.00: revalidation failed (errno=-5)
Apr  3 16:47:57 mythtv1 kernel: ata4: failed to recover some devices, retrying in 5 secs
Apr  3 16:48:02 mythtv1 kernel: ata4: soft resetting link
Apr  3 16:48:02 mythtv1 kernel: ata4.00: configured for UDMA/133
Apr  3 16:48:02 mythtv1 kernel: ata4: EH complete
Apr  3 16:49:02 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Apr  3 16:49:02 mythtv1 kernel: ata4.00: cmd c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
Apr  3 16:49:02 mythtv1 kernel:          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 16:49:02 mythtv1 kernel: ata4.00: status: { DRDY }
Apr  3 16:49:02 mythtv1 kernel: ata4: soft resetting link
Apr  3 16:49:03 mythtv1 kernel: ata4.00: configured for UDMA/133
Apr  3 16:49:03 mythtv1 kernel: ata4: EH complete
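
For anyone reading the log above, here is a minimal sketch of how the hex fields in the "cmd" and "res" lines can be decoded. It assumes the first byte of "cmd" is the ATA opcode and the first byte of "res" is the device status register, and it lists only a subset of the bit names from the kernel's ata.h and libata.h. The helper names (decode_bits, summarize) are made up for illustration and are not part of any kernel tool.

```python
import re

# Subset of libata's AC_ERR_* error-mask bits (the "Emask" / "err_mask" values).
ERR_MASK_BITS = {
    0x01: "device error",
    0x02: "HSM violation",
    0x04: "timeout",
    0x08: "media error",
    0x10: "ATA bus error",
    0x20: "host bus error",
}

# ATA status register bits.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# Only the two opcodes that appear in the log above.
OPCODES = {0xC8: "READ DMA", 0x27: "READ NATIVE MAX ADDRESS EXT"}

# Matches "cmd c8/..." or "res 40/...", even if a mail client wrapped the line.
TASKFILE = re.compile(r"\b(cmd|res)\s+([0-9a-f]{2})/")


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True) if value & bit]


def summarize(log_text):
    """Yield (kind, description) for each cmd/res taskfile dump in log_text."""
    for kind, first_byte in TASKFILE.findall(log_text):
        value = int(first_byte, 16)
        if kind == "cmd":
            yield kind, OPCODES.get(value, "opcode 0x%02x" % value)
        else:
            yield kind, " ".join(decode_bits(value, STATUS_BITS)) or "no status bits set"
```

Applied to the log above, this reads as a READ DMA command that timed out (error mask 0x4) while the drive reported only DRDY, which matches the "(timeout)" and "{ DRDY }" annotations the kernel already prints.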

It is almost as if I am hitting some bug that is causing the drive to
fall off, but I really don't know where else to look or where else to
turn...

I'm tempted to just go back to using a PATA drive (smaller.  :(  ) to
avoid the problem.  I'm just at a loss as to how it can actually be solved.

-Rich


* Re: sata_via
  2008-04-04 20:51 sata_via Rich West
@ 2008-04-04 22:04 ` Jeff Garzik
  2008-04-04 23:44   ` sata_via Rich West
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2008-04-04 22:04 UTC
  To: Rich West; +Cc: linux-kernel, Linux IDE mailing list

Rich West wrote:
> On my mythtv backend system, the recordings volume tends to get pounded
> rather hard (up to 5 recordings (some HD) at once with multiple frontend
> systems reading from that same volume).  I recently (4 months ago)
> upgraded the system to a motherboard that happened to have the VIA
> chipset on it.
> 
> Since that time, I have had some bizarre problems with that volume. 
> After a seemingly random amount of time, the kernel would report an
> error with the volume and put it in read-only mode.  However, it would
> not really be in read-only mode, but it would be completely
> inaccessible.  Unmounting the volume would be successful, but
> re-mounting the volume would fail.
> 
> I've replaced the drive (with an identical one), tested memory, changed
> filesystems (it was LVM + ext3, then just ext3) and the problem persists.
> 
> Running 2.6.24.4-64 (Fedora 8).
> 
> A larger snippet from the messages log is (dmesg gets cleared after 
> reboot):
> Apr  3 16:47:27 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2 frozen
> Apr  3 16:47:27 mythtv1 kernel: ata4.00: cmd
> c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
> Apr  3 16:47:27 mythtv1 kernel:          res
> 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Apr  3 16:47:27 mythtv1 kernel: ata4.00: status: { DRDY }
> Apr  3 16:47:27 mythtv1 kernel: ata4: soft resetting link
> Apr  3 16:47:57 mythtv1 kernel: ata4.00: qc timeout (cmd 0x27)
> Apr  3 16:47:57 mythtv1 kernel: ata4.00: failed to read native max
> address (err_mask=0x4)
> Apr  3 16:47:57 mythtv1 kernel: ata4.00: HPA support seems broken, will
> skip HPA handling
> Apr  3 16:47:57 mythtv1 kernel: ata4.00: revalidation failed (errno=-5)
> Apr  3 16:47:57 mythtv1 kernel: ata4: failed to recover some devices,
> retrying in 5 secs
> Apr  3 16:48:02 mythtv1 kernel: ata4: soft resetting link
> Apr  3 16:48:02 mythtv1 kernel: ata4.00: configured for UDMA/133
> Apr  3 16:48:02 mythtv1 kernel: ata4: EH complete
> Apr  3 16:49:02 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x2 frozen
> Apr  3 16:49:02 mythtv1 kernel: ata4.00: cmd
> c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
> Apr  3 16:49:02 mythtv1 kernel:          res
> 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> Apr  3 16:49:02 mythtv1 kernel: ata4.00: status: { DRDY }
> Apr  3 16:49:02 mythtv1 kernel: ata4: soft resetting link
> Apr  3 16:49:03 mythtv1 kernel: ata4.00: configured for UDMA/133
> Apr  3 16:49:03 mythtv1 kernel: ata4: EH complete
> 
> It is almost as if I am hitting some bug that is causing the drive to
> fall off, but I really don't know where else to look or where else to
> turn...
> 
> I'm tempted to just go back to using a PATA drive (smaller.  :(  ) to
> avoid the problem.  I'm just at a loss as to how it can actually be solved.

This timeout/DRDY message has been a common one recently.  Some of the 
issues causing this may be resolved in 2.6.25-rc, can you try that?

Also, if you could build and test some older kernels to see when this 
behavior first appeared, that would be quite helpful.

Overall, a timeout _might_ be a problem with libata (the kernel SATA 
drivers), or it _might_ be a problem with your system's interrupt 
delivery (sometimes an ACPI or BIOS problem).  Try booting with 'noapic' 
or 'acpi=off'.
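
After rebooting, it is easy to lose track of which options actually took effect. A minimal sketch, assuming the running kernel's command line is readable at /proc/cmdline, that reports whether 'noapic' and 'acpi=off' are present. The function names are illustrative only.

```python
def boot_flags(cmdline, wanted=("noapic", "acpi=off")):
    """Map each wanted parameter to True if it appears on the command line."""
    present = set(cmdline.split())
    return {flag: flag in present for flag in wanted}


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    for flag, is_set in boot_flags(read_cmdline()).items():
        print("%-10s %s" % (flag, "set" if is_set else "not set"))
```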

	Jeff






* Re: sata_via
  2008-04-04 22:04 ` sata_via Jeff Garzik
@ 2008-04-04 23:44   ` Rich West
  2008-04-05  7:15     ` sata_via Wander Winkelhorst
  0 siblings, 1 reply; 13+ messages in thread
From: Rich West @ 2008-04-04 23:44 UTC
  To: Jeff Garzik; +Cc: linux-kernel, Linux IDE mailing list



Jeff Garzik wrote:
> Rich West wrote:
>> On my mythtv backend system, the recordings volume tends to get pounded
>> rather hard (up to 5 recordings (some HD) at once with multiple frontend
>> systems reading from that same volume).  I recently (4 months ago)
>> upgraded the system to a motherboard that happened to have the VIA
>> chipset on it.
>>
>> Since that time, I have had some bizarre problems with that volume. 
>> After a seemingly random amount of time, the kernel would report an
>> error with the volume and put it in read-only mode.  However, it would
>> not really be in read-only mode, but it would be completely
>> inaccessible.  Unmounting the volume would be successful, but
>> re-mounting the volume would fail.
>>
>> I've replaced the drive (with an identical one), tested memory, changed
>> filesystems (it was LVM + ext3, then just ext3) and the problem 
>> persists.
>>
>> Running 2.6.24.4-64 (Fedora 8).
>>
>> A larger snippet from the messages log is (dmesg gets cleared after 
>> reboot):
>> Apr  3 16:47:27 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0
>> SErr 0x0 action 0x2 frozen
>> Apr  3 16:47:27 mythtv1 kernel: ata4.00: cmd
>> c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
>> Apr  3 16:47:27 mythtv1 kernel:          res
>> 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Apr  3 16:47:27 mythtv1 kernel: ata4.00: status: { DRDY }
>> Apr  3 16:47:27 mythtv1 kernel: ata4: soft resetting link
>> Apr  3 16:47:57 mythtv1 kernel: ata4.00: qc timeout (cmd 0x27)
>> Apr  3 16:47:57 mythtv1 kernel: ata4.00: failed to read native max
>> address (err_mask=0x4)
>> Apr  3 16:47:57 mythtv1 kernel: ata4.00: HPA support seems broken, will
>> skip HPA handling
>> Apr  3 16:47:57 mythtv1 kernel: ata4.00: revalidation failed (errno=-5)
>> Apr  3 16:47:57 mythtv1 kernel: ata4: failed to recover some devices,
>> retrying in 5 secs
>> Apr  3 16:48:02 mythtv1 kernel: ata4: soft resetting link
>> Apr  3 16:48:02 mythtv1 kernel: ata4.00: configured for UDMA/133
>> Apr  3 16:48:02 mythtv1 kernel: ata4: EH complete
>> Apr  3 16:49:02 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0
>> SErr 0x0 action 0x2 frozen
>> Apr  3 16:49:02 mythtv1 kernel: ata4.00: cmd
>> c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
>> Apr  3 16:49:02 mythtv1 kernel:          res
>> 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Apr  3 16:49:02 mythtv1 kernel: ata4.00: status: { DRDY }
>> Apr  3 16:49:02 mythtv1 kernel: ata4: soft resetting link
>> Apr  3 16:49:03 mythtv1 kernel: ata4.00: configured for UDMA/133
>> Apr  3 16:49:03 mythtv1 kernel: ata4: EH complete
>>
>> It is almost as if I am hitting some bug that is causing the drive to
>> fall off, but I really don't know where else to look or where else to
>> turn...
>>
>> I'm tempted to just go back to using a PATA drive (smaller.  :(  ) to
>> avoid the problem.  I'm just at a loss as to how it can actually be 
>> solved.
>
> This timeout/DRDY message has been a common one recently.  Some of the 
> issues causing this may be resolved in 2.6.25-rc, can you try that?
>
> Also, if you could build and test some older kernels to see when this 
> behavior first appeared, that would be quite helpful.
>
> Overall, a timeout _might_ be a problem with libata (the kernel SATA 
> drivers), or it _might_ be a problem with your system's interrupt 
> delivery (sometimes an ACPI or BIOS problem).  Try booting with 
> 'noapic' or 'acpi=off'.
>

Thanks for the quick response.

I know this problem was happening with all of the Fedora 7 supplied 
kernels (from initial release up until about a week ago) and has 
happened with each of the Fedora 8 supplied kernels.  I'll try rolling 
2.6.25-rc to see if the problem resurfaces.  Unfortunately, I don't know 
what collision of events causes this problem to erupt, but it usually 
happens within 7 days of a reboot (sometimes within hours of a reboot).

I'll give noapic a try, but (dumb question) what does acpi=off buy?

-Rich


* Re: sata_via
  2008-04-04 23:44   ` sata_via Rich West
@ 2008-04-05  7:15     ` Wander Winkelhorst
       [not found]       ` <47F7982E.5080701@wesmo.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Wander Winkelhorst @ 2008-04-05  7:15 UTC
  To: Rich West; +Cc: Jeff Garzik, linux-kernel, Linux IDE mailing list

On Sat, Apr 5, 2008 at 1:44 AM, Rich West <Rich.West@wesmo.com> wrote:

>  I'll give noapic a try, but (dumb question) what does acpi=off buy?

acpi=off surprisingly turns off ACPI
read more about acpi at:
http://en.wikipedia.org/wiki/Acpi

-Wander


* Re: sata_via
@ 2008-04-05 16:08 Marin Mitov
  0 siblings, 0 replies; 13+ messages in thread
From: Marin Mitov @ 2008-04-05 16:08 UTC
  To: linux-kernel

Hi Rich,

What is the output of command:

grep CONFIG_IRQBALANCE .config

If:

CONFIG_IRQBALANCE=y

try disabling it.
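
If the kernel source tree is not at hand, the same check can be scripted. A minimal sketch, assuming the config is available either as a plain-text .config file or as a gzip-compressed copy such as /proc/config.gz (which exists only when the kernel was built with CONFIG_IKCONFIG_PROC). The function names are hypothetical.

```python
import gzip


def read_config(path):
    """Read a kernel config file, transparently handling a .gz suffix."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read()


def option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# %s is not set" % option:
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    state = option_state(read_config(".config"), "CONFIG_IRQBALANCE")
    print("CONFIG_IRQBALANCE:", state if state is not None else "not set")
```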

Regards

Marin Mitov


* Re: sata_via
       [not found]       ` <47F7982E.5080701@wesmo.com>
@ 2008-04-05 21:51         ` Jeff Garzik
  2008-04-13  2:45           ` sata_via Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2008-04-05 21:51 UTC
  To: Rich West; +Cc: Wander Winkelhorst, linux-kernel, Linux IDE mailing list

Rich West wrote:
> I was only curious as to what turning off power management support would 
> buy with regard to the sata timeout issue.

ACPI is not only power management.  It is all the information your 
hardware conveys to your OS, about the setup and workings of your hardware.

Without ACPI, non-PM things like SMP, laptop dock/undock, and many other 
gadgets fail to function (or are configured sub-optimally).

ACPI sets up interrupt routing, and Linux history is _loaded_ with 
_years_ of problem reports that appear as timeouts, only to be resolved 
as ACPI interrupt bugs (aka BIOS bugs, since ACPI tables come from BIOS).
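
One way to look for an interrupt delivery problem is to compare two snapshots of /proc/interrupts taken while I/O is in flight: if the controller's count does not advance, its interrupts are probably not arriving. A minimal sketch of that comparison follows. It assumes the controller shows up with "sata_via" in the description column, which may differ from system to system, and the function names are illustrative only.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # The leading numeric fields are per-CPU counts; the rest is descriptive.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(fields))
    return table


def irq_deltas(before_text, after_text, needle):
    """Return {label: count increase} for lines whose description contains needle."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {
        irq: count - before[irq][0]
        for irq, (count, desc) in after.items()
        if needle in desc and irq in before
    }
```

A delta of zero during heavy disk activity would point at interrupt routing rather than at the drive; a healthy delta does not rule the problem out, since the failures reported here are intermittent.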

	Jeff


* Re: sata_via
  2008-04-05 21:51         ` sata_via Jeff Garzik
@ 2008-04-13  2:45           ` Tejun Heo
  2008-04-13  4:19             ` sata_via Rich West
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2008-04-13  2:45 UTC
  To: Jeff Garzik
  Cc: Rich West, Wander Winkelhorst, linux-kernel,
	Linux IDE mailing list

Jeff Garzik wrote:
> Rich West wrote:
>> I was only curious as to what turning off power management support 
>> would buy with regard to the sata timeout issue.
> 
> 
> ACPI is not only power management.  It is all the information your 
> hardware conveys to your OS, about the setup and workings of your hardware.
> 
> Without ACPI, non-PM things like SMP, laptop dock/undock, and many other 
> gadgets fail to function (or are configured sub-optimally).
> 
> ACPI sets up interrupt routing, and Linux history is _loaded_ with 
> _years_ of problem reports that appear as timeouts, only to be resolved 
> as ACPI interrupt bugs (aka BIOS bugs, since ACPI tables come from BIOS).

Also, please give a shot at the sacred "irqpoll".

-- 
tejun


* Re: sata_via
  2008-04-13  2:45           ` sata_via Tejun Heo
@ 2008-04-13  4:19             ` Rich West
  2008-04-13  4:36               ` sata_via Rich West
  0 siblings, 1 reply; 13+ messages in thread
From: Rich West @ 2008-04-13  4:19 UTC
  To: Tejun Heo
  Cc: Jeff Garzik, Wander Winkelhorst, linux-kernel,
	Linux IDE mailing list

Tejun Heo wrote:
> Jeff Garzik wrote:
>> Rich West wrote:
>>> I was only curious as to what turning off power management support 
>>> would buy with regard to the sata timeout issue.
>>
>>
>> ACPI is not only power management.  It is all the information your 
>> hardware conveys to your OS, about the setup and workings of your 
>> hardware.
>>
>> Without ACPI, non-PM things like SMP, laptop dock/undock, and many 
>> other gadgets fail to function (or are configured sub-optimally).
>>
>> ACPI sets up interrupt routing, and Linux history is _loaded_ with 
>> _years_ of problem reports that appear as timeouts, only to be 
>> resolved as ACPI interrupt bugs (aka BIOS bugs, since ACPI tables 
>> come from BIOS).
>
> Also, please give a shot at the sacred "irqpoll".

I've had bad luck in the past (with other machines) when adding 
"irqpoll" in that it had locked up the entire machine (not this one, but 
others) after a very short period of time.  My only attempt at using it, 
though, was to try to address an X server related issue, but this is a 
non-user machine, so X isn't necessary, and, hence, it's at runlevel 3 
all of the time.

So far, noapic and acpi=off seem to be working.. The system has been up 
8 days without any problems.. previously, a problem with the SATA drive 
(details of which were part of the first few messages in this thread) 
would surface anywhere between 15 minutes and 5-7 days after a reboot.

However, since I started this email, I tried firing up X (with noapic 
and acpi=off both set), and the machine locked up entirely.

-Rich


* Re: sata_via
  2008-04-13  4:19             ` sata_via Rich West
@ 2008-04-13  4:36               ` Rich West
  2008-04-14  0:39                 ` sata_via Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Rich West @ 2008-04-13  4:36 UTC
  To: Tejun Heo
  Cc: Jeff Garzik, Wander Winkelhorst, linux-kernel,
	Linux IDE mailing list

Rich West wrote:
> Tejun Heo wrote:
>> Jeff Garzik wrote:
>>> Rich West wrote:
>>>> I was only curious as to what turning off power management support 
>>>> would buy with regard to the sata timeout issue.
>>>
>>>
>>> ACPI is not only power management.  It is all the information your 
>>> hardware conveys to your OS, about the setup and workings of your 
>>> hardware.
>>>
>>> Without ACPI, non-PM things like SMP, laptop dock/undock, and many 
>>> other gadgets fail to function (or are configured sub-optimally).
>>>
>>> ACPI sets up interrupt routing, and Linux history is _loaded_ with 
>>> _years_ of problem reports that appear as timeouts, only to be 
>>> resolved as ACPI interrupt bugs (aka BIOS bugs, since ACPI tables 
>>> come from BIOS).
>>
>> Also, please give a shot at the sacred "irqpoll".
>
> I've had bad luck in the past (with other machines) when adding 
> "irqpoll" in that it had locked up the entire machine (not this one, 
> but others) after a very short period of time.  My only attempt at 
> using it, though, was to try to address an X server related issue, but 
> this is a non-user machine, so X isn't necessary, and, hence, it's at 
> runlevel 3 all of the time.
>
> So far, noapic and acpi=off seem to be working.. The system has been 
> up 8 days without any problems.. previously, a problem with the SATA 
> drive (details of which were part of the first few messages in this 
> thread) would surface anywhere between 15 minutes and 5-7 days after a 
> reboot.
>
> However, since I started this email, I tried firing up X (with noapic 
> and acpi=off both set), and the machine locked up entirely.

Although a reboot with irqpoll set managed to fix that X problem. :)

-Rich


* Re: sata_via
  2008-04-13  4:36               ` sata_via Rich West
@ 2008-04-14  0:39                 ` Tejun Heo
  2008-04-14  8:13                   ` sata_via Thomas Renninger
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2008-04-14  0:39 UTC
  To: Rich West, Thomas Renninger
  Cc: Jeff Garzik, Wander Winkelhorst, linux-kernel,
	Linux IDE mailing list

Hi,

On Sun, 2008-04-13 at 00:36 -0400, Rich West wrote:
> Rich West wrote:
> > Tejun Heo wrote:
> >> Jeff Garzik wrote:
> >>> Rich West wrote:
> >>>> I was only curious as to what turning off power management support 
> >>>> would buy with regard to the sata timeout issue.
> >>>
> >>>
> >>> ACPI is not only power management.  It is all the information your 
> >>> hardware conveys to your OS, about the setup and workings of your 
> >>> hardware.
> >>>
> >>> Without ACPI, non-PM things like SMP, laptop dock/undock, and many 
> >>> other gadgets fail to function (or are configured sub-optimally).
> >>>
> >>> ACPI sets up interrupt routing, and Linux history is _loaded_ with 
> >>> _years_ of problem reports that appear as timeouts, only to be 
> >>> resolved as ACPI interrupt bugs (aka BIOS bugs, since ACPI tables 
> >>> come from BIOS).
> >>
> >> Also, please give a shot at the sacred "irqpoll".
> >
> > I've had bad luck in the past (with other machines) when adding 
> > "irqpoll" in that it had locked up the entire machine (not this one, 
> > but others) after a very short period of time.  My only attempt at 
> > using it, though, was to try to address an X server related issue, but 
> > this is a non-user machine, so X isn't necessary, and, hence, it's at 
> > runlevel 3 all of the time.
> >
> > So far, noapic and acpi=off seem to be working.. The system has been 
> > up 8 days without any problems.. previously, a problem with the SATA 
> > drive (details of which were part of the first few messages in this 
> > thread) would surface anywhere between 15 minutes and 5-7 days after a 
> > reboot.
> >
> > However, since I started this email, I tried firing up X (with noapic 
> > and acpi=off both set), and the machine locked up entirely.
> 
> Although a reboot with irqpoll set managed to fix that X problem. :)
> 

IRQ routing seems hosed on your machine.  Thomas any ideas?

-- 
tejun



* Re: sata_via
  2008-04-14  0:39                 ` sata_via Tejun Heo
@ 2008-04-14  8:13                   ` Thomas Renninger
  2008-04-14  8:32                     ` sata_via Peter Gervai
  2008-04-15  2:40                     ` sata_via Tejun Heo
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Renninger @ 2008-04-14  8:13 UTC
  To: Tejun Heo
  Cc: Rich West, Jeff Garzik, Wander Winkelhorst, linux-kernel,
	Linux IDE mailing list, Peter Gervai, Marin Mitov

On Mon, 2008-04-14 at 09:39 +0900, Tejun Heo wrote:
> Hi,
> 
> On Sun, 2008-04-13 at 00:36 -0400, Rich West wrote:
> > Rich West wrote:
> > > Tejun Heo wrote:
> > >> Jeff Garzik wrote:
> > >>> Rich West wrote:
> > >>>> I was only curious as to what turning off power management support 
> > >>>> would buy with regard to the sata timeout issue.
> > >>>
> > >>>
> > >>> ACPI is not only power management.  It is all the information your 
> > >>> hardware conveys to your OS, about the setup and workings of your 
> > >>> hardware.
> > >>>
> > >>> Without ACPI, non-PM things like SMP, laptop dock/undock, and many 
> > >>> other gadgets fail to function (or are configured sub-optimally).
> > >>>
> > >>> ACPI sets up interrupt routing, and Linux history is _loaded_ with 
> > >>> _years_ of problem reports that appear as timeouts, only to be 
> > >>> resolved as ACPI interrupt bugs (aka BIOS bugs, since ACPI tables 
> > >>> come from BIOS).
> > >>
> > >> Also, please give a shot at the sacred "irqpoll".
> > >
> > > I've had bad luck in the past (with other machines) when adding 
> > > "irqpoll" in that it had locked up the entire machine (not this one, 
> > > but others) after a very short period of time.  My only attempt at 
> > > using it, though, was to try to address an X server related issue, but 
> > > this is a non-user machine, so X isn't necessary, and, hence, it's at 
> > > runlevel 3 all of the time.
> > >
> > > So far, noapic and acpi=off seem to be working.. The system has been 
> > > up 8 days without any problems.. previously, a problem with the SATA 
> > > drive (details of which were part of the first few messages in this 
> > > thread) would surface anywhere between 15 minutes and 5-7 days after a 
> > > reboot.
> > >
> > > However, since I started this email, I tried firing up X (with noapic 
> > > and acpi=off both set), and the machine locked up entirely.
> > 
> > Although a reboot with irqpoll set managed to fix that X problem. :)
> > 
> 
> IRQ routing seems hosed on your machine.  Thomas any ideas?

Not really.
First, I'd try to be able to reproduce this more quickly, an IO
benchmark or similar (bonnie?).
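
If bonnie is not installed, a crude load generator can stand in for it. This is a minimal sketch and not a benchmark: it repeatedly writes a scratch file, forces it to disk with fsync, and reads it back. The file name, sizes, and function name are arbitrary choices for illustration, and reads may be served from the page cache unless the file is larger than RAM.

```python
import os


def hammer(directory, passes=3, file_mb=64, chunk_kb=128):
    """Write and read back a scratch file `passes` times; return bytes read."""
    chunk_size = chunk_kb * 1024
    chunk = os.urandom(chunk_size)
    path = os.path.join(directory, "io_stress.tmp")  # scratch file, removed afterwards
    total_read = 0
    try:
        for _ in range(passes):
            with open(path, "wb") as out:
                for _ in range(file_mb * 1024 // chunk_kb):
                    out.write(chunk)
                out.flush()
                os.fsync(out.fileno())  # push the data to the drive, not just the cache
            with open(path, "rb") as src:
                while True:
                    block = src.read(chunk_size)
                    if not block:
                        break
                    total_read += len(block)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return total_read
```

Pointing it at a directory on the affected volume and raising passes and file_mb should keep the drive busy; watching the kernel log for the ata timeout messages while it runs is left to the reader.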


----------------------------------------------
Peter Gervai posted a similar report recently, Elias Oltmanns had an
idea (on linux-ide or linux-acpi list):
Subject: Re: Hard freeze / interrupt-related death / instability
Wed, 09 Apr 2008

(Tejun was already involved in that, at least in the link posted
there, but reading up on it might be worth it.)
Is this the same problem?

----------------------------------------------
Marin Mitov answered in this thread (replying only to lkml and truncating
the CC list, so it may have been overlooked):
Hi Rich,

What is the output of command:

grep CONFIG_IRQBALANCE .config

If:

CONFIG_IRQBALANCE=y

try disabling it.
----------------------------------------------


   Thomas



* Re: sata_via
  2008-04-14  8:13                   ` sata_via Thomas Renninger
@ 2008-04-14  8:32                     ` Peter Gervai
  2008-04-15  2:40                     ` sata_via Tejun Heo
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Gervai @ 2008-04-14  8:32 UTC
  To: trenn
  Cc: Tejun Heo, Rich West, Jeff Garzik, Wander Winkelhorst,
	linux-kernel, Linux IDE mailing list, Marin Mitov

On Mon, Apr 14, 2008 at 10:13 AM, Thomas Renninger <trenn@suse.de> wrote:

>  > IRQ routing seems hosed on your machine.  Thomas any ideas?
>
>  Not really.
>  First, I'd try to be able to reproduce this more quickly, an IO
>  benchmark or similar (bonnie?).
>
>
>  ----------------------------------------------
>  Peter Gervai posted a similar report recently, Elias Oltmanns had an
>  idea (on linux-ide or linux-acpi list):
>  Subject: Re: Hard freeze / interrupt-related death / instability
>  Wed, 09 Apr 2008
>
>  (Tejun was already involved in that, at least in the link posted
>  there, but reading up on it might be worth it.)
>  Is this the same problem?

Mmhm, as I was (kindly) cc'd, I'll reply with my results so far. Since
there was a mention of IRQBALANCE, I checked my own config, as I was
pretty sure I had switched that off:
# zegrep IRQ /proc/config.gz
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQBALANCE=y
CONFIG_HT_IRQ=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_HPET_RTC_IRQ=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_DEBUG_SHIRQ is not set

Darn, I'd say. I'll go and switch the balancer off; it doesn't really do much
good for me anyway. (What about userland balancer, does it hose things
as well?)

But apart from that I've been running with "pci=nomsi" for the last 4 days
without a freeze (which usually occurred every 1-2 days, but I'll wait
for more, and I will try to bonnie++ myself in the meantime). It did
not freeze under "acpi_irq_balance" either for 6 days, but the machine
overall felt a bit jerky (pretty ugly irq overlaps I guess).

If any of you prefer me to run under other options, or to apply that
patch I was advised to, I'll just do that. Just voice yourself.  (I
have an open request for "noapic acpi_irq_nobalance" and the patch.)

Thanks for your time, as always (and I try to share mine :)).

-- 
 byte-byte,
 grin


* Re: sata_via
  2008-04-14  8:13                   ` sata_via Thomas Renninger
  2008-04-14  8:32                     ` sata_via Peter Gervai
@ 2008-04-15  2:40                     ` Tejun Heo
  1 sibling, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2008-04-15  2:40 UTC
  To: trenn
  Cc: Rich West, Jeff Garzik, Wander Winkelhorst, linux-kernel,
	Linux IDE mailing list, Peter Gervai, Marin Mitov

Thomas Renninger wrote:
> ----------------------------------------------
> Peter Gervai posted a similar report recently, Elias Oltmanns had an
> idea (on linux-ide or linux-acpi list):
> Subject: Re: Hard freeze / interrupt-related death / instability
> Wed, 09 Apr 2008
> 
> (Tejun was already involved in that, at least in the link posted
> there, but reading up on it might be worth it.)
> Is this the same problem?

Can't really tell.  Didn't look like a low level driver problem tho.

-- 
tejun


end of thread

Thread overview: 13+ messages
2008-04-04 20:51 sata_via Rich West
2008-04-04 22:04 ` sata_via Jeff Garzik
2008-04-04 23:44   ` sata_via Rich West
2008-04-05  7:15     ` sata_via Wander Winkelhorst
     [not found]       ` <47F7982E.5080701@wesmo.com>
2008-04-05 21:51         ` sata_via Jeff Garzik
2008-04-13  2:45           ` sata_via Tejun Heo
2008-04-13  4:19             ` sata_via Rich West
2008-04-13  4:36               ` sata_via Rich West
2008-04-14  0:39                 ` sata_via Tejun Heo
2008-04-14  8:13                   ` sata_via Thomas Renninger
2008-04-14  8:32                     ` sata_via Peter Gervai
2008-04-15  2:40                     ` sata_via Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2008-04-05 16:08 sata_via Marin Mitov
