* [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Toerless Eckert @ 2014-07-12 14:57 UTC
  To: lm-sensors

ECS GF7100-M3 mobo (ca. 2008-ish), Core2 CPU 6400 @ 2.13GHz (60W);
I never bothered with sensors. Now I tried to upgrade the
CPU to a quad core (90W), and that one crashes, but only after
at least 24 hours under full CPU load. I tried various better CPU heatsinks,
but it still crashes, so I started wondering what the real temperatures are.
And that's when I got confused by the sensors output, because
it seems to be contradictory and I cannot find good explanations:

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +68.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:       +69.0°C  (high = +84.0°C, crit = +100.0°C)

w83627dhg-isa-0a10
Adapter: ISA adapter
...
fan1:           0 RPM  (min = 10546 RPM, div = 128)  ALARM
fan2:         888 RPM  (min = 1562 RPM, div = 8)  ALARM
fan3:           0 RPM  (min =  878 RPM, div = 128)  ALARM
fan5:           0 RPM  (min = 1757 RPM, div = 128)  ALARM
temp1:        +40.0°C  (high = +31.0°C, hyst = +93.0°C)  sensor = thermistor
temp2:        +38.0°C  (high =  -0.5°C, hyst =  -1.0°C)  ALARM  sensor = diode
temp3:         +2.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor

fan2 is the CPU fan. I can tune it from ca. 850 to ca. 2800 RPM, but
the increase has astoundingly little impact on the temperature
readings.

temp1 never changes; I guess this is on some other chip - northbridge ?

temp2 must be the CPU. With the Core2 CPU it's 28C idle and goes up to 38C at full load.
Core 0/1 with the Core2 CPU are ~55C idle and 68C at full load.

With the quad core CPU, Core 0/1/2/3 are about 50C idle and go up to
77C under full CPU load (Core 0 always highest, the others 5C lower).
temp2 with the quad core CPU is 30C idle and 40C under full load.
With a worse CPU cooler I had Core 0 go above 84C, and then I started to
actually see more mcelog errors (even in less than 24 hours).

So now I wonder if both Core 0/1/2/3 and temp2 can be correct, or if
maybe one is wrong - or in general: what's the bloody temperature of
my CPUs, really.

And I cannot find a good web page that explains what coretemp-isa
vs. w83627dhg-* are and how to validate that their readings are correct.

I am guessing the coretemp-isa-0000 sensor is actually IN the
CPU, but whether or not that means that the temperature values are
read correctly, I cannot say. And temp2 is a temperature sensor
on the mobo below the CPU, but whether or not that sensor reading
is configured correctly, I cannot say either.

If that's right, I still can't believe both sensors are set up
correctly. In steady state at full CPU load I cannot see how the under-the-CPU
temperature could be 30C lower than the in-CPU ones.

So ... what temperature does my CPU have, and/or how can I make
sure both sensors are set up correctly ?

Thanks
    Toerless

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Guenter Roeck @ 2014-07-12 17:29 UTC
  To: lm-sensors

On 07/12/2014 07:57 AM, Toerless Eckert wrote:
> ECS GF7100-M3 mobo (ca. 2008-ish), Core2 CPU 6400 @ 2.13GHz (60W);
> I never bothered with sensors. Now I tried to upgrade the
> CPU to a quad core (90W), and that one crashes, but only after
> at least 24 hours under full CPU load. I tried various better CPU heatsinks,
> but it still crashes, so I started wondering what the real temperatures are.
> And that's when I got confused by the sensors output, because
> it seems to be contradictory and I cannot find good explanations:
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +68.0°C  (high = +84.0°C, crit = +100.0°C)
> Core 1:       +69.0°C  (high = +84.0°C, crit = +100.0°C)
>
The CPU reports the difference to the critical temperature as an integer value,
where a difference of '1' roughly means 1 degree C. coretemp translates that
into an absolute temperature. The value can be highly inaccurate at low
temperatures, but gets more accurate when it gets close to the critical
temperature limit.
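A minimal Python sketch of that conversion, assuming the Intel register layout the Linux coretemp driver decodes: IA32_THERM_STATUS (MSR 0x19C) carries the digital readout, i.e. degrees below TjMax, in bits 22:16, and bit 31 marks the reading as valid. The function name and the sample register value are illustrative only; nothing here reads the MSR on a live system.

```python
def coretemp_celsius(therm_status: int, tjmax_c: int) -> int:
    """Turn a raw IA32_THERM_STATUS value into an absolute temperature.

    The CPU only reports how far it is below TjMax (the "digital readout"
    in bits 22:16). The absolute value therefore depends entirely on the
    TjMax the driver reads or assumes.
    """
    if not therm_status & (1 << 31):  # bit 31: reading valid
        raise ValueError("digital readout is not valid")
    readout = (therm_status >> 16) & 0x7F  # degrees below TjMax
    return tjmax_c - readout


# Example raw value: valid bit set, 32 degrees below TjMax.
raw = (1 << 31) | (32 << 16)
print(coretemp_celsius(raw, 100))  # 68
print(coretemp_celsius(raw, 85))   # 53
```

The same raw readout becomes 68C or 53C depending only on the assumed TjMax, so the absolute number is only as good as the critical limit behind it.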

What is the exact CPU model ? It might be useful to know if coretemp reads
the critical limit from the CPU or estimates it. Older CPUs don't provide
the register to read it from the CPU so coretemp needs to guess it.
Output of /proc/cpuinfo would help.
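The fields that identify the CPU (and that a TjMax guess would be keyed on) are "cpu family", "model", "stepping" and "model name". A small sketch that pulls them out of /proc/cpuinfo-style text; the function name is hypothetical, and it does not reproduce coretemp's own TjMax lookup.

```python
def cpu_identity(cpuinfo_text: str) -> dict:
    """Return the identifying fields of the first processor block."""
    wanted = {"cpu family", "model", "stepping", "model name"}
    info = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        if key.strip() in wanted:
            info[key.strip()] = value.strip()
    return info


# Excerpt of the /proc/cpuinfo output posted in this thread.
sample = """processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
stepping        : 6
"""
print(cpu_identity(sample))
```

On a live system, pass it the contents of /proc/cpuinfo instead of the excerpt.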

> w83627dhg-isa-0a10
> Adapter: ISA adapter
> ...
> fan1:           0 RPM  (min = 10546 RPM, div = 128)  ALARM
> fan2:         888 RPM  (min = 1562 RPM, div = 8)  ALARM
> fan3:           0 RPM  (min =  878 RPM, div = 128)  ALARM
> fan5:           0 RPM  (min = 1757 RPM, div = 128)  ALARM
> temp1:        +40.0°C  (high = +31.0°C, hyst = +93.0°C)  sensor = thermistor
> temp2:        +38.0°C  (high =  -0.5°C, hyst =  -1.0°C)  ALARM  sensor = diode
> temp3:         +2.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
>
> fan2 is the CPU fan. I can tune it from ca. 850 to ca. 2800 RPM, but
> the increase has astoundingly little impact on the temperature
> readings.
>
> temp1 never changes; I guess this is on some other chip - northbridge ?
>
> temp2 must be the CPU. With the Core2 CPU it's 28C idle and goes up to 38C at full load.
> Core 0/1 with the Core2 CPU are ~55C idle and 68C at full load.
>

Unlikely. One would need to see the datasheet / schematics of the board
to get an idea of what is connected. W83627DHG supports direct temperature
measurement from the CPU through PECI. Either that is not connected
on your board, or the chip is not configured correctly.

> With the quad core CPU, Core 0/1/2/3 are about 50C idle and go up to
> 77C under full CPU load (Core 0 always highest, the others 5C lower).
> temp2 with the quad core CPU is 30C idle and 40C under full load.
> With a worse CPU cooler I had Core 0 go above 84C, and then I started to
> actually see more mcelog errors (even in less than 24 hours).
>
That doesn't look that bad. Sure, 84C is a bit high, but 77C is ok.
MCE log even at that temperature is a bit odd, though - the CPU
should only start complaining if it gets close to the critical limit.

Just to give you a reference point, this is what I see right now
with an i7-4790K running at full load @ 4.2GHz:

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +82.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +82.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:         +76.0°C  (high = +80.0°C, crit = +100.0°C)

As you can see, some of the temperatures are above 'high', but
not even close to the critical limit.

The problem, though, is that fan control is driven by the W83627DHG,
and it looks like this chip is not aware that the CPU is running hot,
meaning it does not increase fan speed as it should.

What temperatures do you see in the BIOS ?

> So now I wonder if both Core 0/1/2/3 and temp2 can be correct, or if
> maybe one is wrong - or in general: what's the bloody temperature of
> my CPUs, really.
>
> And I cannot find a good web page that explains what coretemp-isa
> vs. w83627dhg-* are and how to validate that their readings are correct.
>
> I am guessing the coretemp-isa-0000 sensor is actually IN the
> CPU, but whether or not that means that the temperature values are
> read correctly, I cannot say. And temp2 is a temperature sensor

That is correct. For information about accuracy, I would recommend
the Intel CPU datasheet. It usually has a chapter describing the
temperature sensors.

> on the mobo below the CPU, but whether or not that sensor reading
> is configured correctly, I cannot say either.
>
> If that's right, I still can't believe both sensors are set up
> correctly. In steady state at full CPU load I cannot see how the under-the-CPU
> temperature could be 30C lower than the in-CPU ones.
>
> So ... what temperature does my CPU have, and/or how can I make
> sure both sensors are set up correctly ?
>
coretemp is the best you can get as long as you read the reported temperature
not at face value but as "difference to maximum".
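One way to read coretemp as "difference to maximum" is to subtract each input from its critical limit. A minimal sketch, assuming the standard hwmon sysfs attributes tempN_input, tempN_crit and tempN_label (values in millidegrees Celsius) in the coretemp hwmon directory. Which /sys/class/hwmon/hwmonN directory that is varies per system (its "name" file reads "coretemp"), and on some older kernels the attributes may sit in a device/ subdirectory instead.

```python
from pathlib import Path


def headroom(hwmon_dir: str) -> dict:
    """Map each sensor label to (crit - input) in degrees C."""
    base = Path(hwmon_dir)
    result = {}
    for inp in sorted(base.glob("temp*_input")):
        prefix = inp.name[: -len("_input")]  # e.g. "temp2"
        crit = base / f"{prefix}_crit"
        if not crit.exists():
            continue  # no critical limit, nothing to measure against
        label_file = base / f"{prefix}_label"
        label = label_file.read_text().strip() if label_file.exists() else prefix
        millideg = int(crit.read_text()) - int(inp.read_text())
        result[label] = millideg / 1000
    return result
```

For the numbers in the original post (Core 0 at 68C, crit 100C) this reports 32 degrees of headroom, which is the quantity the CPU actually measured.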

The W83627DHG settings are more critical, really, as it should control
fan speed based on CPU temperature. Something seems to be wrong there.
Unfortunately, you'll need support from the board vendor. Anything wrong
there is wrong because the BIOS programs it that way. Messing with it
from Linux would technically be possible by writing directly into chip
registers, but I would not recommend it because you _might_ fry the board
if you write a bad value into the wrong location.

Do you run the latest BIOS ? It might make sense to ensure that the board
and the BIOS actually support the CPU you are using.

Guenter


* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Toerless Eckert @ 2014-07-12 18:17 UTC
  To: lm-sensors

inline

On Sat, Jul 12, 2014 at 10:29:45AM -0700, Guenter Roeck wrote:
> The CPU reports the difference to the critical temperature as an integer value,
> where a difference of '1' roughly means 1 degree C. coretemp translates that
> into an absolute temperature. The value can be highly inaccurate at low
> temperatures, but gets more accurate when it gets close to the critical
> temperature limit.
> 
> What is the exact CPU model ? It might be useful to know if coretemp reads
> the critical limit from the CPU or estimates it. Older CPUs don't provide
> the register to read it from the CPU so coretemp needs to guess it.
> Output of /proc/cpuinfo would help.

As I said, Core2Duo 6400; see cpuinfo at the end.

> >w83627dhg-isa-0a10
> >Adapter: ISA adapter
> >...
> >fan1:           0 RPM  (min = 10546 RPM, div = 128)  ALARM
> >fan2:         888 RPM  (min = 1562 RPM, div = 8)  ALARM
> >fan3:           0 RPM  (min =  878 RPM, div = 128)  ALARM
> >fan5:           0 RPM  (min = 1757 RPM, div = 128)  ALARM
> >temp1:        +40.0°C  (high = +31.0°C, hyst = +93.0°C)  sensor = thermistor
> >temp2:        +38.0°C  (high =  -0.5°C, hyst =  -1.0°C)  ALARM  sensor = diode
> >temp3:         +2.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
> >
> >fan2 is the CPU fan. I can tune it from ca. 850 to ca. 2800 RPM, but
> >the increase has astoundingly little impact on the temperature
> >readings.
> >
> >temp1 never changes; I guess this is on some other chip - northbridge ?
> >
> >temp2 must be the CPU. With the Core2 CPU it's 28C idle and goes up to 38C at full load.
> >Core 0/1 with the Core2 CPU are ~55C idle and 68C at full load.
> >
> 
> Unlikely. One would need to see the datasheet / schematics of the board
> to get an idea of what is connected. W83627DHG supports direct temperature
> measurement from the CPU through PECI. Either that is not connected
> on your board, or the chip is not configured correctly.

So PECI refers to pins on the CPU that connect to a temperature sensor on the CPU ?

But why do you say that is not connected or incorrectly configured ?

> >With the quad core CPU, Core 0/1/2/3 are about 50C idle and go up to
> >77C under full CPU load (Core 0 always highest, the others 5C lower).
> >temp2 with the quad core CPU is 30C idle and 40C under full load.
> >With a worse CPU cooler I had Core 0 go above 84C, and then I started to
> >actually see more mcelog errors (even in less than 24 hours).
> >
> That doesn't look that bad. Sure, 84C is a bit high, but 77C is ok.
> MCE log even at that temperature is a bit odd, though - the CPU
> should only start complaining if it gets close to the critical limit.

I just tested on the dual-core CPU, stopping the CPU fan manually.
The CPU started to emit mcelog throttle messages when the Core 0
sensor reached 100C - which took a few minutes; at that time the temp2 sensor was
at 68C.

How much of this error generation is really hard-coded in the CPU
vs. potentially a wrong Linux driver/config ? If it is known that
this has nothing to do with anything Linux could do wrong, but is purely the
CPU, and that it is known to have a 100 degree trip point where it throttles ... that
would make me start believing those high Core 0 readings, but otherwise
I rather doubt them.

> Just to give you a reference point, this is what I see right now
> with an i7-4790K running at full load @ 4.2GHz:
> 
> coretemp-isa-0000
> Adapter: ISA adapter
> Physical id 0:  +82.0°C  (high = +80.0°C, crit = +100.0°C)
> Core 0:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
> Core 1:         +82.0°C  (high = +80.0°C, crit = +100.0°C)
> Core 2:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
> Core 3:         +76.0°C  (high = +80.0°C, crit = +100.0°C)

OK, but what do you see at full idle ? I just can't believe that
a Core 0 sensor temperature of (currently) 58C and a temp2 value of 31C are
both correct.

Alas, I only have one other Linux box, with a quad-core AMD, and that one idles nicely
at 32C, stays below 42C at full load, and its CPU and temp sensors look
comparable.

> As you can see, some of the temperatures are above 'high', but
> not even close to the critical limit.
> 
> The problem, though, is that fan control is driven by the W83627DHG,
> and it looks like this chip is not aware that the CPU is running hot,
> meaning it does not increase fan speed as it should.

I am not using fancontrol, it's just the board's automatic PWM
control. When I manually stopped the fan and later restarted
it, I could see that the board PWM control works fine, but it's
definitely based on the temp2 reading: it ran at full speed as long as
temp2 was above 50C, and then throttled down.

> 
> What temperatures do you see in the BIOS ?

Between 30C and 40C.

> >So now I wonder if both Core 0/1/2/3 and temp2 can be correct, or if
> >maybe one is wrong - or in general: what's the bloody temperature of
> >my CPUs, really.
> >
> >And I cannot find a good web page that explains what coretemp-isa
> >vs. w83627dhg-* are and how to validate that their readings are correct.
> >
> >I am guessing the coretemp-isa-0000 sensor is actually IN the
> >CPU, but whether or not that means that the temperature values are
> >read correctly, I cannot say. And temp2 is a temperature sensor
> 
> That is correct. For information about accuracy, I would recommend
> the Intel CPU datasheet. It usually has a chapter describing the
> temperature sensors.
> 
> >on the mobo below the CPU, but whether or not that sensor reading
> >is configured correctly, I cannot say either.
> >
> >If that's right, I still can't believe both sensors are set up
> >correctly. In steady state at full CPU load I cannot see how the under-the-CPU
> >temperature could be 30C lower than the in-CPU ones.
> >
> >So ... what temperature does my CPU have, and/or how can I make
> >sure both sensors are set up correctly ?
> >
> coretemp is the best you can get as long as you read the reported temperature
> not at face value but as "difference to maximum".
>
> The W83627DHG settings are more critical, really, as it should control
> fan speed based on CPU temperature. Something seems to be wrong there.
> Unfortunately, you'll need support from the board vendor. Anything wrong
> there is wrong because the BIOS programs it that way. Messing with it
> from Linux would technically be possible by writing directly into chip
> registers, but I would not recommend it because you _might_ fry the board
> if you write a bad value into the wrong location.
> 
> Do you run the latest BIOS ? It might make sense to ensure that the board
> and the BIOS actually support the CPU you are using.

Yeah, it's a 2008 board, but it runs the latest BIOS.

Cheers
   Toerless

> Guenter

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
stepping        : 6
microcode       : 0xcb
cpu MHz         : 2133.411
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
bogomips        : 4266.82
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:


* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Guenter Roeck @ 2014-07-12 18:43 UTC
  To: lm-sensors

On 07/12/2014 11:17 AM, Toerless Eckert wrote:
> inline
>
> On Sat, Jul 12, 2014 at 10:29:45AM -0700, Guenter Roeck wrote:
>> The CPU reports the difference to the critical temperature as an integer value,
>> where a difference of '1' roughly means 1 degree C. coretemp translates that
>> into an absolute temperature. The value can be highly inaccurate at low
>> temperatures, but gets more accurate when it gets close to the critical
>> temperature limit.
>>
>> What is the exact CPU model ? It might be useful to know if coretemp reads
>> the critical limit from the CPU or estimates it. Older CPUs don't provide
>> the register to read it from the CPU so coretemp needs to guess it.
>> Output of /proc/cpuinfo would help.
>
> As I said, Core2Duo 6400; see cpuinfo at the end.
>

I thought you saw the problem with the quad core CPU. Am I missing something ?
The 6400 is not a quad core CPU.

>>> w83627dhg-isa-0a10
>>> Adapter: ISA adapter
>>> ...
>>> fan1:           0 RPM  (min = 10546 RPM, div = 128)  ALARM
>>> fan2:         888 RPM  (min = 1562 RPM, div = 8)  ALARM
>>> fan3:           0 RPM  (min =  878 RPM, div = 128)  ALARM
>>> fan5:           0 RPM  (min = 1757 RPM, div = 128)  ALARM
>>> temp1:        +40.0°C  (high = +31.0°C, hyst = +93.0°C)  sensor = thermistor
>>> temp2:        +38.0°C  (high =  -0.5°C, hyst =  -1.0°C)  ALARM  sensor = diode
>>> temp3:         +2.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
>>>
>>> fan2 is the CPU fan. I can tune it from ca. 850 to ca. 2800 RPM, but
>>> the increase has astoundingly little impact on the temperature
>>> readings.
>>>
>>> temp1 never changes; I guess this is on some other chip - northbridge ?
>>>
>>> temp2 must be the CPU. With the Core2 CPU it's 28C idle and goes up to 38C at full load.
>>> Core 0/1 with the Core2 CPU are ~55C idle and 68C at full load.
>>>
>>
>> Unlikely. One would need to see the datasheet / schematics of the board
>> to get an idea of what is connected. W83627DHG supports direct temperature
>> measurement from the CPU through PECI. Either that is not connected
>> on your board, or the chip is not configured correctly.
>
> So PECI refers to pins on the CPU that connect to a temperature sensor on the CPU ?
>
Yes.

> But why do you say that is not connected or incorrectly configured ?
>
If it were configured correctly it should show exactly the same temperatures
as coretemp.
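Put differently, a PECI-sourced temp2 would show (near) zero offset against coretemp at every load level. A trivial check on paired readings; the 1C tolerance is an arbitrary choice, and the sample pairs are the approximate Core 0 / temp2 values reported in this thread for the 6400 (idle, full load, fan stopped).

```python
def offsets(pairs):
    """coretemp minus the other sensor, for readings taken at the same time."""
    return [core - other for core, other in pairs]


def could_be_peci(pairs, tolerance=1.0):
    """True if the other sensor tracks coretemp within `tolerance` degrees C."""
    return all(abs(d) <= tolerance for d in offsets(pairs))


thread_pairs = [(55, 28), (68, 38), (100, 68)]  # (Core 0, temp2) in C
print(offsets(thread_pairs))        # [27, 30, 32]
print(could_be_peci(thread_pairs))  # False
```

The offset here is not only large but also grows from 27C to 32C with load, so temp2 is not tracking coretemp the way a PECI reading would.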

>>> With the quad core CPU, Core 0/1/2/3 are about 50C idle and go up to
>>> 77C under full CPU load (Core 0 always highest, the others 5C lower).
>>> temp2 with the quad core CPU is 30C idle and 40C under full load.
>>> With a worse CPU cooler I had Core 0 go above 84C, and then I started to
>>> actually see more mcelog errors (even in less than 24 hours).
>>>
>> That doesn't look that bad. Sure, 84C is a bit high, but 77C is ok.
>> MCE log even at that temperature is a bit odd, though - the CPU
>> should only start complaining if it gets close to the critical limit.
>
> I just tested on the dual-core CPU, stopping the CPU fan manually.
> The CPU started to emit mcelog throttle messages when the Core 0
> sensor reached 100C - which took a few minutes; at that time the temp2 sensor was
> at 68C.
>
That is what I would expect to see.

> How much of this error generation is really hard-coded in the CPU
> vs. potentially a wrong Linux driver/config ? If it is known that
> this has nothing to do with anything Linux could do wrong, but is purely the
> CPU, and that it is known to have a 100 degree trip point where it throttles ... that
> would make me start believing those high Core 0 readings, but otherwise
> I rather doubt them.
>
MCE errors are created by the CPU. Linux only reacts to them.

>> Just to give you a reference point, this is what I see right now
>> with an i7-4790K running at full load @ 4.2GHz:
>>
>> coretemp-isa-0000
>> Adapter: ISA adapter
>> Physical id 0:  +82.0°C  (high = +80.0°C, crit = +100.0°C)
>> Core 0:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
>> Core 1:         +82.0°C  (high = +80.0°C, crit = +100.0°C)
>> Core 2:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
>> Core 3:         +76.0°C  (high = +80.0°C, crit = +100.0°C)
>
> OK, but what do you see at full idle ? I just can't believe that
> a Core 0 sensor temperature of (currently) 58C and a temp2 value of 31C are
> both correct.
>
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +32.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +32.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +30.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:         +26.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:         +30.0°C  (high = +80.0°C, crit = +100.0°C)

> Alas, I only have one other Linux box, with a quad-core AMD, and that one idles nicely
> at 32C, stays below 42C at full load, and its CPU and temp sensors look
> comparable.
>
That is an apples-to-oranges comparison, though. With the same logic
I could argue that all six servers I have online right now are fine,
therefore you don't have a problem.

>> As you can see, some of the temperatures are above 'high', but
>> not even close to the critical limit.
>>
>> The problem, though, is that fan control is driven by the W83627DHG,
>> and it looks like this chip is not aware that the CPU is running hot,
>> meaning it does not increase fan speed as it should.
>
> I am not using fancontrol, it's just the board's automatic PWM
> control. When I manually stopped the fan and later restarted
> it, I could see that the board PWM control works fine, but it's
> definitely based on the temp2 reading: it ran at full speed as long as
> temp2 was above 50C, and then throttled down.
>
Automatic fan control is what I meant. I guess if the chip is configured
to run the fans at full speed when the temperature shows 50 degrees C, you
might be OK. The question, though, is whether temp2 gets there with the quad
core CPU. It might be that the quad core CPU needs a lower limit
to start running the fans at full speed. Just guessing, though.
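To illustrate the concern, here is a deliberately simplified fan curve driven by temp2. It is a hypothetical linear model, not the W83627DHG's actual automatic fan control algorithm; the 30C floor and 30% minimum duty are made-up parameters, and only the 50C full-speed point and the temp2 full-load readings come from this thread.

```python
def fan_duty(temp_c, full_speed_at=50.0, idle_at=30.0, min_duty=30, max_duty=100):
    """Hypothetical linear fan curve: duty cycle in percent for a temperature."""
    if temp_c >= full_speed_at:
        return max_duty
    if temp_c <= idle_at:
        return min_duty
    fraction = (temp_c - idle_at) / (full_speed_at - idle_at)
    return round(min_duty + fraction * (max_duty - min_duty))


# temp2 at full load: 38C with the 6400, 40C with the quad core
print(fan_duty(38), fan_duty(40))        # 58 65
print(fan_duty(40, full_speed_at=40.0))  # 100
```

Under this model neither CPU drives the fans to full speed at full load, because temp2 never reaches 50C; lowering the full-speed point is the effect a lower limit would have.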

>>
>> What temperatures do you see in the BIOS ?
>
> Between 30C and 40C.
>
>>> So now I wonder if both Core 0/1/2/3 and temp2 can be correct, or if
>>> maybe one is wrong - or in general: what's the bloody temperature of
>>> my CPUs, really.
>>>
>>> And I cannot find a good web page that explains what coretemp-isa
>>> vs. w83627dhg-* are and how to validate that their readings are correct.
>>>
>>> I am guessing the coretemp-isa-0000 sensor is actually IN the
>>> CPU, but whether or not that means that the temperature values are
>>> read correctly, I cannot say. And temp2 is a temperature sensor
>>
>> That is correct. For information about accuracy, I would recommend
>> the Intel CPU datasheet. It usually has a chapter describing the
>> temperature sensors.
>>
>>> on the mobo below the CPU, but whether or not that sensor reading
>>> is configured correctly, I cannot say either.
>>>
>>> If that's right, I still can't believe both sensors are set up
>>> correctly. In steady state at full CPU load I cannot see how the under-the-CPU
>>> temperature could be 30C lower than the in-CPU ones.
>>>
>>> So ... what temperature does my CPU have, and/or how can I make
>>> sure both sensors are set up correctly ?
>>>
>> coretemp is the best you can get as long as you read the reported temperature
>> not at face value but as "difference to maximum".
>>
>> The W83627DHG settings are more critical, really, as it should control
>> fan speed based on CPU temperature. Something seems to be wrong there.
>> Unfortunately, you'll need support from the board vendor. Anything wrong
>> there is wrong because the BIOS programs it that way. Messing with it
>> from Linux would technically be possible by writing directly into chip
>> registers, but I would not recommend it because you _might_ fry the board
>> if you write a bad value into the wrong location.
>>
>> Do you run the latest BIOS ? It might make sense to ensure that the board
>> and the BIOS actually support the CPU you are using.
>
> Yeah, it's a 2008 board, but it runs the latest BIOS.
>
Is the new CPU listed as supported ? Also, again, can you give me the model
of the quad core CPU ?

Thanks,
Guenter

> Cheers
>     Toerless
>
>> Guenter
>
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 15
> model name      : Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
> stepping        : 6
> microcode       : 0xcb
> cpu MHz         : 2133.411
> cache size      : 2048 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow
> bogomips        : 4266.82
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
>
>


* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Toerless Eckert @ 2014-07-12 22:30 UTC
  To: lm-sensors

On Sat, Jul 12, 2014 at 11:43:15AM -0700, Guenter Roeck wrote:
> >As I said, Core2Duo 6400; see cpuinfo at the end.
> 
> I thought you saw the problem with the quad core CPU. Am I missing something ?
> The 6400 is not a quad core CPU.

The differences between the Core 0/1 sensors and the temp2 sensor are the same
whether I use my good old proven 6400 or the new quad-core. So right
now I want to stick to my old CPU, make sure I understand what's
going wrong with the sensors, and ultimately know what my old 6400's temperature
is ... and then I can go back to the quad-core.

> >So PECI refers to pins on the CPU that connect to a temperature sensor on the CPU ?
> Yes.
> 
> >But why do you say that is not connected or incorrectly configured ?
> >
> If it were configured correctly it should show exactly the same temperatures
> as coretemp.

OK, so how do I then know whether the Core 0/1 readings or the temp2 reading
is misconfigured ...

> >I just tested on the dual-core CPU, stopping the CPU fan manually.
> >The CPU started to emit mcelog throttle messages when the Core 0
> >sensor reached 100C - which took a few minutes; at that time the temp2 sensor was
> >at 68C.
> >
> That is what I would expect to see.

Right. So that's why I am not worrying about the fan right now ;-)

> >How much of this error generation is really hard-coded in the CPU
> >vs. potentially a wrong Linux driver/config ? If it is known that
> >this has nothing to do with anything Linux could do wrong, but is purely the
> >CPU, and that it is known to have a 100 degree trip point where it throttles ... that
> >would make me start believing those high Core 0 readings, but otherwise
> >I rather doubt them.
> >
> MCE errors are created by the CPU. Linux only reacts to them.

OK, but the MCE error does not state the trip temperature, so
I wonder if one can validate that the trip temperature is really
100C for the 6400 CPU. Because if it is, then I would trust the Core 0/1
sensor readings more, conclude that temp2 is wrong ... and wonder if/how
I can fix it up in some lm_sensors config.

> >Alas, I only have one other Linux box, with a quad-core AMD, and that one idles nicely
> >at 32C, stays below 42C at full load, and its CPU and temp sensors look
> >comparable.
> >
> That is an apples-to-oranges comparison, though. With the same logic
> I could argue that all six servers I have online right now are fine,
> therefore you don't have a problem.

I just brought it up for two reasons:
- My other Linux box does have consistent info across different sensors.
- If AMD is really running cooler, maybe my next mobo should be AMD again ;-)
  (but the idea here, of course, is to keep this one running as long as possible).

> Automatic fan control is what I meant. I guess if the chip is configured
> to run the fans at full speed when the temperature shows 50 degrees C, you
> might be OK. The question, though, is whether temp2 gets there with the quad
> core CPU. It might be that the quad core CPU needs a lower limit
> to start running the fans at full speed. Just guessing, though.

Yeah, but as stated up front, let's forget the quad core CPU:

temp2 shows me temperatures between 30C and 60C, and when I stop the
fan and restart it, I see the mobo fan control change speed at 50C on temp2,
which is also what is configured in the BIOS. If I go into the BIOS after
a restart, I see a temperature between 30C and 40C, which makes me think
that the BIOS relies on the temp2 sensor and that the BIOS thinks
the CPU has temperatures between 30C and 60C. That is inconsistent
with the higher readings on the Core sensors: 50C..100C.

> >Yeah, it's a 2008 board, but it runs the latest BIOS.
> >
> Is the new CPU listed as supported ? Also, again, can you give me the model
> of the quad core CPU ?

Again, let's forget the quad core right now. These are all current numbers
with the proven old dual core.

Cheers
    Toerless

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors


* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Guenter Roeck @ 2014-07-12 23:06 UTC
  To: lm-sensors

On 07/12/2014 03:30 PM, Toerless Eckert wrote:
> On Sat, Jul 12, 2014 at 11:43:15AM -0700, Guenter Roeck wrote:
>>> As i said, Core2Duo 6400, see cpuinfo at the end.
>>
>> I thought you saw the problem with the quad core CPU. Am I missing something ?
>> The 6400 is not a quad core CPU.
>
> The differences between the Core 0/1 sensors and the temp2 sensor are the same
> whether i use my good old proven 6400 or the new quad-core. So right
> now i want to stick to my old CPU, make sure i understand what's
> going wrong with the sensors, and ultimately know what my old 6400's
> temperature is. ... And then i can go back to the quad-core.
>
>>> So PECI are pins on the CPU into a temperature sensor on the CPU ?
>> Yes.
>>
>>> But why do you say that is not connected or incorrectly configured ?
>>>
>> If it was configured correctly it should show exactly the same temperatures
>> as coretemp.
>
> Ok, so how do i then know whether the Core0/1 readings or the temp2 reading
> is misconfigured...
>


[1] suggests that tjmax for the E6x00 series should be either 70 or 80 degrees C.
Other links [2] suggest that it might be 85 degrees C or 100 degrees C,
though that link is older. This suggests that the 100 you have configured
may be wrong, and that the real temperature may be 20 or even 30 degrees
lower. This in turn would suggest that the temp2 reading might be the
correct (or better) one.

You can set tjmax with the tjmax module parameter. For example,
'modprobe coretemp tjmax=80' would set tjmax to 80 degrees C.

Ultimately that doesn't matter much, though, since only the difference
between tjmax (shown as critical temperature) and the current temperature
is relevant, and your system is well below the critical temperature,
at least with the dual core CPU.

>>> I just tested on the dual-core CPU, stopping the CPU fan manually.
>>> The CPU started to emit mcelog throttle messages when the Core 0
>>> sensor reached 100C, which took a few minutes; at that time the temp2
>>> sensor was at 68C.
>>>
>> That is what I would expect to see.
>
> Right. So that's why i am not worrying about the fan right now ;-)
>
>>> How much of this error generation is really hard-coded by the CPU
>>> vs. potentially wrong linux driver/config ? If it is known that
>>> this has nothing to do with anything linux could do wrong, but it's purely the
>>> CPU and it's known to have a 100 degree trip point where it throttles ... that
>>> would make me start believing those high CPU 0 readings, but otherwise
>>> i rather doubt them.
>>>
>> MCE errors are created by the CPU. Linux only reacts to it.
>
> Ok, but in the MCE error it does not say the trip temperature, so
> i wonder if one can validate that the trip temperature is really
> 100C for the 6400 CPU. Because if it is, then i would trust the Core 0/1
> sensor readings more and conclude that temp2 is wrong... and wonder if/how
> i can fix that up in some lm_sensors config.
>
If you can, I don't know how.

>>> Alas, i only have another linux with quad-core AMD, and that shows nicely idling
>>> at 32C and full load not above 42 and the CPU and temp sensors look
>>> comparable.
>>>
>> That is an apples-to-oranges comparison, though. With the same logic
>> I could argue that all six servers I have online right now are fine,
>> therefore you don't have a problem.
>
> I just brought it up for two reasons:
> - My other linux does have consistent info across different sensors
> - If AMD is really running cooler, maybe my next mobo should be AMD again ;-)
>    (but the idea of course here is to keep this running as long as possible).
>
Your call, really, which CPU to use.

>> Automatic fan control is what I meant. Guess if the chip is configured
>> to run fans at full speed if the temperature shows 50 degrees C you
>> might be ok. Question though is if temp2 gets there with the quad
>> core CPU. It might be that the quad core CPU needs a lower limit
>> to start running fans at full speed. Just guessing, though.
>
> Yeah, but as stated up front, let's forget the quad core CPU:
>
> temp2 shows me temperatures between 30C and 60C, and when i stop the
> fan and restart, i see the mobo fan control change speed at 50C on temp2,
> which is also what is configured in the BIOS. If i go into the BIOS after
> a restart, i see a temperature between 30C and 40C, which makes me think
> that the BIOS does rely on the temp2 sensor and that the BIOS thinks
> the CPU has temperatures between 30C and 60C. That is inconsistent
> with the higher readings on the Core sensors: 50C..100C.
>
But you don't have a problem with the dual core CPU, or do you ?
I think you are chasing the wrong problem. You insist on seeing the correct
and same temperature on both coretemp and temp2, but that doesn't really matter.
Again, the only thing that matters is how close the reported temperature gets
to the critical temperature.

In other words, even if you get both coretemp and temp2 output to agree,
you'll still see the problem with the quad core CPU.

>>> Yeah, it's a 2008 board, but it runs the latest BIOS.
>>>
>> Is the new CPU listed as supported ? Also, again, can you give me the model
>> of the quad core CPU ?
>
> Again, let's forget the quad core for now. These are all current numbers
> with the proven old dual core.
>
Do you see any errors with the old CPU ? I thought you didn't.

At this point I would suggest playing with the tjmax parameter until you get
all the temperatures to agree. I would also suggest doing some more research
to ensure that you select the correct tjmax for your CPU. Then repeat the
same with the quad core CPU. My suspicion is that the BIOS may not set the
limits for the quad core CPU correctly, which may cause it to run hot.

Guenter

---
[1] http://www.tomshardware.co.uk/intel-dts-specs,news-29460.html
[2] http://www.tomshardware.com/forum/245128-29-e6300-6400-stepping-computronix

* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Toerless Eckert @ 2014-07-13 18:34 UTC
  To: lm-sensors

Is there a way to figure out whether the temperatures
from "coretemp-isa-0000, Adapter: ISA adapter" are coming from PECI
or not ?

Eg: Is there a way to ask the CPU itself for these temperature sensor
data without using those PECI pins ?

Cheers
    Toerless

* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Guenter Roeck @ 2014-07-13 19:08 UTC
  To: lm-sensors

On 07/13/2014 11:34 AM, Toerless Eckert wrote:
> Is there a way to figure out whether the temperatures
> from "coretemp-isa-0000, Adapter: ISA adapter" are coming from PECI
> or not ?
>
> Eg: Is there a way to ask the CPU itself for these temperature sensor
> data without using those PECI pins ?
>

That is exactly what coretemp is doing; it reads the temperature directly
from CPU registers. The SuperIO chip gets the data through PECI. It is
exactly the same data, though, i.e. the underlying sensor in the CPU is the same.
See [1] and [2] for some more details.

Neither PECI nor the internal sensor provide absolute temperatures,
but only a difference to the maximum permitted temperature or TjMax.

Both coretemp and the SuperIO chip driver have to be configured for the
maximum temperature in order to be able to calculate and report the temperature
in degrees C. On top of that, the reported value is known to be inaccurate
for lower temperatures. This means it can only really be relied on for high
temperatures, and the value reported for low temperatures can be highly
inaccurate (it can easily be 30 degrees C off for Atom CPUs, for example).
More recent Intel CPUs have a register to read Tjmax, but older CPUs
like yours don't, and there is always a guessing game about what the correct
TjMax value for a given CPU may be. This becomes even more complicated
if different CPU revisions have different values for TjMax, as seems to be
the case for your dual core CPU. All one can hope for is that the BIOS programs
the SuperIO chip correctly, though it is quite common that this is not the case,
especially with older CPUs where the value of TjMax cannot be read from the CPU
itself. Since SuperIO chips depend on the absolute temperature for automatic
fan speed control, this can cause problems with fan control, especially if the
CPU runs hotter than the SuperIO chip believes. I _suspect_ this may be what is
happening in your case with the quad core CPU, but obviously that is difficult
to confirm since we don't know any details about it.

Guenter

---
[1] http://en.wikipedia.org/wiki/Platform_Environment_Control_Interface
[2] http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/cpu-monitoring-dts-peci-paper.pdf

* Re: [lm-sensors] lm-sensors: which temperature sensor is lying ?
From: Toerless Eckert @ 2014-07-16  5:40 UTC
  To: lm-sensors

Thanks, Guenter,

So, wrt coretemp:

1. /usr/src/linux/Documentation/hwmon/coretemp isn't too shabby,
   but i wish there was a coretemp(8) man page with a user-facing version
   of it. Lots of linux systems don't have the kernel source tree installed.

   Salient information: driver for Intel CPUs; reads temperatures
   from the CPU, independent of the chipset; the value reported by the CPU
   is a delta to the TjMax temperature; depending on the CPU, TjMax may be
   guessed by the driver and/or known from the CPU; if guessed, it can be
   fixed with a module parameter.

2. This brings me to the problem: coretemp does not expose whether
   TjMax is guessed or known "authoritatively" (from the CPU, or from a
   table in coretemp.c). If you know it's not authoritative from the CPU,
   you've got some incentive to hunt down what might be a correct tjmax
   value to load the module with. Maybe the status of tjmax should be
   exported via some appropriate /sys attribute and displayed by
   sensors.

   There was, for example, no info in Documentation/hwmon/coretemp for
   my Conroe E6400, and TjMax was just guessed by the driver with the
   default of 100C. When i set it to 70C, the temperature values
   become pretty much like what the w83627dhg shows me. And Tcase
   is spec'ed by Intel at 61C, so maybe 70C is about right (or 75C,
   somewhere in that neighborhood).

3. To me it looks wrong that coretemp.c reads out MSR 1A2H bits 15:8
   and uses them to calculate ttarget when tjmax is just guessed
   because MSR 1A2H bits 23:16 are zero, as they are on my E6400 Core2 Duo.

   coretemp.c already has the code to not expose ttarget; it should
   just invoke it whenever it can't determine tjmax from the MSR.

   Just saying ;-)

Cheers
    Toerless

-- 
---
Toerless.Eckert@informatik.uni-erlangen.de
/C=DE/A=d400/P=uni-erlangen/OU=informatik/S=Eckert/G=Toerless/

Thread overview: 9+ messages
2014-07-12 14:57 [lm-sensors] lm-sensors: which temperature sensor is lying ? Toerless Eckert
2014-07-12 17:29 ` Guenter Roeck
2014-07-12 18:17 ` Toerless Eckert
2014-07-12 18:43 ` Guenter Roeck
2014-07-12 22:30 ` Toerless Eckert
2014-07-12 23:06 ` Guenter Roeck
2014-07-13 18:34 ` Toerless Eckert
2014-07-13 19:08 ` Guenter Roeck
2014-07-16  5:40 ` Toerless Eckert
