* "No probed ethernet devices" caused by inaccurate msec_delay()
@ 2014-01-27  2:56 Sangjin Han
From: Sangjin Han @ 2014-01-27  2:56 UTC
  To: dev-VfR2kkLFssw

Hi,

I encountered this error message when I tried to use the testpmd application.

Cause: No probed ethernet devices - check that
CONFIG_RTE_LIBRTE_IGB_PMD=y and that CONFIG_RTE_LIBRTE_EM_PMD=y and
that CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your configuration file

which is caused by rte_eth_dev_count() == 0. However, my 82599 ports
are already unbound from ixgbe. (I have two Xeon X5560 (@ 2.80GHz)
processors and two X520-DA2 cards).

I googled for possible causes and came across a similar case:
http://openetworking.blogspot.com/2014/01/debugging-no-probed-ethernet-devices.html

Based on the article, I dug into the source code, and found the cause:

ixgbe_82599.c: ixgbe_reset_pipeline_82599()
...
for (i = 0; i < 10; i++) {
        msec_delay(4);
        anlp1_reg = IXGBE_READ_REG(hw, IXGBE_ANLP1);
        if (anlp1_reg & IXGBE_ANLP1_AN_STATE_MASK)
                break;
}

if (!(anlp1_reg & IXGBE_ANLP1_AN_STATE_MASK)) {
        DEBUGOUT("auto negotiation not completed\n");
        ret_val = IXGBE_ERR_RESET_FAILED;
        goto reset_pipeline_out;
}
...

Ten iterations of the for loop were not enough. In my case, the loop
needed at least 12 iterations before everything worked.

The issue is that msec_delay() is not accurate on my system. It takes
the CPU frequency from /proc/cpuinfo, which may not reflect the actual
TSC rate. Since I did not disable the P-state feature, /proc/cpuinfo
reports 1.6GHz, but my TSC runs at 2.8GHz. As a result, msec_delay(4)
waited only 2.x milliseconds, which in turn caused the failure.
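
To make the arithmetic above concrete, here is a small sketch. It
assumes msec_delay() is a TSC busy-wait that converts the requested
time into cycles using the frequency read from /proc/cpuinfo, as
described above. The function name effective_delay_us() is
hypothetical and is not part of DPDK.

```c
#include <assert.h>

/*
 * Wall-clock duration, in microseconds, of a TSC-based busy-wait that
 * converts the requested delay into cycles using assumed_hz while the
 * counter really ticks at actual_hz. Hypothetical helper, for
 * illustration only.
 */
double
effective_delay_us(double requested_us, double assumed_hz, double actual_hz)
{
	assert(actual_hz > 0.0);

	/* Number of cycles the busy-wait loop will spin for. */
	double cycles = requested_us * assumed_hz / 1e6;

	/* Time those cycles actually take at the real counter rate. */
	return cycles / actual_hz * 1e6;
}
```

With the figures from this report, effective_delay_us(4000, 1.6e9,
2.8e9) is 4000 * 1.6 / 2.8, or roughly 2286 microseconds. Ten
iterations then wait about 22.9 ms in total instead of the intended
40 ms, and twelve iterations wait about 27.4 ms.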

I think /proc/cpuinfo is not a reliable way to get
eal_tsc_resolution_hz, since it varies with the current CPU clock
frequency. Forcing applications to run at the max frequency can be
too restrictive. It would be nice if I could bypass
set_tsc_freq_from_cpuinfo() in set_tsc_freq().

Thanks,
Sangjin


* Re: "No probed ethernet devices" caused by inaccurate msec_delay()
@ 2014-01-27  9:19   ` Thomas Monjalon
From: Thomas Monjalon @ 2014-01-27  9:19 UTC
  To: Sangjin Han; +Cc: dev-VfR2kkLFssw

Hello,

27/01/2014 03:56, Sangjin Han:
> Cause: No probed ethernet devices - check that
> CONFIG_RTE_LIBRTE_IGB_PMD=y and that CONFIG_RTE_LIBRTE_EM_PMD=y and
> that CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your configuration file
[...] 
> I googled for possible causes and came across a similar case:
> http://openetworking.blogspot.com/2014/01/debugging-no-probed-ethernet-devices.html
[...]
>         msec_delay(4);
[...]
> I think /proc/cpuinfo is not a reliable way to get
> eal_tsc_resolution_hz, since it varies based on the current CPU clock
> frequency. Enforcing applications to run at the max frequency can be
> too restrictive.

Indeed, as described in the quick start page, the highest frequency must be 
set: http://dpdk.org/doc/quick-start

> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
> set_tsc_freq().

I think it would not solve the problem because your clock is varying and the
TSC calibration must be updated accordingly, with different values per core.

Feel free to submit a patch if you find a smart solution.
-- 
Thomas


* Re: "No probed ethernet devices" caused by inaccurate msec_delay()
@ 2014-01-28  1:16       ` Sangjin Han
From: Sangjin Han @ 2014-01-28  1:16 UTC
  To: Thomas Monjalon; +Cc: dev-VfR2kkLFssw

Hi,

>> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
>> set_tsc_freq().
>
> I think it would not solve the problem because your clock is varying and the
> TSC calibration must be updated accordingly with different values by core.

Reasonably new Intel CPUs (including Nehalem) have a constant TSC rate,
regardless of the current P/C-state (see the constant_tsc and nonstop_tsc
flags in /proc/cpuinfo). So TSC calibration is unnecessary even with
variable clock frequency on those CPUs.
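
As a rough sketch of how those two flags could be checked, the code
below scans a /proc/cpuinfo-style stream for the first "flags" line.
The function names has_cpu_flag() and tsc_is_invariant() are
hypothetical, and the 8192-byte line buffer is an assumption about
the maximum line length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole word in `flags_line`. */
bool
has_cpu_flag(const char *flags_line, const char *flag)
{
	assert(flags_line != NULL && flag != NULL);

	size_t len = strlen(flag);
	const char *p = flags_line;

	if (len == 0)
		return false;

	while ((p = strstr(p, flag)) != NULL) {
		/* Reject substring hits such as "xconstant_tsc". */
		bool start_ok = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
		bool end_ok = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (start_ok && end_ok)
			return true;
		p += len;
	}
	return false;
}

/*
 * Scan a /proc/cpuinfo-style stream and report whether the first
 * "flags" line lists both constant_tsc and nonstop_tsc.
 */
bool
tsc_is_invariant(FILE *f)
{
	char line[8192]; /* assumed to be long enough for a flags line */

	while (fgets(line, sizeof(line), f) != NULL) {
		if (strncmp(line, "flags", 5) == 0)
			return has_cpu_flag(line, "constant_tsc") &&
			       has_cpu_flag(line, "nonstop_tsc");
	}
	return false;
}
```

A caller would open /proc/cpuinfo with fopen() and pass the stream to
tsc_is_invariant(). This only looks at the first CPU entry.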

Also, it seems that there is no guarantee that the TSC rate is
identical to the CPU max clock frequency. While it happens to be true
for Intel CPUs, this article from AMD
(https://lkml.org/lkml/2005/11/4/173) says:

"The rate of the invariant TSC is implementation-dependent and will
likely *not* be the frequency of the processor core [...]"

It would be great if someone can actually measure TSC rate on AMD
processors to verify this.

I would like to suggest two possible options:

1. If we can assume that the TSC rate always equals the max clock
frequency, then we can use
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq instead of
/proc/cpuinfo (which reflects cpuinfo_cur_freq).

2. If we can't (AMD?), we can simply get rid of
set_tsc_freq_from_cpuinfo() and fall back to set_tsc_freq_from_clock()
or set_tsc_freq_fallback() instead. I always get reasonably good
accuracy with those two functions -- the only drawback is that it
takes 0.5 - 1 second for applications to boot up. Not sure if it is a
big deal or not, though.
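
The two options can be sketched as follows. This is not the DPDK
implementation: all function names are hypothetical, and the counter
is passed in as a function pointer so the calibration does not depend
on a particular TSC read routine. The cpufreq sysfs files report
values in kHz.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Option 1: convert a cpufreq value in kHz (as text) to Hz; 0 on error. */
uint64_t
khz_text_to_hz(const char *text)
{
	char *end;
	unsigned long long khz;

	errno = 0;
	khz = strtoull(text, &end, 10);
	if (end == text || errno != 0)
		return 0;
	return (uint64_t)khz * 1000ULL;
}

/* Read a cpufreq file such as cpuinfo_max_freq; 0 if it is unavailable. */
uint64_t
read_max_freq_hz(const char *path)
{
	char buf[64];
	uint64_t hz = 0;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return 0;
	if (fgets(buf, sizeof(buf), f) != NULL)
		hz = khz_text_to_hz(buf);
	fclose(f);
	return hz;
}

/* Monotonic clock in nanoseconds, used as the time reference. */
uint64_t
monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Option 2: estimate the tick rate of read_counter() by counting ticks
 * across a sleep of sleep_ms milliseconds. Returns 0 on failure.
 */
uint64_t
estimate_counter_hz(uint64_t (*read_counter)(void), long sleep_ms)
{
	assert(read_counter != NULL);

	struct timespec req = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
	uint64_t t0 = monotonic_ns();
	uint64_t c0 = read_counter();

	nanosleep(&req, NULL);

	uint64_t c1 = read_counter();
	uint64_t t1 = monotonic_ns();

	if (t1 == t0)
		return 0;
	return (uint64_t)((double)(c1 - c0) * 1e9 / (double)(t1 - t0));
}
```

For option 1, the path would be
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq. For option 2,
read_counter would be a thin wrapper around the TSC read on x86. A
longer sleep gives a better estimate at the cost of startup time,
which is the trade-off mentioned above.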

---

Besides the TSC frequency, the 4ms * 10 delay in
ixgbe_reset_pipeline_82599() seems too tight. On my system, it
succeeds only after 7 (or so) iterations with correct msec_delay().
The per-iteration delay (4ms; in the kernel ixgbe driver, it is set to
be 4-8ms) and/or the number of iterations (10) should be increased, I
suppose.

Sangjin


* Re: "No probed ethernet devices" caused by inaccurate msec_delay()
@ 2014-01-28 16:23           ` Thomas Monjalon
From: Thomas Monjalon @ 2014-01-28 16:23 UTC
  To: Sangjin Han; +Cc: dev-VfR2kkLFssw

28/01/2014 02:16, Sangjin Han:
> >> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
> >> set_tsc_freq().
> > 
> > I think it would not solve the problem because your clock is varying and
> > the TSC calibration must be updated accordingly with different values by
> > core.

[...]
> Also, it seems that there is no guarantee that the TSC rate is
> identical to the CPU max clock frequency.

So you may submit a revert of the commit a46154b9c6bc
(timer: get TSC frequency from /proc/cpuinfo)

> I would like to suggest two possible options:
> 
> 1. If we can assume that the TSC rate always equals to the max clock
> frequency, then we can use
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq instead of
> /proc/cpuinfo (which reflects cpuinfo_cur_freq).
> 
> 2. If we can't (AMD?), we can simply get rid of
> set_tsc_freq_from_cpuinfo() and fall back to set_tsc_freq_from_clock()
> or set_tsc_freq_fallback() instead. I always get reasonably good
> accuracy with those two functions -- the only drawback is that it
> takes 0.5 - 1 second for applications to boot up. Not sure if it is a
> big deal or not, though.

Maybe you can choose between these two methods with a runtime option.

> Besides the TSC frequency, the 4ms * 10 delay in
> ixgbe_reset_pipeline_82599() seems too tight. On my system, it
> succeeds only after 7 (or so) iterations with correct msec_delay().
> The per-iteration delay (4ms; in the kernel ixgbe driver, it is set to
> be 4-8ms) and/or the number of iterations (10) should be increased, I
> suppose.

Feel free to submit a patch.

-- 
Thomas


* Re: "No probed ethernet devices" caused by inaccurate msec_delay()
@ 2014-01-28 18:13               ` Stephen Hemminger
From: Stephen Hemminger @ 2014-01-28 18:13 UTC
  To: Thomas Monjalon; +Cc: dev-VfR2kkLFssw

TSC has lots of platform-related issues. It is not guaranteed to be
synchronized across physical packages, and AMD boxes have lots of problems.

Why does delay_ms not just use nanosleep() and let the OS worry about it?
On a related note, I have found that putting the worker (non-master) threads
into the real-time scheduling class also helps.
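
A minimal sketch of a nanosleep()-based millisecond delay is shown
below. The name sleep_ms() is hypothetical and this is not an existing
DPDK function. It restarts the sleep with the remaining time if a
signal interrupts it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/*
 * Sleep for at least `ms` milliseconds using nanosleep().
 * Returns 0 on success, -1 on an error other than EINTR.
 */
int
sleep_ms(unsigned int ms)
{
	struct timespec req = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};
	struct timespec rem;

	while (nanosleep(&req, &rem) != 0) {
		if (errno != EINTR)
			return -1;
		req = rem; /* interrupted by a signal: sleep the remainder */
	}
	return 0;
}
```

Unlike a TSC busy-wait, this does not depend on knowing the counter
frequency, but the kernel may wake the thread later than requested, so
it guarantees a minimum delay rather than an exact one.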
