Re: ipmitool fru write 0 - does not update "baseboard" FRU

Openbmc archive mirror
 help / color / mirror / Atom feed

From: Willy Tu <wltu@google.com>
To: Oskar Senft <osk@google.com>
Cc: OpenBMC Maillist <openbmc@lists.ozlabs.org>,
	Ali El-Haj-Mahmoud <aaelhaj@google.com>
Subject: Re: ipmitool fru write 0 - does not update "baseboard" FRU
Date: Thu, 9 May 2024 09:17:19 -0700	[thread overview]
Message-ID: <CAHwn2XmDgOuBMZDNkn0cQidOu2uGdW2Gx4JjQ6khUFvgSndX=w@mail.gmail.com> (raw)
In-Reply-To: <CABoTLcRGzSPdu4On1wfCFqSfqZNozpH+3J=ST6==toqQ5NVXeA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 11159 bytes --]

I am not aware of a way to differentiate between the clients. Might just
have to error out as you suggested.

Willy Tu

On Tue, May 7, 2024 at 5:05 PM Oskar Senft <osk@google.com> wrote:

> I got some code changes that look right, but haven't done much testing
> yet. I realized that a read cache is useful, so I kept it. I split read and
> write cache by defining a new class that keeps all the related data
> together.
>
> Any suggestions on how to distinguish different clients? Or maybe we just
> error out when we receive a request for a different FRU while we're still
> not done with the first one?
>
> Oskar.
>
> On Tue, May 7, 2024 at 7:49 PM Willy Tu <wltu@google.com> wrote:
>
>> I think we should make it so that if a different IPMI client tries to
>> write the Fru we prevent it and only allow one write at a time. I think the
>> fruCache for Read is mostly from some commands that is dealing with one
>> device ID multiple times. When reading subsequent ids... then it doesn't do
>> anything.
>>
>> I think we will need more discussion on that topic since it will be a
>> larger refactor to make that work.
>>
>> Sounds good. Please let me know how it goes.
>>
>> Willy Tu
>>
>> On Tue, May 7, 2024 at 10:32 AM Oskar Senft <osk@google.com> wrote:
>>
>>> Hi Willy
>>>
>>> Thanks for your input!
>>>
>>> From what I can tell, the current implementation will fail in wondrous
>>> ways if there's more than one IPMI client trying to write FRU at the same
>>> time. The existing getFru guards against changing target devId between
>>> calls to not hand-out the same cache for different requests. However, this
>>> will clearly break when different IPMI clients attempt to write the same or
>>> different FRUs at the same time.
>>>
>>> We could argue whether that's a supported use case or if we just assume
>>> that'll never happen ... it does seem like quite a bit of an edge case,
>>> though.
>>>
>>> I do see it as an issue if there were multiple clients with only one
>>> writing, but others reading - that'll fail in similarly weird ways.
>>>
>>> I'm wondering: Do we want to have the fruCache for _reads_ at all? It
>>> seems actually quite wrong, since subsequent reads for the same FRU would
>>> always return the same result, even if the FRU changed through some other
>>> mechanism.
>>>
>>> Let me work on a fix that would use the cache only for writing and would
>>> keep it around until the timeout expired.
>>>
>>> Oskar.
>>>
>>> On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu@google.com> wrote:
>>>
>>>> Hi Oskar,
>>>>
>>>> > C) ipmiStorageWriteFruData clears the cache immediately after
>>>> WriteFru was called. Maybe we should keep that cache around until the
>>>> timeout expires?
>>>>
>>>> It seems like this is an issue of multiple clients taking to ipmid. In
>>>> the middle of writing... There is another client that is reading or
>>>> something
>>>> else and will reset the fruCache as you mentioned. In that case, I
>>>> think it may be best to refactor it out to use another getFru method...
>>>> maybe like getFruToWrite
>>>>
>>>> Maybe just something like this
>>>>
>>>> ```
>>>> std::vector<uint8_t> getFruToWrite(...){
>>>>   if (writeTimer->isRunning()) {
>>>>     return fruCacheForWrite;
>>>>   }
>>>>   auto [_, fruCacheForWrite] = getFru(...);
>>>>   return fruCacheForWrite;
>>>> }
>>>> ```
>>>>
>>>> Also need to change `writeFruCache` and such to use `fruCacheForWrite`
>>>> and such.
>>>>
>>>> Please let me know if that works for you. Feel free to reach out
>>>> internally for faster feedback.
>>>>
>>>> Willy Tu
>>>>
>>>> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk@google.com> wrote:
>>>>
>>>>> Hi everyone
>>>>>
>>>>> tl;dr - Attempting "ipmitool fru write" with an input file that
>>>>> contains additional bytes beyond the actual FRU data does not actually
>>>>> update the FRU on OpenBMC at head w/ entity-manager.
>>>>>
>>>>> Details:
>>>>>
>>>>> I ran into an issue with updating the "baseboard" FRU (0), which is
>>>>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>>>>> specific to FRU 0 and could apply to other updateable FRUs in the same
>>>>> way.
>>>>>
>>>>>
>>>>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>>>>> contains chassis type (that's required for entity-manager's fru-device
>>>>> to recognize the file.
>>>>>
>>>>> root@akita:~# ls -al /etc/fru/baseboard.fru.bin
>>>>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>>>>> /etc/fru/baseboard.fru.bin
>>>>>
>>>>> root@akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>>>>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>>>>> |................|
>>>>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> |................|
>>>>> *
>>>>> 00000200
>>>>>
>>>>> root@akita:~# ipmitool fru print 0
>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>  Chassis Area Checksum : OK
>>>>>
>>>>>
>>>>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>>>>
>>>>> (frugy) osk@osk:~$ cat demo.yml
>>>>> ChassisInfo:
>>>>>   type: 23
>>>>>   part_number: '4711'
>>>>>   serial_number: '12345'
>>>>>
>>>>> (frugy) osk@osk:~$ frugy demo.yml -o demo.bin
>>>>>
>>>>> (frugy) osk@osk:~$ ls -al demo.bin
>>>>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>>>>
>>>>> (frugy) osk@osk:~$ hexdump -C demo.bin
>>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>>> |............4711|
>>>>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>>>>> 00000018
>>>>>
>>>>> Note that frugy generates a minimal binary by default. However, other
>>>>> processes may not and instead produce a fixed size binary blob. This
>>>>> happens, e.g. when performing "ipmitool fru read" which returns the
>>>>> whole contents of the underlying storage. A subsequent "ipmitool fru
>>>>> write" with that blob will fail as explained here.
>>>>>
>>>>> To simulate this here, increase the file to 256 bytes:
>>>>>
>>>>> (frugy) osk@osk:~$ cp demo.bin demo-256.bin
>>>>> (frugy) osk@osk:~$ dd if=/dev/zero bs=1 seek=256 count=0
>>>>> of=demo-256.bin
>>>>> 0+0 records in
>>>>> 0+0 records out
>>>>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>>>>
>>>>> (frugy) osk@osk:~$ ls -al demo-256.bin
>>>>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>>>>
>>>>> (frugy) osk@osk:~$ hexdump -C demo-256.bin
>>>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>>>> |............4711|
>>>>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>>>>> |.12345..........|
>>>>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>>>> |................|
>>>>> *
>>>>> 00000100
>>>>>
>>>>>
>>>>> 3. Write the newly generated FRU:
>>>>>
>>>>> root@akita:~# ipmitool fru write 0 demo-256.bin
>>>>> Fru Size         : 512 bytes
>>>>> Size to Write    : 256 bytes
>>>>>
>>>>> Wait ~10 seconds for the fru-device process to reload the contents.
>>>>>
>>>>>
>>>>> 4. Re-read the FRU contents:
>>>>>
>>>>> root@akita:~# ipmitool fru print 0
>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>  Chassis Area Checksum : OK
>>>>>
>>>>>
>>>>> 5. For comparison, write only the minimal FRU:
>>>>>
>>>>> root@akita:~# ipmitool fru write 0 demo.bin
>>>>> Fru Size         : 512 bytes
>>>>> Size to Write    : 24 bytes
>>>>>
>>>>> Wait ~10 seconds.
>>>>>
>>>>> root@akita:~# ipmitool fru print 0
>>>>>  Chassis Type          : Rack Mount Chassis
>>>>>  Chassis Part Number   : 4711
>>>>>  Chassis Serial        : 12345
>>>>>  Chassis Area Checksum : OK
>>>>>
>>>>>
>>>>> I dug into this and found that this is caused by an interaction
>>>>> between
>>>>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>>>>> and
>>>>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp
>>>>> .
>>>>>
>>>>> Here's what happens:
>>>>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>>>>> ipmiStorageWriteFruData.
>>>>>
>>>>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>>>>> several calls. On the initial call, it requests the current FRU blob
>>>>> via D-bus from fru-device and caches it in the process. It then
>>>>> overwrites sections of that cache with the received data from IPMI.
>>>>>
>>>>> - ipmiStorageWriteFruData uses information from the FRU header to
>>>>> check whether it received all the bytes that make up the new FRU. Note
>>>>> that this could be fewer bytes than the size of the cached blob. Once
>>>>> it receives all the bytes for the new FRU, it calls
>>>>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>>>>> blob (i.e. the full cache with modifications on top). After that the
>>>>> cache is cleared.
>>>>>
>>>>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>>>>> first performs a sanity check and then writes the blob to the
>>>>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>>>>> subsequently calls rescanBusses() which actually executes after about
>>>>> 1 second.
>>>>>
>>>>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>>>>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>>>>> cache has just been cleared, it retrieves the current FRU from
>>>>> fru-device again. However, since that has not yet completed
>>>>> "rescanBusses", it actually received the original FRU again, not the
>>>>> modified version. The above cycle repeats, but this time the original
>>>>> FRU with additional modifications from the additional bytes.
>>>>>
>>>>> In the best case (if the new FRU data is larger than the original FRU
>>>>> data), the result is that the FRU did not get updated at all, since we
>>>>> effectively just wrote back the original FRU. However, if the new FRU
>>>>> data is smaller than the original FRU data, this may result in
>>>>> corrupted FRU data to be persisted.
>>>>>
>>>>>
>>>>> I was trying to figure out how to best fix that, but couldn't come up
>>>>> with a design that would still work. Some ideas:
>>>>>
>>>>> A)  I think what we're missing is feedback from fru-device to ipmid
>>>>> that the FRU write operation has actually completed and the FRU data
>>>>> was re-read. The ipmid should probably not accept any additional write
>>>>> requests until the previous write request has completed and the new
>>>>> FRU data is available.
>>>>>
>>>>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>>>>> FRU data, it relies on a timeout to eventually write the blob if no
>>>>> more data was received from IPMI. I wonder if this mechanism is "too
>>>>> clever" and we should simply always just wait for the timeout?
>>>>>
>>>>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>>>> was called. Maybe we should keep that cache around until the timeout
>>>>> expires?
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Thanks
>>>>> Oskar.
>>>>>
>>>>

[-- Attachment #2: Type: text/html, Size: 12899 bytes --]

next prev parent reply	other threads:[~2024-05-09 16:18 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-02 18:31 ipmitool fru write 0 - does not update "baseboard" FRU Oskar Senft
2024-05-03 15:38 ` Willy Tu
2024-05-07 17:32   ` Oskar Senft
2024-05-07 23:49     ` Willy Tu
2024-05-08  0:04       ` Oskar Senft
2024-05-09 16:17         ` Willy Tu [this message]
2024-05-20 13:03           ` Oskar Senft

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHwn2XmDgOuBMZDNkn0cQidOu2uGdW2Gx4JjQ6khUFvgSndX=w@mail.gmail.com' \
    --to=wltu@google.com \
    --cc=aaelhaj@google.com \
    --cc=openbmc@lists.ozlabs.org \
    --cc=osk@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).