Date: Mon, 7 Jun 2021 11:00:23 -0700
From: Jack Pham
To: Felipe Balbi
Cc: Alexandru Elisei, Greg Kroah-Hartman, p.zabel@pengutronix.de,
 linux-usb@vger.kernel.org, Linux Kernel Mailing List, arm-mail-list,
 sanm@codeaurora.org
Subject: Re: [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove()
Message-ID: <20210607180023.GA23045@jackp-linux.qualcomm.com>
References: <87r1hjcvf6.fsf@kernel.org>
 <70be179c-d36b-de6f-6efc-2888055b1312@arm.com>
 <8272121c-ac8a-1565-a047-e3a16dcf13b0@arm.com>
 <877djbc8xq.fsf@kernel.org>
 <20210603173632.GA25299@jackp-linux.qualcomm.com>
 <87mts6avnn.fsf@kernel.org>
In-Reply-To: <87mts6avnn.fsf@kernel.org>

Hi Felipe,

On Fri, Jun 04, 2021 at 11:20:12AM +0300, Felipe Balbi wrote:
> Jack Pham writes:
> >> >>>> Alexandru Elisei writes:
> >> >>>>> I've been able to bisect the panic and the offending commit is 568262bf5492 ("usb:
> >> >>>>> dwc3: core: Add shutdown callback for dwc3"). I can provide more diagnostic
> >> >>>>> information if needed and I can help test the fix.
> >> >>>> if you simply revert that commit in HEAD, does the problem really go
> >> >>>> away?
> >> >>> Kernel built from commit 324c92e5e0ee, which is the kernel tip today, the panic is
> >> >>> there. Reverting the offending commit, 568262bf5492, makes the panic disappear.
> >> >> Want to send a revert so I can take it now?
> >> >
> >> > I can send a revert, but Felipe was asking Sandeep (the commit author) for a fix,
> >> > so I'll leave it up to Felipe to decide how to proceed.
> >>
> >> I'm okay with a revert. Feel free to add my Acked-by: Felipe Balbi
> >> or it.
> >>
> >> Sandeep, please send a new version that doesn't encounter the same
> >> issue. Make sure to test by reloading the driver in a tight loop for
> >> several iterations.
> >
> > This would probably be tricky to test on other "glue" drivers as the
> > problem appears to be specific only to dwc3_of_simple. It looks like
> > both dwc3_of_simple and the dwc3 core now (due to 568262bf5492) each
> > implement respective .shutdown callbacks. The latter is simply a wrapper
> > around dwc3_remove(). And from the panic call stack above we see that
> > dwc3_of_simple_shutdown() calls of_platform_depopulate() which will
> > again call dwc3_remove() resulting in the double remove.
> >
> > So would an alternative approach be to protect against dwc3_remove()
> > getting called multiple times? IMO it'd be a bit messy to have to add
>
> no, I don't think so. That sounds like a workaround. We should be able
> to guarantee that ->remove() doesn't get called twice using the driver
> model properly.

Completely fair. So having a .shutdown callback that directly calls
dwc3_remove() is probably not the right thing to do, as it completely
bypasses the driver model. If and when the driver core later releases
the device from the driver, we end up with the double remove.

> > additional checks there to know if it had already been called.
> > So maybe avoid it altogether--should dwc3_of_simple_shutdown() just skip
> > calling of_platform_depopulate()?
>
> I don't know what the idiomatic is nowadays, but at least early on, we
> had to call depopulate.

So any suggestions on how to fix the original issue Sandeep was trying
to fix with 568262bf5492? Maybe implement .shutdown in dwc3_qcom and have
it follow what dwc3_of_simple does with of_platform_depopulate()? But then
wouldn't other "glues" want/need to follow suit?

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
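
For anyone following the thread, the double remove described above can be
modelled in plain user-space C. This is only a rough sketch, not driver
code: struct fake_dwc3 and every fake_* function are hypothetical stand-ins
invented for illustration, and the ordering (child core shut down first,
then the parent glue) is an assumption taken from the "again" in the
call-stack description. The second remove returns an error at the point
where the real driver would presumably hit the NULL pointer dereference.

```c
#include <stdlib.h>

/* Hypothetical stand-in for driver state; NOT the real struct dwc3. */
struct fake_dwc3 {
    int *resources;     /* allocated at probe, released by remove */
    int remove_calls;   /* how many times remove has run */
};

/* Models probe: allocate the state that remove will later release. */
static int fake_probe(struct fake_dwc3 *dwc)
{
    dwc->remove_calls = 0;
    dwc->resources = malloc(sizeof(*dwc->resources));
    return dwc->resources ? 0 : -1;
}

/*
 * Models a remove routine written on the assumption that it runs once.
 * Returns 0 on a clean teardown and -1 where a second call would have
 * touched already-released state.
 */
static int fake_remove(struct fake_dwc3 *dwc)
{
    dwc->remove_calls++;
    if (!dwc->resources)
        return -1;
    free(dwc->resources);
    dwc->resources = NULL;
    return 0;
}

/* Models the core .shutdown from 568262bf5492: a thin wrapper around remove. */
static int fake_core_shutdown(struct fake_dwc3 *dwc)
{
    return fake_remove(dwc);
}

/*
 * Models the glue .shutdown: depopulating the child unbinds it, and
 * unbinding invokes the child's remove through the driver model.
 */
static int fake_glue_shutdown(struct fake_dwc3 *child)
{
    return fake_remove(child);
}

/*
 * Runs the assumed shutdown order (child core first, then parent glue)
 * and returns the result of the second remove call. Returns 1 if the
 * model could not be set up or the first remove unexpectedly failed.
 */
int simulate_shutdown(struct fake_dwc3 *dwc)
{
    if (fake_probe(dwc) != 0)
        return 1;
    if (fake_core_shutdown(dwc) != 0)
        return 1;
    return fake_glue_shutdown(dwc);
}
```

In this model, dropping either caller of fake_remove() leaves a single
teardown, which mirrors the two options discussed in the thread: revert
the core .shutdown, or have the glue skip the depopulate step.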