From: Mauro Carvalho Chehab
Cc: linuxarm@huawei.com, mauro.chehab@huawei.com, Mauro Carvalho Chehab,
    Andy Gross, Bjorn Andersson, Hans Verkuil, Mauro Carvalho Chehab,
    Stanimir Varbanov, linux-arm-msm@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-media@vger.kernel.org
Subject: [PATCH 03/25] media: venus: Rework error fail recover logic
Date: Wed, 5 May 2021 11:41:53 +0200
Message-Id: <419e346f01af5423485202d624fc144756bd2b11.1620207353.git.mchehab+huawei@kernel.org>
X-Mailer: git-send-email 2.30.2
X-Mailing-List: linux-kernel@vger.kernel.org

The Venus driver has a sort of watchdog that attempts to recover from
IP errors, implemented as a delayed work job which calls
venus_sys_error_handler(). Right now, this handler has several issues:

1. It assumes that PM runtime resume never fails.
2. It internally runs two while() loops that also assume that PM
   runtime will never fail to go idle:

	while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
		msleep(10);
	...
	while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
		usleep_range(1000, 1500);

3. It ORs together the return codes of every recovery step and then
   reports the merged value to the user.
4. If the hardware never recovers, the handler keeps re-running every
   10ms, flooding the syslog with two messages each time (so, up to
   200 messages per second).

Rework the code in order to prevent that by:

1. checking the return code from PM runtime resume;
2. not letting the while() loops run forever;
3. storing the first failure;
4. using a rate-limited warning when recovery fails.

Fixes: af2c3834c8ca ("[media] media: venus: adding core part and helper functions")
Signed-off-by: Mauro Carvalho Chehab
---
 drivers/media/platform/qcom/venus/core.c | 59 +++++++++++++++++++-----
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c
index 54bac7ec14c5..4d0482743c0a 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -78,22 +78,32 @@ static const struct hfi_core_ops venus_core_ops = {
 	.event_notify = venus_event_notify,
 };
 
+#define RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS 10
+
 static void venus_sys_error_handler(struct work_struct *work)
 {
 	struct venus_core *core =
			container_of(work, struct venus_core, work.work);
-	int ret = 0;
+	int ret, i, max_attempts = RPM_WAIT_FOR_IDLE_MAX_ATTEMPTS;
+	bool failed = false;
+	const char *err_msg = "";
 
-	pm_runtime_get_sync(core->dev);
+	ret = pm_runtime_get_sync(core->dev);
+	if (ret < 0) {
+		err_msg = "resume runtime PM\n";
+		max_attempts = 0;
+		failed = true;
+	}
 
 	hfi_core_deinit(core, true);
 
-	dev_warn(core->dev, "system error has occurred, starting recovery!\n");
-
 	mutex_lock(&core->lock);
 
-	while (pm_runtime_active(core->dev_dec) || pm_runtime_active(core->dev_enc))
+	for (i = 0; i < max_attempts; i++) {
+		if (!pm_runtime_active(core->dev_dec) && !pm_runtime_active(core->dev_enc))
+			break;
 		msleep(10);
+	}
 
 	venus_shutdown(core);
 
@@ -101,31 +111,56 @@ static void venus_sys_error_handler(struct work_struct *work)
 
 	pm_runtime_put_sync(core->dev);
 
-	while (core->pmdomains[0] && pm_runtime_active(core->pmdomains[0]))
+	for (i = 0; i < max_attempts; i++) {
+		if (!core->pmdomains[0] || !pm_runtime_active(core->pmdomains[0]))
+			break;
 		usleep_range(1000, 1500);
+	}
 
 	hfi_reinit(core);
 
-	pm_runtime_get_sync(core->dev);
+	ret = pm_runtime_get_sync(core->dev);
+	if (ret < 0) {
+		err_msg = "resume runtime PM\n";
+		max_attempts = 0;
+		failed = true;
+	}
 
-	ret |= venus_boot(core);
-	ret |= hfi_core_resume(core, true);
+	ret = venus_boot(core);
+	if (ret && !failed) {
+		err_msg = "boot Venus\n";
+		failed = true;
+	}
+
+	ret = hfi_core_resume(core, true);
+	if (ret && !failed) {
+		err_msg = "resume HFI\n";
+		failed = true;
+	}
 
 	enable_irq(core->irq);
 
 	mutex_unlock(&core->lock);
 
-	ret |= hfi_core_init(core);
+	ret = hfi_core_init(core);
+	if (ret && !failed) {
+		err_msg = "init HFI\n";
+		failed = true;
+	}
 
 	pm_runtime_put_sync(core->dev);
 
-	if (ret) {
+	if (failed) {
 		disable_irq_nosync(core->irq);
-		dev_warn(core->dev, "recovery failed (%d)\n", ret);
+		dev_warn_ratelimited(core->dev,
+				     "System error has occurred, recovery failed to %s\n",
+				     err_msg);
 		schedule_delayed_work(&core->work, msecs_to_jiffies(10));
 		return;
 	}
 
+	dev_warn(core->dev, "system error has occurred (recovered)\n");
+
 	mutex_lock(&core->lock);
 	core->sys_error = false;
 	mutex_unlock(&core->lock);
-- 
2.30.2