Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

时间:2023-03-09 04:05:04
Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

转自:https://www.cnblogs.com/arnoldlu/p/6253665.html

测试环境:AOSP 7.1.1+Kernel 4.4.17 HW:HiKey

Ubuntu 14.04+Kernel 4.4.0-31

联系方式:arnoldlu@qq.com

1. Linux内核suspend状态

Linux内核支持多种类型的睡眠状态,通过设置不同的模块进入低功耗模式来达到省电功能。

目前存在四种模式:suspend to idle、power-on standby(Standby)、suspend to ram(STR)和sudpend to disk(Hibernate),分别对应ACPI状态的S0、S1、S3和S4。

State in Linux Label state ACPI state 注释
#define PM_SUSPEND_ON        ((__force suspend_state_t) 0)        一切正常
#define PM_SUSPEND_FREEZE    ((__force suspend_state_t) 1) freeze Suspend-to-Idle S0 冻结进程+挂起设备+CPU空闲 
#define PM_SUSPEND_STANDBY    ((__force suspend_state_t) 2) standby Standby/Power-on Suspend S1 冻结进程+挂起设备+关闭nonbootCPU 
#define PM_SUSPEND_MEM        ((__force suspend_state_t) 3) mem Suspend-to-RAM S3 仅保留RAM自刷新 
#define PM_SUSPEND_MAX        ((__force suspend_state_t) 4) disk Suspend-to-disk S4

关闭所有设备包括RAM,也被称为Hibernate

从freeze-->standby-->mem睡眠程度越来越深,唤醒花费的时间也越来越多。

Suspend-To-Idle

此状态包括frozen processes+suspended devices+idle processors,具有轻量化的特点;

并且相对于相对于Idle状态能节省更多的功耗,因为此时的用户空间被冻结且I/O设备进入了低功耗状态。

相对于Suspend-To-RAM它具有低延时的优势。

Standby/Power-On Suspend

此状态包括frozen processes+suspended devices+offline nonboot CPUs+suspend low-level system,对CPU的处理更近一步。

所以相对于Suspend-To-Idle节省了更多的功耗,但是由于需要恢复CPU和一些底层功能也花费了更多的时间。

Suspend-to-RAM

此状态使所有的设备进入低功耗状态,仅保留RAM自刷新。

所有的设备和系统状态都保存在RAM中,所有外设被挂起。

(在HiKey的实际测试中,boot CPU是没有关闭的!实际上这里也没有standby,mem和standby基本上没有区别。)

Suspend-to-disk

此状态是最省功耗的模式。

相对Suspend-to-RAMRAM能节省更多功耗的原因是数据会被写入磁盘中,RAM也可以被关闭。

但是这也导致了,更多的恢复延时,在resume的时候读回到RAM,然后在进行系统和设备状态恢复工作。

但是在一般的嵌入式设备上,此种状态不支持。

下面用STR表示Suspend to RAM,STI表示Suspend to Idle。

详情请参考:http://www.linaro.org/blog/suspend-to-idle/

2. Suspend状态,以及STR 和STI区别

写入/sys/power/state不同字符串,可以让系统进入不同睡眠状态。

针对state sysfs节点的写入,最终会进入到state_store这个函数,将字符串转换成上表中不同状态。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】
state_store(kernel/power/main.c)
-->pm_suspend (kernel/power/suspend.c)-------------处理除freeze、standby、mem三种类型suspend
-->enter_state---------------------------------在进入睡眠之前,做一些准备工作
-->suspend_devices_and_enter
-->suspend_enter-----------------------这里才是freeze与standby/mem区别所在。
-->hibernate---------------------------------------进入suspend to disk流程
Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

STR和STI的最主要区别就是下面一段代码:

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】
static int suspend_enter(suspend_state_t state, bool *wakeup)
{ … /*
* PM_SUSPEND_FREEZE equals
* frozen processes + suspended devices + idle processors.
* Thus we should invoke freeze_enter() soon after
* all the devices are suspended.
*/
//====================================FREEZE===============================================================
if (state == PM_SUSPEND_FREEZE) {------------------------------------如果要进入freeze状态,就会执行此段代码。
trace_suspend_resume(TPS("machine_suspend"), state, true);
freeze_enter();
trace_suspend_resume(TPS("machine_suspend"), state, false);
goto Platform_wake;----------------------------------------------在执行结束跳转到Platform_wake,中间一段绿色代码将会被跳过。所以说freeze和standby、mem相比,多了freeze_enter,少了对non-boot CPUs、arch、syscore的操作。
}
//=====================================MEM===============================================================
error = disable_nonboot_cpus();
if (error || suspend_test(TEST_CPUS)) {
log_suspend_abort_reason("Disabling non-boot cpus failed");
goto Enable_cpus;
} arch_suspend_disable_irqs();
BUG_ON(!irqs_disabled()); error = syscore_suspend();
if (!error) {
*wakeup = pm_wakeup_pending();
if (!(suspend_test(TEST_CORE) || *wakeup)) {
trace_suspend_resume(TPS("machine_suspend"),
state, true);
error = suspend_ops->enter(state);
trace_suspend_resume(TPS("machine_suspend"),
state, false);
events_check_enabled = false;
} else if (*wakeup) {
pm_get_active_wakeup_sources(suspend_abort,
MAX_SUSPEND_ABORT_LEN);
log_suspend_abort_reason(suspend_abort);
error = -EBUSY;
}
syscore_resume();
} arch_suspend_enable_irqs();
BUG_ON(irqs_disabled()); Enable_cpus:
enable_nonboot_cpus(); Platform_wake:
platform_resume_noirq(state);
dpm_resume_noirq(PMSG_RESUME); … }
Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

3 suspend/resume流程梳理

下面分析一下suspend/resume每个细分阶段。

整个suspend可以分为若干阶段,每个阶段函数—>关键节点Trace—>analyze_suspend.py解析Trace—>根据Trace时间画出Timeline图表

这样就可以分析出总的时间差异,每个阶段差异,甚至一个设备suspend/resume、一个子系统suspend/resume的时间差异。

analyze_suspend.py 基于默认基于ftrace进行分析(在指定dmesg的时候,会发现缺失了很多log信息,无法生成timeline类型的html文件),将suspend/resume分为若干阶段。

下面简要介绍一下各个阶段,然后基于此进行代码分析。

在kernel版本大于等与3.15之后,解析需要的所有log信息都可以从ftrace中获取。之前的内核版本还需要借助于dmesg。

由于使用的kernel版本是4.4.17,sysvals.usetraceeventsonly被置位,所以只会parseTraceLog()。

下表中的各个阶段通过解析suspend_resume: XXXXXXX类型的ftrace来获取。

各子模块、子系统的解析通过device_pm_callback_start和device_pm_callback_end来截取时间段,以及这时间段内的callgraph。

Phase名称 ftrace关键词    
suspend_prepare dpm_prepare    
suspend dpm_suspend    
suspend_late dpm_suspend_late    
suspend_noirq dpm_suspend_noirq    
suspend_machine machine_suspend start    
resume_machine machine_suspend end    
resume_noirq dpm_resume_noirq    
resume_early dpm_resume_early    
resume dpm_resume    
resume_complete dpm_complete    

下面是一组suspend/resume执行ftrace log,我们将据此进行各阶段代码分析,包括suspend_enter、suspend_prepare、suspend、suspend_late、suspend_noirq、suspend_machine、resume_machine、resume_noirq、resume_early、resume、resume_complete。

从这里也可以看出freeze和mem/standby除了machine部分不同之外,还少了CPU开关和syscore suspend/resume操作。

suspend_resume: suspend_enter[1] begin
suspend_resume: sync_filesystems[0] begin
suspend_resume: sync_filesystems[0] end
suspend_resume: freeze_processes[0] begin
suspend_resume: freeze_processes[0] end
suspend_resume: suspend_enter[1] end
suspend_resume: dpm_prepare[2] begin
suspend_resume: dpm_prepare[2] end
suspend_resume: dpm_suspend[2] begin
suspend_resume: dpm_suspend[2] end
suspend_resume: dpm_suspend_late[2] begin
suspend_resume: dpm_suspend_late[2] end
suspend_resume: dpm_suspend_noirq[2] begin
suspend_resume: dpm_suspend_noirq[2] end
No CPU_OFF…
No syscore_suspend…
suspend_resume: machine_suspend[1] begin
suspend_resume: machine_suspend[1] end
No suscore_resume…
No CPU_ON…
suspend_resume: dpm_resume_noirq[16] begin
suspend_resume: dpm_resume_noirq[16] end
suspend_resume: dpm_resume_early[16] begin
suspend_resume: dpm_resume_early[16] end
suspend_resume: dpm_resume[16] begin
suspend_resume: dpm_resume[16] end
suspend_resume: dpm_complete[16] begin
suspend_resume: dpm_complete[16] end
suspend_resume: resume_console[1] begin
suspend_resume: resume_console[1] end
suspend_resume: thaw_processes[0] begin
suspend_resume: thaw_processes[0] end

suspend_resume: suspend_enter[3] begin
suspend_resume: sync_filesystems[0] begin
suspend_resume: sync_filesystems[0] end
suspend_resume: freeze_processes[0] begin
suspend_resume: freeze_processes[0] end
suspend_resume: suspend_enter[3] end
suspend_resume: dpm_prepare[2] begin
suspend_resume: dpm_prepare[2] end
suspend_resume: dpm_suspend[2] begin
suspend_resume: dpm_suspend[2] end
suspend_resume: dpm_suspend_late[2] begin
suspend_resume: dpm_suspend_late[2] end
suspend_resume: dpm_suspend_noirq[2] begin
suspend_resume: dpm_suspend_noirq[2] end
suspend_resume: CPU_OFF[1-7] begin/end
suspend_resume: syscore_suspend[0] begin/end
suspend_resume: machine_suspend[3] begin
suspend_resume: machine_suspend[3] end
suspend_resume: syscore_resume[0] begin/end
suspend_resume: CPU_ON[1-7] begin/end
suspend_resume: dpm_resume_noirq[16] begin
suspend_resume: dpm_resume_noirq[16] end
suspend_resume: dpm_resume_early[16] begin
suspend_resume: dpm_resume_early[16] end
suspend_resume: dpm_resume[16] begin
suspend_resume: dpm_resume[16] end
suspend_resume: dpm_complete[16] begin
suspend_resume: dpm_complete[16] end
suspend_resume: resume_console[3] begin
suspend_resume: resume_console[3] end
suspend_resume: thaw_processes[0] begin
suspend_resume: thaw_processes[0] end

在介绍相关代码之前,先介绍一下HiKey使用的platform_suspend_ops:

static const struct platform_suspend_ops psci_suspend_ops = {
    .valid          = suspend_valid_only_mem,  仅支持mem类型的suspend
    .enter          = psci_system_suspend_enter,  睡眠的CPU底层支持
};

freeze的platform_freeze_ops如下:

static const struct platform_freeze_ops acpi_freeze_ops = {
    .begin = acpi_freeze_begin,
    .prepare = acpi_freeze_prepare,
    .restore = acpi_freeze_restore,
    .end = acpi_freeze_end,
};

3.1 suspend_enter

enter_state作为suspend/resume的入口点,完成了绝大部分工作。首先确保系统没有正在进入睡眠状态;然后为suspend做一些准备,使系统进入睡眠并在唤醒后进行必要清理恢复工作。

下面分析一下suspend之前的准备工作,即suspend_enter阶段:

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】
static int enter_state(suspend_state_t state)
{
int error; trace_suspend_resume(TPS("suspend_enter"), state, true);
if (state == PM_SUSPEND_FREEZE) {--------------------------------------是否是freeze类型suspend
#ifdef CONFIG_PM_DEBUG
if (pm_test_level != TEST_NONE && pm_test_level <= TEST_CPUS) {
pr_warning("PM: Unsupported test mode for suspend to idle,"
"please choose none/freezer/devices/platform.\n");
return -EAGAIN;
}
#endif
} else if (!valid_state(state)) {-------------------------------------目前只支持mem类型suspend
return -EINVAL;
}
if (!mutex_trylock(&pm_mutex))
return -EBUSY; if (state == PM_SUSPEND_FREEZE)
freeze_begin();--------------------------------------------------初始化suspend_freeze_state为FREEZE_STATE_NONE #ifndef CONFIG_SUSPEND_SKIP_SYNC
trace_suspend_resume(TPS("sync_filesystems"), 0, true);
printk(KERN_INFO "PM: Syncing filesystems ... ");
sys_sync();----------------------------------------------------------sync文件系统缓存文件,确保数据sync到硬盘。
printk("done.\n");
trace_suspend_resume(TPS("sync_filesystems"), 0, false);
#endif pr_debug("PM: Preparing system for sleep (%s)\n", pm_states[state]);
pm_suspend_clear_flags();
error = suspend_prepare(state);--------------------------------------注意这里面的suspend_prepare和下面的suspend_prepare阶段容易搞混。
if (error)
goto Unlock; if (suspend_test(TEST_FREEZER))
goto Finish; trace_suspend_resume(TPS("suspend_enter"), state, false);
pr_debug("PM: Suspending system (%s)\n", pm_states[state]);
pm_restrict_gfp_mask();
error = suspend_devices_and_enter(state);
pm_restore_gfp_mask(); Finish:
pr_debug("PM: Finishing wakeup.\n");
suspend_finish();---------------------------------------------------解冻,重启进程;发送PM_POST_SUSPEND通知;释放之前分配的console。
Unlock:
mutex_unlock(&pm_mutex);
return error;
}
Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

接着分析一下suspend_prepare函数:

static int suspend_prepare(suspend_state_t state)
{
    int error;

if (!sleep_state_supported(state))  验证suspend状态
        return -EPERM;

pm_prepare_console();  分配一个suspend console

error = pm_notifier_call_chain(PM_SUSPEND_PREPARE);  发送PM_SUSPEND_PREPARE通知消息
    if (error)
        goto Finish;

trace_suspend_resume(TPS("freeze_processes"), 0, true);
    error = suspend_freeze_processes();  冻结进程
    trace_suspend_resume(TPS("freeze_processes"), 0, false);
    if (!error)
        return 0;

suspend_stats.failed_freeze++;
    dpm_save_failed_step(SUSPEND_FREEZE);
Finish:
    pm_notifier_call_chain(PM_POST_SUSPEND);
    pm_restore_console();
    return error;
}

suspend_freeze_process先处理用户空间进程,然后处理内核进程:

static inline int suspend_freeze_processes(void)
{
    int error;

error = freeze_processes();  触发用户空间进程进入freeze状态。当前进程不会被冻结。因为冻结失败的进程会自动被解冻,所以不需要进行错误处理。
    /*
     * freeze_processes() automatically thaws every task if freezing
     * fails. So we need not do anything extra upon error.
     */
    if (error)
        return error;

error = freeze_kernel_threads();  冻结内核线程
    /*
     * freeze_kernel_threads() thaws only kernel threads upon freezing
     * failure. So we have to thaw the userspace tasks ourselves.
     */
    if (error)  由于freeze_kernel_threads冻结失败,只会解冻内核线程。所以还需要对用户空间进程进行解冻。
        thaw_processes();

return error;
}

下面的阶段都在suspend_devices_and_enter中,可以看出这是一个对称的流程,每一阶段的suspend,都有对应的resume。

int suspend_devices_and_enter(suspend_state_t state)
{
    int error;
    bool wakeup = false;

if (!sleep_state_supported(state))
        return -ENOSYS;

error = platform_suspend_begin(state);
    if (error)
        goto Close;

suspend_console();  关闭console子系统,暂停printk打印
    suspend_test_start();
    error = dpm_suspend_start(PMSG_SUSPEND); suspend_prepare(dpm_prepare)、suspend(dpm_suspend)两阶段
    if (error) {
        pr_err("PM: Some devices failed to suspend, or early wake event detected\n");
        log_suspend_abort_reason("Some devices failed to suspend, or early wake event detected");
        goto Recover_platform;
    }
    suspend_test_finish("suspend devices");
    if (suspend_test(TEST_DEVICES))
        goto Recover_platform;

do {
        error = suspend_enter(state, &wakeup);  suspend_late(dpm_suspend_late)、suspend_noirq(dpm_suspend_noirq)、suspend_machine、resume_machine、resume_noirq(dpm_resume_noirq)、resume_early(dpm_resume_early)
    } while (!error && !wakeup && platform_suspend_again(state));

Resume_devices:
    suspend_test_start();
    dpm_resume_end(PMSG_RESUME);  resume(dpm_resume)、resume_complete(dpm_complete)
    suspend_test_finish("resume devices");
    trace_suspend_resume(TPS("resume_console"), state, true);
    resume_console();  打开console子系统,恢复printk打印。
    trace_suspend_resume(TPS("resume_console"), state, false);

Close:
    platform_resume_end(state);
    return error;

Recover_platform:
    platform_recover(state);
    goto Resume_devices;
}

还有必要过一下suspend_enter:

static int suspend_enter(suspend_state_t state, bool *wakeup)
{
    char suspend_abort[MAX_SUSPEND_ABORT_LEN];
    int error, last_dev;

error = platform_suspend_prepare(state); 因为suspend_ops的prepare为空,所以返回0
    if (error)
        goto Platform_finish;

error = dpm_suspend_late(PMSG_SUSPEND);  suspend_late
    if (error) {
        last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
        last_dev %= REC_FAILED_NUM;
        printk(KERN_ERR "PM: late suspend of devices failed\n");
        log_suspend_abort_reason("%s device failed to power down",
            suspend_stats.failed_devs[last_dev]);
        goto Platform_finish;
    }
    error = platform_suspend_prepare_late(state);  执行freeze_ops->prepare()
    if (error)
        goto Devices_early_resume;

error = dpm_suspend_noirq(PMSG_SUSPEND);  suspend_noirq
    if (error) {
        last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
        last_dev %= REC_FAILED_NUM;
        printk(KERN_ERR "PM: noirq suspend of devices failed\n");
        log_suspend_abort_reason("noirq suspend of %s device failed",
            suspend_stats.failed_devs[last_dev]);
        goto Platform_early_resume;
    }
    error = platform_suspend_prepare_noirq(state);
    if (error)
        goto Platform_wake;

if (suspend_test(TEST_PLATFORM))
        goto Platform_wake;

/*
     * PM_SUSPEND_FREEZE equals
     * frozen processes + suspended devices + idle processors.
     * Thus we should invoke freeze_enter() soon after
     * all the devices are suspended.
     */
    if (state == PM_SUSPEND_FREEZE) {  这里是freeze和mem/standy差别所在
        trace_suspend_resume(TPS("machine_suspend"), state, true);
        freeze_enter();  
        trace_suspend_resume(TPS("machine_suspend"), state, false);
        goto Platform_wake;
    }

error = disable_nonboot_cpus(); 关闭所有boot-CPU之外的CPU
    if (error || suspend_test(TEST_CPUS)) {
        log_suspend_abort_reason("Disabling non-boot cpus failed");
        goto Enable_cpus;
    }

arch_suspend_disable_irqs();
    BUG_ON(!irqs_disabled());

error = syscore_suspend();  执行syscore_ops_list上所有syscore_ops的suspend回调函数
    if (!error) {
        *wakeup = pm_wakeup_pending();  检查是否需要终止suspend流程?
        if (!(suspend_test(TEST_CORE) || *wakeup)) {
            trace_suspend_resume(TPS("machine_suspend"),
                state, true);
            error = suspend_ops->enter(state);  调用psci_suspend_ops的enter回调函数,关闭machine
            trace_suspend_resume(TPS("machine_suspend"),
                state, false);  !!!!!!!!!!!!!!!!这里即为唤醒之后的执行路径了!!!!!!!!!!!!!!!!
            events_check_enabled = false;
        } else if (*wakeup) {
            pm_get_active_wakeup_sources(suspend_abort,
                MAX_SUSPEND_ABORT_LEN);
            log_suspend_abort_reason(suspend_abort);
            error = -EBUSY;
        }
        syscore_resume();  执行所有syscore_ops_list的resume回调函数
    }

arch_suspend_enable_irqs();
    BUG_ON(irqs_disabled());

Enable_cpus:
    enable_nonboot_cpus();  打开所有non-boot CPU

Platform_wake:
    platform_resume_noirq(state);
    dpm_resume_noirq(PMSG_RESUME);  resume_noirq

Platform_early_resume:
    platform_resume_early(state);

Devices_early_resume:
    dpm_resume_early(PMSG_RESUME);  resume_early

Platform_finish:
    platform_resume_finish(state);
    return error;
}

3.2 suspend_prepare和suspend

DPM是Device Power Management的意思,这些操作都是针对非系统设备(non-sysdev)进行的。那什么是系统设备呢?下面的machine应该就是所谓的sysdev了。

dpm_prepare实际上就是遍历dpm_list上的所有设备,执行->prepare回调函数。如果设备存在->prepare回电函数,会将设备的prepare阶段打印到ftrace。

int dpm_prepare(pm_message_t state)
{
    int error = 0;

trace_suspend_resume(TPS("dpm_prepare"), state.event, true);
    might_sleep();

mutex_lock(&dpm_list_mtx);
    while (!list_empty(&dpm_list)) {  遍历dpm_list
        struct device *dev = to_device(dpm_list.next);

get_device(dev);
        mutex_unlock(&dpm_list_mtx);

trace_device_pm_callback_start(dev, "", state.event);
        error = device_prepare(dev, state);  执行->prepare回调函数
        trace_device_pm_callback_end(dev, error);

mutex_lock(&dpm_list_mtx);
        if (error) {
            if (error == -EAGAIN) {
                put_device(dev);
                error = 0;
                continue;
            }
            printk(KERN_INFO "PM: Device %s not prepared "
                "for power transition: code %d\n",
                dev_name(dev), error);
            put_device(dev);
            break;
        }
        dev->power.is_prepared = true;
        if (!list_empty(&dev->power.entry))
            list_move_tail(&dev->power.entry, &dpm_prepared_list);  移动设备到dpm_prepared_list
        put_device(dev);
    }
    mutex_unlock(&dpm_list_mtx);
    trace_suspend_resume(TPS("dpm_prepare"), state.event, false);
    return error;
}

dpm_suspend遍历dpm_prepared_list,这点和dpm_prepare有区别。然后执行设备的->suspend回调函数。

int dpm_suspend(pm_message_t state)
{
    ktime_t starttime = ktime_get();
    int error = 0;

trace_suspend_resume(TPS("dpm_suspend"), state.event, true);
    might_sleep();

cpufreq_suspend();

mutex_lock(&dpm_list_mtx);
    pm_transition = state;
    async_error = 0;
    while (!list_empty(&dpm_prepared_list)) { 基于dpm_prepared_list遍历设备
        struct device *dev = to_device(dpm_prepared_list.prev);

get_device(dev);
        mutex_unlock(&dpm_list_mtx);

error = device_suspend(dev);  执行设备->suspend回调函数

mutex_lock(&dpm_list_mtx);
        if (error) {
            pm_dev_err(dev, state, "", error);
            dpm_save_failed_dev(dev_name(dev));
            put_device(dev);
            break;
        }
        if (!list_empty(&dev->power.entry))
            list_move(&dev->power.entry, &dpm_suspended_list);  移动设备到dpm_suspended_list
        put_device(dev);
        if (async_error)
            break;
    }
    mutex_unlock(&dpm_list_mtx);
    async_synchronize_full();
    if (!error)
        error = async_error;
    if (error) {
        suspend_stats.failed_suspend++;
        dpm_save_failed_step(SUSPEND_SUSPEND);
    } else
        dpm_show_time(starttime, state, NULL);
    trace_suspend_resume(TPS("dpm_suspend"), state.event, false);
    return error;
}

3.3 suspend_late和suspend_noirq

dpm_suspend_late基于dpm_suspended_list操作设备,所以这也需要函数之间顺序执行。

int dpm_suspend_late(pm_message_t state)
{
    ktime_t starttime = ktime_get();
    int error = 0;

trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
    mutex_lock(&dpm_list_mtx);
    pm_transition = state;
    async_error = 0;

while (!list_empty(&dpm_suspended_list)) {  遍历dpm_suspended_list列表
        struct device *dev = to_device(dpm_suspended_list.prev);

get_device(dev);
        mutex_unlock(&dpm_list_mtx);

error = device_suspend_late(dev);  执行->suspend_late回调函数

mutex_lock(&dpm_list_mtx);
        if (!list_empty(&dev->power.entry))
            list_move(&dev->power.entry, &dpm_late_early_list);  移动设备到dpm_late_early_list

if (error) {
            pm_dev_err(dev, state, " late", error);
            dpm_save_failed_dev(dev_name(dev));
            put_device(dev);
            break;
        }
        put_device(dev);

if (async_error)
            break;
    }
    mutex_unlock(&dpm_list_mtx);
    async_synchronize_full();
    if (!error)
        error = async_error;
    if (error) {
        suspend_stats.failed_suspend_late++;
        dpm_save_failed_step(SUSPEND_SUSPEND_LATE);
        dpm_resume_early(resume_event(state));
    } else {
        dpm_show_time(starttime, state, "late");
    }
    trace_suspend_resume(TPS("dpm_suspend_late"), state.event, false);
    return error;
}

dpm_suspend_noirq基于dpm_late_early_list遍历所有设备。首先阻止设备驱动接收中断信息,然后执行->suspend_noirq回调函数。

int dpm_suspend_noirq(pm_message_t state)
{
    ktime_t starttime = ktime_get();
    int error = 0;

trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, true);
    cpuidle_pause();  暂停cpuidle功能,退出idle的CPU
    device_wakeup_arm_wake_irqs();  将具有wakeirq的设备设置成wakeup resource
    suspend_device_irqs();  关闭当前所有能够关闭的irq,置成IRQS_SUSPENDED。IRQF_NO_SUSPEND类型的wakeup中断不能被关闭,并且作为wakeup唤醒源的中断不能被关闭。
    mutex_lock(&dpm_list_mtx);
    pm_transition = state;
    async_error = 0;

while (!list_empty(&dpm_late_early_list)) {
        struct device *dev = to_device(dpm_late_early_list.prev);

get_device(dev);
        mutex_unlock(&dpm_list_mtx);

error = device_suspend_noirq(dev);  调用->suspend_noirq回调函数

mutex_lock(&dpm_list_mtx);
        if (error) {
            pm_dev_err(dev, state, " noirq", error);
            dpm_save_failed_dev(dev_name(dev));
            put_device(dev);
            break;
        }
        if (!list_empty(&dev->power.entry))
            list_move(&dev->power.entry, &dpm_noirq_list);  移动设备到dpm_noirq_list
        put_device(dev);

if (async_error)
            break;
    }
    mutex_unlock(&dpm_list_mtx);
    async_synchronize_full();
    if (!error)
        error = async_error;

if (error) {
        suspend_stats.failed_suspend_noirq++;
        dpm_save_failed_step(SUSPEND_SUSPEND_NOIRQ);
        dpm_resume_noirq(resume_event(state));
    } else {
        dpm_show_time(starttime, state, "noirq");
    }
    trace_suspend_resume(TPS("dpm_suspend_noirq"), state.event, false);
    return error;
}

3.4 suspend_machine和resume_machine

freeze和mem/standby在这部分是不同的。

mem/standby直接调用suspend_ops->enter进入对应的睡眠模式。

而freeze就要稍微复杂了:

static void freeze_enter(void)
{
    spin_lock_irq(&suspend_freeze_lock);
    if (pm_wakeup_pending())  检查是否有wakeup信号在处理,如果有则退出当前流程。
        goto out;

suspend_freeze_state = FREEZE_STATE_ENTER;
    spin_unlock_irq(&suspend_freeze_lock);

get_online_cpus();
    cpuidle_resume();  允许使用cpuidle

/* Push all the CPUs into the idle loop. */
    wake_up_all_idle_cpus();  强制所有CPU退出idle状态
    pr_debug("PM: suspend-to-idle\n");
    /* Make the current CPU wait so it can enter the idle loop too. */
    wait_event(suspend_freeze_wait_head,
           suspend_freeze_state == FREEZE_STATE_WAKE);  等待FREEZE_STATE_WAKE事件,进入idle loop
    pr_debug("PM: resume from suspend-to-idle\n");  !!!!!!!!!!!!!!!!这里即为唤醒之后的执行路径了!!!!!!!!!!!!!!!!
    cpuidle_pause();  暂停使用cpuidle
    put_online_cpus();

spin_lock_irq(&suspend_freeze_lock);

out:
    suspend_freeze_state = FREEZE_STATE_NONE;
    spin_unlock_irq(&suspend_freeze_lock);
}

3.5 resume_noirq

执行dpm_noirq_list上设备的resume_noirq回调函数。

void dpm_resume_noirq(pm_message_t state)
{
    struct device *dev;
    ktime_t starttime = ktime_get();

trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, true);
    mutex_lock(&dpm_list_mtx);
    pm_transition = state;

/*
     * Advanced the async threads upfront,
     * in case the starting of async threads is
     * delayed by non-async resuming devices.
     */
    list_for_each_entry(dev, &dpm_noirq_list, power.entry) {
        reinit_completion(&dev->power.completion);
        if (is_async(dev)) {
            get_device(dev);
            async_schedule(async_resume_noirq, dev);
        }
    }

while (!list_empty(&dpm_noirq_list)) {  遍历dpm_noirq_list
        dev = to_device(dpm_noirq_list.next);
        get_device(dev);
        list_move_tail(&dev->power.entry, &dpm_late_early_list);  移动设备到下一级dpm_late_early_list
        mutex_unlock(&dpm_list_mtx);

if (!is_async(dev)) {
            int error;

error = device_resume_noirq(dev, state, false);
            if (error) {
                suspend_stats.failed_resume_noirq++;
                dpm_save_failed_step(SUSPEND_RESUME_NOIRQ);
                dpm_save_failed_dev(dev_name(dev));
                pm_dev_err(dev, state, " noirq", error);
            }
        }

mutex_lock(&dpm_list_mtx);
        put_device(dev);
    }
    mutex_unlock(&dpm_list_mtx);
    async_synchronize_full();
    dpm_show_time(starttime, state, "noirq");
    resume_device_irqs();
    device_wakeup_disarm_wake_irqs();
    cpuidle_resume();
    trace_suspend_resume(TPS("dpm_resume_noirq"), state.event, false);
}

3.6 resume_early

执行前述dpm_late_early_list设备的resume_early回调函数,移动设备到dpm_suspended_list列表。

void dpm_resume_early(pm_message_t state)
{
    struct device *dev;
    ktime_t starttime = ktime_get();

trace_suspend_resume(TPS("dpm_resume_early"), state.event, true);
    mutex_lock(&dpm_list_mtx);
    pm_transition = state;

/*
     * Advanced the async threads upfront,
     * in case the starting of async threads is
     * delayed by non-async resuming devices.
     */
    list_for_each_entry(dev, &dpm_late_early_list, power.entry) {
        reinit_completion(&dev->power.completion);
        if (is_async(dev)) {
            get_device(dev);
            async_schedule(async_resume_early, dev);
        }
    }

while (!list_empty(&dpm_late_early_list)) {
        dev = to_device(dpm_late_early_list.next);
        get_device(dev);
        list_move_tail(&dev->power.entry, &dpm_suspended_list);
        mutex_unlock(&dpm_list_mtx);

if (!is_async(dev)) {
            int error;

error = device_resume_early(dev, state, false);
            if (error) {
                suspend_stats.failed_resume_early++;
                dpm_save_failed_step(SUSPEND_RESUME_EARLY);
                dpm_save_failed_dev(dev_name(dev));
                pm_dev_err(dev, state, " early", error);
            }
        }
        mutex_lock(&dpm_list_mtx);
        put_device(dev);
    }
    mutex_unlock(&dpm_list_mtx);
    async_synchronize_full();
    dpm_show_time(starttime, state, "early");
    trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
}

3.7 resume

执行所有dpm_suspended_list上设备的resume回调函数。

void dpm_resume(pm_message_t state)
{
    struct device *dev;
    ktime_t starttime = ktime_get();

trace_suspend_resume(TPS("dpm_resume"), state.event, true);
    might_sleep();

mutex_lock(&dpm_list_mtx);
    pm_transition = state;
    async_error = 0;

list_for_each_entry(dev, &dpm_suspended_list, power.entry) {
        reinit_completion(&dev->power.completion);
        if (is_async(dev)) {
            get_device(dev);
            async_schedule(async_resume, dev);
        }
    }

while (!list_empty(&dpm_suspended_list)) {
        dev = to_device(dpm_suspended_list.next);
        get_device(dev);
        if (!is_async(dev)) {
            int error;

mutex_unlock(&dpm_list_mtx);

error = device_resume(dev, state, false);
            if (error) {
                suspend_stats.failed_resume++;
                dpm_save_failed_step(SUSPEND_RESUME);
                dpm_save_failed_dev(dev_name(dev));
                pm_dev_err(dev, state, "", error);
            }

mutex_lock(&dpm_list_mtx);
        }
        if (!list_empty(&dev->power.entry))
            list_move_tail(&dev->power.entry, &dpm_prepared_list);
        put_device(dev);
    }
    mutex_unlock(&dpm_list_mtx);
    async_synchronize_full();
    dpm_show_time(starttime, state, NULL);

cpufreq_resume();
    trace_suspend_resume(TPS("dpm_resume"), state.event, false);
}

3.8 resume_complete

执行所有dpm_prepared_list上设备的complete回调函数。至此dpm_complete结束所有非系统设备的睡眠。

void dpm_complete(pm_message_t state)
{
    struct list_head list;

trace_suspend_resume(TPS("dpm_complete"), state.event, true);
    might_sleep();

INIT_LIST_HEAD(&list);
    mutex_lock(&dpm_list_mtx);
    while (!list_empty(&dpm_prepared_list)) {
        struct device *dev = to_device(dpm_prepared_list.prev);

get_device(dev);
        dev->power.is_prepared = false;
        list_move(&dev->power.entry, &list);
        mutex_unlock(&dpm_list_mtx);

trace_device_pm_callback_start(dev, "", state.event);
        device_complete(dev, state);
        trace_device_pm_callback_end(dev, 0);

mutex_lock(&dpm_list_mtx);
        put_device(dev);
    }
    list_splice(&list, &dpm_list);
    mutex_unlock(&dpm_list_mtx);
    trace_suspend_resume(TPS("dpm_complete"), state.event, false);
}

4 如何让HiKey进入STR/STI并唤醒?

可以通过配置GPIO作为唤醒源,或者通过RTC作为唤醒源,延时一定时间来唤醒。

检查是否存在/sys/class/rtc/rtc0/wakealarm,入不存在则需要打开CONFIG_RTC_DRV_PL031。

写入wakealarm的参数,表示在多少秒之后resume唤醒,退出suspend。

写mem进入state,是系统进入suspend流程。

adb root && adb remount
adb shell "echo +10 > /sys/class/rtc/rtc0/wakealarm && echo mem > /sys/power/state"

5. suspend/resume的latency分析手段

5.1 analyze_suspend.py v3.0

在kernel的scripts中,这个工具可以帮助内核和OS开发者优化suspend/resume时间。

在打开一系列内核选项之后,此工具就可以执行suspend操作,然后抓取dmesg和ftrace数据知道resume结束。

这些数据会按照时间线显示每个设备,并且显示占用最多suspend/resume时间的设备或者子系统的调用关系详图。

执行工具后,会根据时间生成一个子目录,里面包含:html、dmesg和原始ftrace文件。

下面简单看一下工具选项:

Options:
  [general]
    -h          Print this help text
    -v          Print the current tool version
    -verbose    Print extra information during execution and analysis
    -status     Test to see if the system is enabled to run this tool
    -modes      List available suspend modes  显示当前支持的suspend模式
    -m mode     Mode to initiate for suspend ['freeze', 'mem', 'disk'] (default: mem)  设置进入何种模式的suspend
    -rtcwake t  Use rtcwake to autoresume after <t> seconds (default: disabled)  使用rtc来唤醒,参数是间隔时间
  [advanced]
    -f          Use ftrace to create device callgraphs (default: disabled)  基于ftrace生成调用关系图
    -filter "d1 d2 ..." Filter out all but this list of dev names
    -x2         Run two suspend/resumes back to back (default: disabled)
    -x2delay t  Minimum millisecond delay <t> between the two test runs (default: 0 ms)
    -postres t  Time after resume completion to wait for post-resume events (default: 0 S)
    -multi n d  Execute <n> consecutive tests at <d> seconds intervals. The outputs will
                be created in a new subdirectory with a summary page.
  [utilities]
    -fpdt       Print out the contents of the ACPI Firmware Performance Data Table
    -usbtopo    Print out the current USB topology with power info
    -usbauto    Enable autosuspend for all connected USB devices
  [android testing]
    -adb binary Use the given adb binary to run the test on an android device.  参数需要给出adb路径,工具就会对Android设备进行测试,并将结果pull出来。有一点需要注意,在此之前确保adb具有root权限。
                The device should already be connected and with root access.
                Commands will be executed on the device using "adb shell"
  [re-analyze data from previous runs] 针对之前测试数据重新分析
    -ftrace ftracefile  Create HTML output using ftrace input
    -dmesg dmesgfile    Create HTML output using dmesg (not needed for kernel >= 3.15)
    -summary directory  Create a summary of all test in this dir

在了解了工具使用方法之后,就可以进行相关测试了。

5.1.1 Android

./analysze_suspend.py –modes –adb /usr/bin/adb获取当前系统支持的suspend状态。

['freeze', 'mem']

1.Android上测试STR,suspend/resume共5次,每次间隔20秒。

./analyze_suspend.py -adb  /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m mem

2.Android上测试STI,suspend/resume共10次,每次间隔5秒。

./analyze_suspend.py -adb  /usr/bin/adb -rtcwake 10 -multi 5 20 -f -m freeze

测试结果可以在如下获得:

https://github.com/arnoldlu/common-use/tree/master/tools/analyze_suspend/hikey_test

存在的问题:analyze_suspend.py不支持Android的rtcwakeup和callgraph。已经在如下fix:

https://github.com/arnoldlu/common-use/blob/master/tools/analyze_suspend/analyze_suspend.py

5.1.1.1 总体对比

下面是HiKey上测试结果,可以看出两个数据都不够稳定。mem的suspend和resume平均值都比较高。

freeze相比mem的suspend/resume平均值提高了304.3ms/613.5ms。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

5.1.1.2 是否suspend CPU

对比如下两幅图,明显看出mem类型的suspend关闭了除CPU0之外的所有CPU;而freeze则没有关闭任何CPU。

non-boot CPUs的suspend/resume时间就达到300ms/200ms。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

同时从log中也可以看出mem和freeze的主要区别就在于是否disabling/enabling non-boot CPU。其他设备和子系统的suspend/resume时间基本一致。

同时还可以看出mem的suspend后,系统的timestamp是停止的;而freeze的timestamp还是一直在运行的。可以得出freeze状态持续的时间。

因为先写rtcwake为10s,然后进入睡眠,再唤醒,所以freeze时间是小于10s的。

[ 3385.642962] PM: suspend entry 1970-01-01 00:57:30.580909763 UTC
[ 3385.649165] PM: Syncing filesystems ... done.
[ 3385.661349] Freezing user space processes ... 
[ 3385.671207] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable
[ 3385.678933] dwc2 f72c0000.usb: GINNakEff triggered
[ 3385.685718] (elapsed 0.019 seconds) done.
[ 3385.689860] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[ 3385.700092] Suspending console(s) (use no_console_suspend to debug)
[ 3385.736020] PM: suspend of devices complete after 27.195 msecs
[ 3385.740811] PM: late suspend of devices complete after 4.765 msecs
[ 3385.743919] PM: noirq suspend of devices complete after 3.090 msecs
Disabling and Enabling non-boot CPUs
[ 3386.209126] PM: noirq resume of devices complete after 1.865 msecs
[ 3386.212066] PM: early resume of devices complete after 2.460 msecs
[ 3386.234729] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
[ 3386.311480] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0)
[ 3386.410411] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
[ 3386.458232] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0)
[ 3386.458729] PM: resume of devices complete after 246.646 msecs
[ 3386.818770] Restarting tasks ... 
[ 3386.827026] done.
[ 3386.844139] PM: suspend exit 1970-01-01 00:57:40.624589167 UTC

[ 3471.760265] PM: Syncing filesystems ... done.
[ 3471.771897] Freezing user space processes ... 
[ 3471.780407] dwc2 f72c0000.usb: dwc2_hsotg_ep_stop_xfr: timeout DOEPCTL.EPDisable
[ 3471.788105] dwc2 f72c0000.usb: GINNakEff triggered
[ 3471.794916] (elapsed 0.018 seconds) done.
[ 3471.799078] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[ 3471.809320] Suspending console(s) (use no_console_suspend to debug)
[ 3471.847947] PM: suspend of devices complete after 29.905 msecs
[ 3471.852473] PM: late suspend of devices complete after 4.497 msecs
[ 3471.855611] PM: noirq suspend of devices complete after 3.120 msecs

[ 3481.034722] PM: noirq resume of devices complete after 1.945 msecs
[ 3481.037992] PM: early resume of devices complete after 2.694 msecs
[ 3481.062803] mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
[ 3481.137795] mmc_host mmc0: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0)
[ 3481.234796] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
[ 3481.278601] mmc_host mmc2: Bus speed (slot 0) = 24800000Hz (slot req 25000000Hz, actual 24800000HZ div = 0)
[ 3481.279396] PM: resume of devices complete after 241.388 msecs
[ 3481.358513] Restarting tasks ... done.
[ 3481.377766] PM: suspend exit 1970-01-01 00:59:15.332218333 UTC

5.1.1.3 resume_console节省时间

对比resume_console可以发现,mem要比freeze多210ms。

5.1.2 Ubuntu

此工具在Ubuntu上显示了更强大的功能。

支持了callgraph功能之后,更能清晰地分析每个设备或者子系统的suspend/resume占用的时间。

sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m mem
sudo ./analyze_suspend.py -rtcwake 10 -multi 5 20 -f -m freeze

在对比两种不同suspend模式后,发现freeze花费的时间要比mem少。这也符合预期,但是没有功耗数据?_?。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

下面着重分析一下如何基于此工具分析。

5.1.3 工具界面总体分析

最上面显示Kernel Suspend Time和Kernel Resume Time,可以从总体上查看是否有回退或者进步。

再下面是一些缩放按钮。

然后就是基于timeline的图表,比对颜色示意图,可以清晰看出suspend prepare、suspend、suspend late、suspend irq、suspend machine、resume machine、resume irq、resume early、resume和resume complete的分布。

最下面是每个模块、子系统的详细函数调用图以及开始时间、消耗时间。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

5.1.4 子系统、模块详细分析

选中一个模块,会在最下面显示详细的模块在suspend/resume各个阶段消费的时间,以及函数调用关系图。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

5.1.5 缩放查看细节

ZOOM IN放大,ZOOMOUT缩小,ZOOM 1:1恢复原始尺寸。

通过在timeline图表,放大可以查看到更小的模块消耗的时间。从宏观到模块,再到函数消耗时间,逐步细化,很有利于分析。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

如果发现某个函数占用时间较大,可以逐级展开。知道发现最终占用较大的函数,发现问题所在。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

5.1.6 工具代码分析

首先从入口main开始,和大多数工具一样开始都是解析命令选项,打印help信息;将所有的待测对象参数和测试参数保存在sysvals;

# ----------------- MAIN --------------------
# exec start (skipped if script is loaded as library)
if __name__ == '__main__':
    cmd = ''
    cmdarg = ''
    multitest = {'run': False, 'count': 0, 'delay': 0}
    # loop through the command line arguments
    args = iter(sys.argv[1:])
    for arg in args:
        …

# just run a utility command and exit
    if(cmd != ''):
        if(cmd == 'status'):
            statusCheck()
        elif(cmd == 'fpdt'):
            if(sysvals.android):
                doError('cannot read FPDT on android device', False)
            getFPDT(True)
        elif(cmd == 'usbtopo'):
            if(sysvals.android):
                doError('cannot read USB topology '+\
                    'on an android device', False)
            detectUSB(True)
        elif(cmd == 'modes'):
            modes = getModes()
            print modes
        elif(cmd == 'usbauto'):
            setUSBDevicesAuto()
        elif(cmd == 'summary'):
            print("Generating a summary of folder \"%s\"" % cmdarg)
            runSummary(cmdarg, True)
        sys.exit()

# run test on android device
    if(sysvals.android):  注释此段代码可以在Android上支持callgraph
        #if(sysvals.usecallgraph): 
        #    doError('ftrace (-f) is not yet supported '+\
        #        'in the android kernel', False)
        if(sysvals.notestrun):
            doError('cannot analyze test files on the '+\
                'android device', False)

# if instructed, re-analyze existing data files
    if(sysvals.notestrun):  分析已有数据文件,不需要重新测试
        rerunTest()
        sys.exit()

# verify that we can run a test
    if(not statusCheck()):  检查测试条件是否满足
        print('Check FAILED, aborting the test run!')
        sys.exit()

if multitest['run']:  连续多次测试
        # run multiple tests in a separte subdirectory
        s = 'x%d' % multitest['count']
        subdir = datetime.now().strftime('suspend-'+s+'-%m%d%y-%H%M%S')
        os.mkdir(subdir)
        for i in range(multitest['count']):
            if(i != 0):
                print('Waiting %d seconds...' % (multitest['delay']))
                time.sleep(multitest['delay'])
            print('TEST (%d/%d) START' % (i+1, multitest['count']))
            runTest(subdir)  进行单次测试
            print('TEST (%d/%d) COMPLETE' % (i+1, multitest['count']))
        runSummary(subdir, False)  生成summary.html
    else:
        # run the test in the current directory
        runTest(".")

sysvals.android表示是否在Android设备进行测试。

sysvals.usecallgraph表示是否生成函数调用关系图。

sysvals.rtcwake表示是否使用rtc进行唤醒。

针对Ubuntu之类的host设备,测试进行的很顺利。但是针对Android设备,在callgraph还存在一点问题。

run_Test无疑作为核心,收集log信息(ftrace、dmesg),执行suspend/resume,生成输出文件(txt、html)。

def runTest(subdir):
    global sysvals

# prepare for the test
    if(not sysvals.android):  针对不同的待测设备,初始化ftrace
        initFtrace()
    else:
        initFtraceAndroid()
    sysvals.initTestOutput(subdir)  生成输出目录,输出文件名等。

vprint('Output files:\n    %s' % sysvals.dmesgfile)
    if(sysvals.usecallgraph or
        sysvals.usetraceevents or
        sysvals.usetraceeventsonly):
        vprint('    %s' % sysvals.ftracefile)
    vprint('    %s' % sysvals.htmlfile)

# execute the test  执行测试,实际上命令内容基本一致。只是针对Android设备,增加了adb shell '…'。
    if(not sysvals.android):
        executeSuspend()
    else:
        executeAndroidSuspend()

# analyze the data and create the html output
    print('PROCESSING DATA')
    if(sysvals.usetraceeventsonly):  3.15之后的版本,只需要通过ftrace即可获取足够信息。之前的版本的数据都存在dmesg中。
        # data for kernels 3.15 or newer is entirely in ftrace
        testruns = parseTraceLog()
    else:
        # data for kernels older than 3.15 is primarily in dmesg
        testruns = loadKernelLog()
        for data in testruns:
            parseKernelLog(data)
        if(sysvals.usecallgraph or sysvals.usetraceevents):
            appendIncompleteTraceLog(testruns)
    createHTML(testruns)  根据解析的数据生成html矢量图表

executeAndroidSuspend在Android设备上操作sysfs节点来配置ftrace,抓取log,suspend/resume,然后将log拉到主机。

def executeAndroidSuspend():
    global sysvals

# check to see if the display is currently off
    tp = sysvals.tpath
    out = os.popen(sysvals.adb+\
        ' shell dumpsys power | grep mScreenOn').read().strip()
    # if so we need to turn it on so we can issue a new suspend
    if(out.endswith('false')):
        print('Waking the device up for the test...')
        # send the KEYPAD_POWER keyevent to wake it up
        os.system(sysvals.adb+' shell input keyevent 26')
        # wait a few seconds so the user can see the device wake up
        time.sleep(3)
    # execute however many s/r runs requested
    for count in range(1,sysvals.execcount+1):
        # clear the kernel ring buffer just as we start
        os.system(sysvals.adb+' shell dmesg -c > /dev/null 2>&1')  清空dmesg
        # start ftrace
        if(sysvals.usetraceevents):
            print('START TRACING')
            os.system(sysvals.adb+" shell 'echo 1 > "+tp+"tracing_on'")  开始ftrace抓取
        # initiate suspend
        for count in range(1,sysvals.execcount+1):
            if(sysvals.usetraceevents):
                os.system(sysvals.adb+\
                    " shell 'echo SUSPEND START > "+tp+"trace_marker'")  写SUSPEND START到ftrace,作为开始标记。后面解析log,会以此为标记。
            if(sysvals.rtcwake):
                print('SUSPEND START')
                print('will autoresume in %d seconds' % sysvals.rtcwaketime)
                os.system(sysvals.adb+" shell 'echo +%d > /sys/class/rtc/rtc0/wakealarm'"%(sysvals.rtcwaketime))  设置wakeup resource
            else:
                print('SUSPEND START (press a key to resume)')

os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+\
                " > "+sysvals.powerfile+"'")  进入suspend,之后就是resume
            # execution will pause here, then adb will exit
            while(True):  轮询adb shell pwd判断设备是否被唤醒
                check = os.popen(sysvals.adb+\
                    ' shell pwd 2>/dev/null').read().strip()
                if(len(check) > 0):
                    break
                time.sleep(1)
            if(sysvals.usetraceevents):
                os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+\
                    "trace_marker'")  写RESUME COMPLETE到ftrace,作为结束标记
        # return from suspend
        print('RESUME COMPLETE')
        # stop ftrace
        if(sysvals.usetraceevents):
            os.system(sysvals.adb+" shell 'echo 0 > "+tp+"tracing_on'") 关闭ftrace功能
            print('CAPTURING TRACE')
            os.system('echo "'+sysvals.teststamp+'" > '+sysvals.ftracefile)
            os.system(sysvals.adb+' shell cat '+tp+\
                'trace >> '+sysvals.ftracefile)  将/sys/kernel/debug/tracing/trace内容保存到本地log
        # grab a copy of the dmesg output
        print('CAPTURING DMESG')
        os.system('echo "'+sysvals.teststamp+'" > '+sysvals.dmesgfile)
        os.system(sysvals.adb+' shell dmesg >> '+sysvals.dmesgfile)  将dmesg保存到本地

parseTraceLog用于解析ftrace log,phase的判断是依据suspend_resume关键词;每个模块的开始结束是以device_pm_callback_start/device_pm_callback_end作为判断;还调用FTraceCallGraph进行函数调用关系的解析。

createHTML是这个工具真正NB的地方,对parseTraceLog结果进行了可视化,生成可缩放、查看细节的html文件。

6 对工具的改进

虽然工具非常强大,但是在使用中还是有一些视角没有覆盖到。所以做了一些改进。

在Android上使能rtcwake;在Android上使能callgraph;针对多次测试生成csv比较不同phase消耗时间,比summary.html更细化;这对每次测试给出Phase时间和每个Phase内Device消耗时间。

6.1 Android上使能rtcwake

https://github.com/arnoldlu/common-use/commit/a862d8c2a4f9bd005c516c6b61b394386b882217

可以在Android上使用rtc作为唤醒源,可以在没有实体按键的设备上进行测试。

6.2 Android上使能callgraph

https://github.com/arnoldlu/common-use/commit/f8e288753a472cf48ccc0e9d7ffc67978c7d165e

如果没有callgraph只能显示Phase级别的信息,不能显示每个device的信息以及内部函数耗费的时间。

6.3 单次测试summary结果

https://github.com/arnoldlu/common-use/commit/53c270669bb0dfaada53e29852999d5367ec65da

在每次测试目录下,生成一个summary_phase_dev.csv文件。可以直观的看到不同Phase、不同device消耗的时间。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

如果想要发现那个模块消耗最大时间,可以使用Excel的Filter功能。比如想看suspend_prepare下Device消耗时间有大到小排列。

这样就可以找出每个Phase中消耗资源大户。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

6.4 多次测试summary结果

https://github.com/arnoldlu/common-use/commit/d162c4827a0cdc50fe94d3f1303af682b387dc3d

生成summary_phase.csv文件,按每次测试的不同phase显示耗费时间。

可以比较不同测试phase的时间耗费,看出哪一个phase存在回退现象。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

6.5 suspend/resume起止时间点判断

analyze_suspend.py在解析log的时候,以SUSPEND START作为起点,以RESUME COMPLETE为终点。

在发送SUSPEND START之后,触发suspend动作。在这期间,如果host存在一定抢占,会增加suspend时间。

然后通poll设备的adb状态,来判断是否resume。一方面,adb可用状态要在resume结束之后,另一方面,在最坏的情况下,可能存在1s的误差,这对于毫秒级的resume来说是非常严重的一个结果。

最后发送RESUME COMPLETE作为结束。

if(sysvals.usetraceevents):
    os.system(sysvals.adb+\
        " shell 'echo SUSPEND START > "+tp+"trace_marker'")
print('SUSPEND START (press a key on the device to resume)')
os.system(sysvals.adb+" shell 'echo "+sysvals.suspendmode+\
    " > "+sysvals.powerfile+"'")
# execution will pause here, then adb will exit
while(True):
    check = os.popen(sysvals.adb+\
        ' shell pwd 2>/dev/null').read().strip()
    if(len(check) > 0):
        break
    time.sleep(1)
if(sysvals.usetraceevents):
    os.system(sysvals.adb+" shell 'echo RESUME COMPLETE > "+tp+\
        "trace_marker'")

更好的方式是在enter_state的开头结尾加ftrace,然后解析的时候以此为标记。

@@ -486,6 +496,7 @@ static int enter_state(suspend_state_t state)
{
        int error;
 
+       trace_suspend_resume(TPS("enter_state"), state, true);
        trace_suspend_resume(TPS("suspend_enter"), state, true);
        if (state == PM_SUSPEND_FREEZE) {
#ifdef CONFIG_PM_DEBUG
@@ -532,6 +543,7 @@ static int enter_state(suspend_state_t state)
        suspend_finish();
  Unlock:
        mutex_unlock(&pm_mutex);
+       trace_suspend_resume(TPS("enter_state"), state, false);
        return error;
}

7 分析步骤

本着从宏观到微观的进阶,一步步分找出可以优化的点。

下面是从开始一次测试到每次测试到suspend/resume不同phase,再到每个phase里面device callback的关系。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

下面是每一次正常suspend/resume的流程,之前每个阶段函数分析也可以看出他们的对称关系。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

在修改了工具对于suspend和resume时间判断的bug过后,得到了一组的数据。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

分析一下稳定性,均方差比较小,还算比较稳定。数据稳定之后,就可以进行详细分析了。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

下面查看每次测试的每个phase数据,可以看出每个phase数据的稳定性,以及每个phase费时占比。找出费时大户,suspend_prepare、suspend、suspend_machine、resume_machine、resume、resume_complete。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

针对上述六个phase,列出Top 10设备或者子系统。

从下图可以看出,freeze_processes、sync_filesystems、mmc0、mmc2、CUP0~7、resume_console、tsensor是需要重点分析的设备。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

不区分phase列出Top 30如下,下面逐一分析可优化的空间。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

7.1 resume_console

adb shell 'echo N > /sys/module/printk/parameters/console_suspend'
adb shell 'cat /sys/module/printk/parameters/console_suspend'

先看一下resume_console流程函数:

void resume_console(void)
{
    if (!console_suspend_enabled)
        return;
    down_console_sem();  获取console_sem和console_lock_dep_map
    console_suspended = 0;
    console_unlock();  
}

通过分析ftrace发现,主要时间消耗在console_unlock中。因为在console_lock被占用期间,有相当一部分由printk缓存的log。所以在释放锁之前需要将其处理掉。

void console_unlock(void)
{
    static char ext_text[CONSOLE_EXT_LOG_MAX];
    static char text[LOG_LINE_MAX + PREFIX_MAX];
    static u64 seen_seq;
    unsigned long flags;
    bool wake_klogd = false;
    bool do_cond_resched, retry;

trace_console_lock("console_unlock start", strlen("console_unlock start"));\
    if (console_suspended) {
        up_console_sem();
        return;
    }

/*
     * Console drivers are called under logbuf_lock, so
     * @console_may_schedule should be cleared before; however, we may
     * end up dumping a lot of lines, for example, if called from
     * console registration path, and should invoke cond_resched()
     * between lines if allowable.  Not doing so can cause a very long
     * scheduling stall on a slow console leading to RCU stall and
     * softlockup warnings which exacerbate the issue with more
     * messages practically incapacitating the system.
     */
    do_cond_resched = console_may_schedule;
    console_may_schedule = 0;

/* flush buffered message fragment immediately to console */
    console_cont_flush(text, sizeof(text));
again:
    for (;;) {  如果默认的LOGLEVEL定的比较高,即优先级低,则会有相当多的log需要打印。占用很多时间。
        …
    }
    console_locked = 0;

/* Release the exclusive_console once it is used */
    if (unlikely(exclusive_console))
        exclusive_console = NULL;

raw_spin_unlock(&logbuf_lock);

up_console_sem();  释放console_sem和console_lock_dep_map

/*
     * Someone could have filled up the buffer again, so re-check if there's
     * something to flush. In case we cannot trylock the console_sem again,
     * there's a new owner and the console_unlock() from them will do the
     * flush, no worries.
     */
    raw_spin_lock(&logbuf_lock);
    retry = console_seq != log_next_seq;
    raw_spin_unlock_irqrestore(&logbuf_lock, flags);

if (retry && console_trylock())
        goto again;

if (wake_klogd)
        wake_up_klogd();
    trace_console_lock("console_unlock end", strlen("console_unlock end"));\
}

那么问题就变得简单了,减少printk量就可以了。

通过cat /proc/sys/kernel/printk可以得到。在kernel/sysctl.c中有其实现。

7    4    1    7

这四个值分别对应:

#define console_loglevel (console_printk[0])
#define default_message_loglevel (console_printk[1])
#define minimum_console_loglevel (console_printk[2])
#define default_console_loglevel (console_printk[3])

又对应到:

int console_printk[4] = {
    CONSOLE_LOGLEVEL_DEFAULT,    /* console_loglevel */
    MESSAGE_LOGLEVEL_DEFAULT,    /* default_message_loglevel */
    CONSOLE_LOGLEVEL_MIN,        /* minimum_console_loglevel */
    CONSOLE_LOGLEVEL_DEFAULT,    /* default_console_loglevel */
};

/* We show everything that is MORE important than this.. */
#define CONSOLE_LOGLEVEL_SILENT  0 /* Mum's the word */
#define CONSOLE_LOGLEVEL_MIN     1 /* Minimum loglevel we let people use */
#define CONSOLE_LOGLEVEL_QUIET     4 /* Shhh ..., when booted with "quiet" */
#define CONSOLE_LOGLEVEL_DEFAULT 7 /* anything MORE serious than KERN_DEBUG */
#define CONSOLE_LOGLEVEL_DEBUG    10 /* issue debug messages */
#define CONSOLE_LOGLEVEL_MOTORMOUTH 15    /* You can't shut this one up */

可知只要内核log优先级高于KERN_DEBUG都会被打印。由下表可知几乎所有的log都会被打印。这就会造成printk相当繁忙,console_unlock会处理相当多信息。

#define KERN_EMERG    KERN_SOH "0"    /* system is unusable */
#define KERN_ALERT    KERN_SOH "1"    /* action must be taken immediately */
#define KERN_CRIT    KERN_SOH "2"    /* critical conditions */
#define KERN_ERR    KERN_SOH "3"    /* error conditions */
#define KERN_WARNING    KERN_SOH "4"    /* warning conditions */
#define KERN_NOTICE    KERN_SOH "5"    /* normal but significant condition */
#define KERN_INFO    KERN_SOH "6"    /* informational */
#define KERN_DEBUG    KERN_SOH "7"    /* debug-level messages */

#define KERN_DEFAULT    KERN_SOH "d"    /* the default kernel loglevel */

想解决也很简单,提高console_loglevel的优先级。

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
old mode 100644
new mode 100755
index e7e586b..b927d67
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -60,7 +60,7 @@ extern void printascii(char *);
#endif
 
int console_printk[4] = {
-       CONSOLE_LOGLEVEL_DEFAULT,       /* console_loglevel */
+       CONSOLE_LOGLEVEL_QUIET, /* console_loglevel */
        MESSAGE_LOGLEVEL_DEFAULT,       /* default_message_loglevel */
        CONSOLE_LOGLEVEL_MIN,           /* minimum_console_loglevel */
        CONSOLE_LOGLEVEL_DEFAULT,       /* default_console_loglevel */

在进行修改后,再来进行对比测试。可以看出消耗时间得到显著提升,优化后的resume_complete时间基本上可以忽略不计。

7, mem
    Line 748: resume_complete,resume_console[3],248.54900000002544
    Line 748: resume_complete,resume_console[3],248.6340000000382
    Line 748: resume_complete,resume_console[3],248.26499999994667
    Line 748: resume_complete,resume_console[3],248.3510000000706
    Line 748: resume_complete,resume_console[3],248.42499999999745

7, freeze
    Line 996: resume_complete,resume_console[1],76.18400000001202
    Line 996: resume_complete,resume_console[1],76.19500000009793
    Line 996: resume_complete,resume_console[1],76.3280000001032
    Line 996: resume_complete,resume_console[1],76.1689999999362
    Line 996: resume_complete,resume_console[1],76.19999999997162

4, freeze
    Line 996: resume_complete,resume_console[1],0.1010000000007949
    Line 996: resume_complete,resume_console[1],0.10499999999069587
    Line 996: resume_complete,resume_console[1],0.09799999997994746
    Line 996: resume_complete,resume_console[1],0.1010000000007949
    Line 996: resume_complete,resume_console[1],0.10000000003174137

4, mem
    Line 749: resume_complete,resume_console[3],0.3370000000586515
    Line 749: resume_complete,resume_console[3],0.33800000005612674
    Line 749: resume_complete,resume_console[3],0.37700000007134804
    Line 749: resume_complete,resume_console[3],0.3359999999474894
    Line 749: resume_complete,resume_console[3],0.3429999999298161

7.2 mmc suspend/resuem分析

从下图可知,mmc相关suspend/resume主要在mmc0:0001和mmc2:0001两个设备的suspend/resume。下面重点分析这两个设备的suspend/resume回调函数。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

在执行suspend过程中,先将bus上面的设备driver先suspend,然后在suspend bus。

在resume时,过程相反,先bus resume,然后再逐个设备driver resume。

mmc0:0001

那就来看看bus和各个设备耗费的时间:

4013.868837 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [suspend] */
4013.868893 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */
4013.869000 |   4)    sh-4511     |               |  /* device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [suspend] */
4013.869053 |   4)    sh-4511     |               |  /* device_pm_callback_end: block mmcblk0, err=0 */
4013.889229 |   5)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [suspend] */
4013.914631 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */
4022.787571 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, bus [resume] */
4022.886749 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */
4023.059198 |   0)    sh-4511     |               |  /* device_pm_callback_start: block mmcblk0, parent: mmc0:0001, [resume] */
4023.059270 |   0)    sh-4511     |               |  /* device_pm_callback_end: block mmcblk0, err=0 */
4023.059398 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmcblk mmc0:0001, parent: mmc0, [resume] */
4023.059830 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmcblk mmc0:0001, err=0 */

可以看出driver的suspend/resume并没有耗费太多时间,主要在mmc bus的suspend/resume耗费了太多时间。

在drivers/mmc/core/bus.c中

static struct bus_type mmc_bus_type = {
    .name        = "mmc",
    .dev_groups    = mmc_dev_groups,
    .match        = mmc_bus_match,
    .uevent        = mmc_bus_uevent,
    .probe        = mmc_bus_probe,
    .remove        = mmc_bus_remove,
    .shutdown    = mmc_bus_shutdown,
    .pm        = &mmc_bus_pm_ops,
};

mmc_bus_pm_bus对应的suspend/resume函数如下:

static const struct dev_pm_ops mmc_bus_pm_ops = {
    SET_RUNTIME_PM_OPS(mmc_runtime_suspend, mmc_runtime_resume, NULL)
    SET_SYSTEM_SLEEP_PM_OPS(mmc_bus_suspend, mmc_bus_resume)
};

mmc bus的suspend/resume如下:

static int mmc_bus_suspend(struct device *dev)
{
    struct mmc_card *card = mmc_dev_to_card(dev);
    struct mmc_host *host = card->host;
    int ret;

ret = pm_generic_suspend(dev);  对应设备驱动的suspend回调函数。
    if (ret)
        return ret;

ret = host->bus_ops->suspend(host);   这里的host指的是mmc_host,bus_ops指的是mmc_ops。
    return ret;
}

static int mmc_bus_resume(struct device *dev)
{
    struct mmc_card *card = mmc_dev_to_card(dev);
    struct mmc_host *host = card->host;
    int ret;

ret = host->bus_ops->resume(host);  这里的host指的是mmc_host,bus_ops指的是mmc_ops。

if (ret)
        pr_warn("%s: error %d during resume (card was removed?)\n",
            mmc_hostname(host), ret);

ret = pm_generic_resume(dev);  对应设备的驱动的resume回调函数。
    return ret;
}

pm_generic_suspend和pm_generic_resume是对子系统设备的通用回调函数。

int pm_generic_suspend(struct device *dev)
{
    const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

return pm && pm->suspend ? pm->suspend(dev) : 0;
}

int pm_generic_resume(struct device *dev)
{
    const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

return pm && pm->resume ? pm->resume(dev) : 0;
}

从中可以看出,如果对应设备存在dev->driver->pm->suspend和dev->driver->pm->resume则,调用回调函数。

mmc_bus_suspend

mmc_bus_suspend花费了大概25.4ms。下面是ftrace中截取的一段,从中可以看出时间主要消耗在struct mmc_bus_ops mmc_ops的.suspend回调函数。

/*            */   mmc_bus_suspend() {
/*            */     pm_generic_suspend() {
/*! 307.552 us*/     }
/*            */     mmc_suspend() {
/** 25060.78 us*/      } /* mmc_suspend */
/** 25378.28 us*/    } /* mmc_bus_suspend */

mmc_bus_resume

mmc_resume消耗了大部分时间,整个流程才99.158ms。

/*              */   mmc_bus_resume() {
/*              */     mmc_resume() {
/* * 97167.39 us*/     }
/*              */     pm_generic_resume() {
/* # 1980.104 us*/     }
/* * 99158.12 us*/   }

mmc_suspend

对应的host->bus_ops,即mmc_ops。在host下的设备都suspend之后,suspend mmc_host。

在mmc_host resume之后,才能进行设备的resume。

static const struct mmc_bus_ops mmc_ops = {
    .remove = mmc_remove,
    .detect = mmc_detect,
    .suspend = mmc_suspend,
    .resume = mmc_resume,
    .runtime_suspend = mmc_runtime_suspend,
    .runtime_resume = mmc_runtime_resume,
    .alive = mmc_alive,
    .shutdown = mmc_shutdown,
    .reset = mmc_reset,
};

通过分析ftrace.txt文件,发现其中msleep花费了17.1ms,这里是存在问题的。

int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
        unsigned int timeout_ms, bool use_busy_signal, bool send_status,
        bool ignore_crc)
{
    struct mmc_host *host = card->host;
    int err;
    struct mmc_command cmd = {0};
    unsigned long timeout;
    u32 status = 0;
    bool use_r1b_resp = use_busy_signal;

mmc_retune_hold(host);


        /*
         * We are not allowed to issue a status command and the host
         * does'nt support MMC_CAP_WAIT_WHILE_BUSY, then we can only
         * rely on waiting for the stated timeout to be sufficient.
         */
        if (!send_status) {
            mmc_delay(timeout_ms);
            goto out;
        }


}

mmc_resume

通过分析ftrece.txt,可以发现mmc_resume存在4个msleep,共消耗了12646.35 +14260.78 +13881.66 +15093.22 =55.882 ms。

关于mmc_ops的suspend/resume/runtime_suspend/runtime_resume的探讨

先来看看这四个函数的,其流程受到MMC_CAP_AGGRESSIVE_PM和MMC_CAP_RUNTIME_RESUME两个flag的控制。执行的实体都是_mmc_suspend、_mmc_resume。

static int mmc_suspend(struct mmc_host *host)
{
    int err;

err = _mmc_suspend(host, true);
    if (!err) {
        pm_runtime_disable(&host->card->dev);
        pm_runtime_set_suspended(&host->card->dev);
    }

return err;
}

static int mmc_resume(struct mmc_host *host)
{
    int err = 0;

if (!(host->caps & MMC_CAP_RUNTIME_RESUME)) {
        err = _mmc_resume(host);
        pm_runtime_set_active(&host->card->dev);
        pm_runtime_mark_last_busy(&host->card->dev);
    }
    pm_runtime_enable(&host->card->dev);

return err;
}

static int mmc_runtime_suspend(struct mmc_host *host)
{
    int err;

if (!(host->caps & MMC_CAP_AGGRESSIVE_PM))
        return 0;

err = _mmc_suspend(host, true);
    if (err)
        pr_err("%s: error %d doing aggressive suspend\n",
            mmc_hostname(host), err);

return err;
}

static int mmc_runtime_resume(struct mmc_host *host)
{
    int err;

if (!(host->caps & (MMC_CAP_AGGRESSIVE_PM | MMC_CAP_RUNTIME_RESUME)))
        return 0;

err = _mmc_resume(host);
    if (err)
        pr_err("%s: error %d doing aggressive resume\n",
            mmc_hostname(host), err);

return 0;
}

1.如果两flag都没有定义,则runtime_suspend和runtim_resume都是空函数。起作用的就是跟随系统的suspend/resume流程。

2.如果只定义了MMC_CAP_RUNTIME_RESUME,则不会runtime_suspend。并且在系统resume的时候,不会执行resume回调函数。只会在根据需要执行runtime_resume。使用runtime_resume代替了resume。

3.如果只定义了MMC_CAP_AGGRESSIVE_PM ,则suspend/resume跟随系统suspend/resume流程。并且runtime_suspend/resume_resume也根据实际情况执行。一切正常。

4.如果两者都定义了,既可以suspend也可以runtime_suspend,但是只能runtime_resume,不能跟随系统resume流程执行resume回调函数。

也就是说MMC_CAP_AGGRESSIVE_PM 则runtime_suspend/runtime_resume都可用,MMC_CAP_RUNTIME_RESUME则只能使用runtime_resume执行resume功能。

那么就来看一下,在应用了MMC_CAP_RUNTIME_RESUME之后效果如何。

mmc0:0001增加runtime-suspend属性:

diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
old mode 100644
new mode 100755
index 09e2c71..2cec392
--- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
@@ -801,6 +801,7 @@
                        clock-names = "ciu", "biu";
                        resets = <&sys_ctrl PERIPH_RSTDIS0_MMC0>;
                        bus-width = <0x8>;
+                      runtime-suspend;
                        vmmc-supply = <&ldo19>;
                        pinctrl-names = "default";
                        pinctrl-0 = <&emmc_pmx_func &emmc_clk_cfg_func

修改DeviceTree解析文件,增加MMC_CAP_RUNTIME_RESUME。

index 094202c..35fd7b5
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -2922,6 +2922,10 @@ static struct dw_mci_board *dw_mci_parse_dt(struct dw_mci *host)
                dev_info(dev, "supports-highspeed property is deprecated.\n");
                pdata->caps |= MMC_CAP_SD_HIGHSPEED | MMC_CAP_MMC_HIGHSPEED;
        }
+       if (of_find_property(np, "runtime-suspend", NULL)) {
+               dev_info(dev, "supports-highspeed property is deprecated.\n");
+               pdata->caps |=  MMC_CAP_RUNTIME_RESUME;
+       }
 
        return pdata;
}

修改后mmc0:0001的resume达到了预期,mmc_resume没有被执行。

针对统计结果,效果明显。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

虽然没有在系统resume过程中执行,但是mmc0:0001总要resume。只不过稍微延迟了,不再这个工具统计之中。

延后执行的mmc0:0001的resume耗费了72.317ms,也和之前的差不多。实际上没有对整个流程作出实质贡献,只是不在统计数据之内。

[32m[   32.486851] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31 caps=40138143 caps2=0)
[32m[   32.500871] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq 1000 1000 664
[32m[   32.501305] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq 1000 1000 664
[32m[   32.540313] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq 1000 1000 664
[32m[   32.540747] [0m[33mueventd[0m: fixup /sys/devices/system/cpu/cpu5/cpufreq/scaling_min_freq 1000 1000 664
[32m[   32.559168] [0m[33mmmc_host mmc0[0m: Bus speed (slot 0) = 51756522Hz (slot req 52000000Hz, actual 51756522HZ div = 0 caps=40138143 caps2=0)

mmc2:0001

mmc2:0001和mmc0:0001的区别在于不同的mmc_bus_ops,mmc2:0001是SDIO接口,对应的应该是mmc_sdio_ops。

4013.876306 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [suspend] */
4013.876360 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */
4013.876397 |   4)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [suspend] */
4013.876437 |   4)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
4013.876470 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [suspend] */
4013.876525 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
4013.876556 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [suspend] */
4013.876596 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
4013.881676 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [suspend] */
4013.881698 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
4013.881740 |   4)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [suspend] */
4013.881765 |   4)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
4013.882582 |   4)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [suspend] */
4013.882603 |   4)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
4013.882645 |   4)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [suspend] */
4013.885524 |   4)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */
4022.888667 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, bus [resume] */
4023.042980 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */
4023.043021 |   0)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, bus [resume] */
4023.043037 |   0)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
4023.043067 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, bus [resume] */
4023.043089 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
4023.043128 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, bus [resume] */
4023.043151 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
4023.048824 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl18xx_driver wl18xx.1.auto, parent: mmc2:0001:2, [resume] */
4023.048877 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl18xx_driver wl18xx.1.auto, err=0 */
4023.048916 |   0)    sh-4511     |               |  /* device_pm_callback_start: wl1271_sdio mmc2:0001:2, parent: mmc2:0001, [resume] */
4023.048979 |   0)    sh-4511     |               |  /* device_pm_callback_end: wl1271_sdio mmc2:0001:2, err=0 */
4023.049011 |   0)    sh-4511     |               |  /* device_pm_callback_start: sdio mmc2:0001:1, parent: mmc2:0001, [resume] */
4023.049074 |   0)    sh-4511     |               |  /* device_pm_callback_end: sdio mmc2:0001:1, err=0 */
4023.049113 |   0)    sh-4511     |               |  /* device_pm_callback_start: mmc mmc2:0001, parent: mmc2, [resume] */
4023.049165 |   0)    sh-4511     |               |  /* device_pm_callback_end: mmc mmc2:0001, err=0 */

由下可知不同部分在于mmc_host的suspend/resume,pm_generic_suspend/pm_generic_resume基本上耗费的时间都很少。

所以重点看看mmc_sdio_suspend和mmc_sdio_resume两个函数。

/*              */    mmc_bus_suspend() {
/*   0.833 us   */      pm_generic_suspend();
/*              */      mmc_sdio_suspend() {
/* # 2854.687 us*/      }
/* # 2864.115 us*/    }

/*              */    mmc_bus_resume() {
/*              */      mmc_sdio_resume() {
/* @ 154277.3 us*/      }
/*   1.563 us   */      pm_generic_resume();
/* @ 154290.3 us*/    }

mmc_sdio_suspend

时间很短,不关注。

mmc_sdio_resume

mmc2:001的mmc_bus_resume时间达到154.313,mmc_sdio_resume包含三个msleep共75331.82+15953.43+14369.58=105654.83us=105.654ms。

考虑:是否可以将SDIO的resume也像MMC那样延后执行呢?

7.3 CPU_OFF/CPU_ON

在分析了resome_console和mmc之后,再来看一下CPU_OFF/CPU_ON的执行过程。

在disable_nonboot_cpus中选取first_cpu,除此之外的所有for_each_online_cpu都会被_cpu_down,并且将其放到frozen_cpus上。

在enable_nonboot_cpus中,遍历frozen_cpus,将其_cpu_up。

针对HiKey,真个流程就是对CPU 1-7进行关闭、打开的操作,所以重点分析一下_cpu_down和_cpu_up。

耗时最大的三个地方都用粗体下划线标出,除了发送状态通知之外,还有rcu sync处理。

对cpu_chain上所有注册notifier,逐个执行回调函数notifier_call,根据action进行处理,这是一个很耗时的过程。

/* Requires cpu_add_remove_lock to be held */
static int _cpu_down(unsigned int cpu, int tasks_frozen)
{
    int err, nr_calls = 0;
    void *hcpu = (void *)(long)cpu;
    unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
    struct take_cpu_down_param tcd_param = {
        .mod = mod,
        .hcpu = hcpu,
    };

if (num_online_cpus() == 1)  如果online只有一个CPU,则无法再进行down操作。
        return -EBUSY;

if (!cpu_online(cpu))  如果当前CPU没有online,则无需进行down。
        return -EINVAL;

cpu_hotplug_begin();  取得cpu_hotplug.lock锁

err = __cpu_notify(CPU_DOWN_PREPARE | mod, hcpu, -1, &nr_calls);  在cpu_chain上发从CPU_DOWN_PREPARE状态。
    if (err) {
        nr_calls--;
        __cpu_notify(CPU_DOWN_FAILED | mod, hcpu, nr_calls, NULL);
        pr_warn("%s: attempt to take down CPU %u failed\n",
            __func__, cpu);
        goto out_release;
    }

/*
     * By now we've cleared cpu_active_mask, wait for all preempt-disabled
     * and RCU users of this state to go away such that all new such users
     * will observe it.
     *
     * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
     * not imply sync_sched(), so wait for both.
     *
     * Do sync before park smpboot threads to take care the rcu boost case.
     */
    if (IS_ENABLED(CONFIG_PREEMPT))
        synchronize_rcu_mult(call_rcu, call_rcu_sched);
    else
        synchronize_rcu();

smpboot_park_threads(cpu);  将此CPU的由kthread_create创建的线程设置为PARKED。

/*
     * Prevent irq alloc/free while the dying cpu reorganizes the
     * interrupt affinities.
     */
    irq_lock_sparse();

/*
     * So now all preempt/rcu users must observe !cpu_active().
     */
    err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
    if (err) {
        /* CPU didn't die: tell everyone.  Can't complain. */
        cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
        irq_unlock_sparse();
        goto out_release;
    }
    BUG_ON(cpu_online(cpu));  如果指定的CPU还处于online状态,则触发kernel panic。

/*
     * The migration_call() CPU_DYING callback will have removed all
     * runnable tasks from the cpu, there's only the idle task left now
     * that the migration thread is done doing the stop_machine thing.
     *
     * Wait for the stop thread to go away.
     */
    while (!per_cpu(cpu_dead_idle, cpu))
        cpu_relax();
    smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
    per_cpu(cpu_dead_idle, cpu) = false;

/* Interrupts are moved away from the dying cpu, reenable alloc/free */
    irq_unlock_sparse();

hotplug_cpu__broadcast_tick_pull(cpu);
    /* This actually kills the CPU. */
    __cpu_die(cpu);  调用底层架构相关的cpu_kill回调函数。

/* CPU is completely dead: tell everyone.  Too late to complain. */
    tick_cleanup_dead_cpu(cpu);
    cpu_notify_nofail(CPU_DEAD | mod, hcpu);  通知完成offline动作的处理器状态为CPU_DEAD。

check_for_tasks(cpu);

out_release:
    cpu_hotplug_done();  释放cpu_hotplug.lock锁。
    trace_sched_cpu_hotplug(cpu, err, 0);
    if (!err)
        cpu_notify_nofail(CPU_POST_DEAD | mod, hcpu);
    return err;
}

/* Requires cpu_add_remove_lock to be held */
static int _cpu_up(unsigned int cpu, int tasks_frozen)
{
    int ret, nr_calls = 0;
    void *hcpu = (void *)(long)cpu;
    unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
    struct task_struct *idle;

cpu_hotplug_begin();  如果没有其他进程占有,则退出,执行后面的工作;如果被占用,则设置这个进程为TASK_INTERRUPTIBLE,等待结束。

if (cpu_online(cpu) || !cpu_present(cpu)) {  如果该CPU已经online,则没有必要执行up;或者非present,则无法up。
        ret = -EINVAL;
        goto out;
    }

idle = idle_thread_get(cpu);  给指定CPU生成一个idle线程
    if (IS_ERR(idle)) {
        ret = PTR_ERR(idle);
        goto out;
    }

ret = smpboot_create_threads(cpu);  创建一个用于管理CPU hotplug动作的线程
    if (ret)
        goto out;

ret = __cpu_notify(CPU_UP_PREPARE | mod, hcpu, -1, &nr_calls);  通知cpu_chain中的处理器,当前正在online的CPU状态为CPU_UP_PREPARE。
    if (ret) {
        nr_calls--;
        pr_warn("%s: attempt to bring up CPU %u failed\n",
            __func__, cpu);
        goto out_notify;
    }

/* Arch-specific enabling code. */
    ret = __cpu_up(cpu, idle);  调用更底层的使能CPU操作。

if (ret != 0)
        goto out_notify;
    BUG_ON(!cpu_online(cpu));

/* Now call notifier in preparation. */
    cpu_notify(CPU_ONLINE | mod, hcpu);  通知cpu_chanin中的处理器,目前online动作的处理器的状态为CPU_ONLINE。

out_notify:
    if (ret != 0)
        __cpu_notify(CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);
out:
    cpu_hotplug_done();  释放cpu_hotplug.lock锁。
    trace_sched_cpu_hotplug(cpu, ret, 1);

return ret;
}

RCU synchronize

RCU即Read-Copy Update,是Linux内核比较成熟的新型读写锁,具有较高的读写并发性能,常常用在需要互斥的关键性能路径。

在Kernel中,有两种类型实现tiny和tree,tiny rcu更简洁,常用在小型嵌入式系统中;tree rcu被广泛用在了server、desktop、android中。

RCU的和心理链式读者访问的同时,写者可以更新访问对象的副本,但写者需要等待所有读者完成访问之后,才能删除老对象。这个过程实现的关键和难点在于如何判断所有的读者已经完成访问。通常把写者开始更新,到所有读者完成访问这段时间叫做宽限期(Grace Period)。内核中实现宽限期等待的函数是synchronize_rcu。

synchronize_rcu_mult同时在call_rcu()函数列表的宽限期上等待,知道所有的都结束。

总结:cpu_chain和rcu sync耗时大部是由外界因素决定的,如果cpu_chain或者call_rcu()列表很多,或者里面回调函数特别耗时,都会拉长CPU_OFF/CPU_ON时间。这部分的优化特别离散。

参考文档:

RCU synchronize原理分析 http://www.wowotech.net/kernel_synchronization/223.html

synchronize_rcu()函数详解 http://blog.chinaunix.net/uid-20648784-id-1592811.html

如何确定一个函数耗费时间?

在函数中添加以下ftrace,可以得到执行时的timestamp,进程名称,函数名和对应的行数。

trace_suspend_resume(TPS(__func__), __LINE__, true);

结果如下:

223.502950 |   1)    sh-2832     |               |                  /* suspend_resume: CPU_ON[4] begin */
223.502953 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[513] begin */
223.502957 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[516] begin */
223.502959 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[522] begin */
223.502969 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[529] begin */
223.502973 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[534] begin */
223.529988 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[544] begin */
223.530382 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[552] begin */
223.531451 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[559] begin */
223.531454 |   1)    sh-2832     |               |                  /* suspend_resume: _cpu_up[563] begin */
223.531456 |   1)    sh-2832     |               |                  /* suspend_resume: CPU_ON[4] end */

在Excel中打开,可以轻松算出时间差。可知Line 534到Line 544之前耗费了最多时间。

Suspend to RAM和Suspend to Idle分析,以及在HiKey上性能对比【转】

详情请参考:

cpu hotplug的流程 http://blog.****.net/u013686805/article/details/46942469

Linux CPU core的电源管理(5)_cpu control及cpu hotplug http://www.wowotech.net/pm_subsystem/cpu_hotplug.html

8 参考文档

Power Management Support in Hikey (suspend-resume):http://www.96boards.org/forums/topic/power-management-support-in-hikey-suspend-resume/#gsc.tab=0

Suspend to Idle:http://www.linaro.org/blog/suspend-to-idle/

Suspend and Resume:https://01.org/zh/suspendresume

SuspendAndResume github:https://github.com/arnoldlu/suspendresume

Linux电源管理(6)_Generic PM之Suspend功能:http://www.wowotech.net/pm_subsystem/suspend_and_resume.html

联系方式:arnoldlu@qq.com