
Analysis of "BUG: scheduling while atomic"


We ran into a typical scheduling problem; the kernel log is shown below.





<3>[26578.636839] C1 [      swapper/1] BUG: scheduling while atomic: swapper/1/0/0x00000002
<6>[26578.636869] C0 [    kworker/u:1] CPU1 is up
<4>[26578.636900] C1 [      swapper/1] Modules linked in: bcm15500_i2c_ts
<4>[26578.636961] C1 [      swapper/1] [<c00146d0>] (unwind_backtrace+0x0/0x11c) from [<c0602684>] (__schedule+0x70/0x6e0)
<4>[26578.636991] C1 [      swapper/1] [<c0602684>] (__schedule+0x70/0x6e0) from [<c06030ec>] (schedule_preempt_disabled+0x14/0x20)
<4>[26578.637052] C1 [      swapper/1] [<c06030ec>] (schedule_preempt_disabled+0x14/0x20) from [<c000f05c>] (cpu_idle+0xf0/0x104)
<4>[26578.637083] C1 [      swapper/1] [<c000f05c>] (cpu_idle+0xf0/0x104) from [<c05e98e0>] (cpu_die+0x2c/0x5c)
<3>[26578.637510] C1 [      swapper/1] BUG: scheduling while atomic: swapper/1/0/0x00000002
<4>[26578.637510] C1 [      swapper/1] Modules linked in: bcm15500_i2c_ts
<4>[26578.637602] C1 [      swapper/1] [<c00146d0>] (unwind_backtrace+0x0/0x11c) from [<c0602684>] (__schedule+0x70/0x6e0)
<4>[26578.637663] C1 [      swapper/1] [<c0602684>] (__schedule+0x70/0x6e0) from [<c06030ec>] (schedule_preempt_disabled+0x14/0x20)
<4>[26578.637724] C1 [      swapper/1] [<c06030ec>] (schedule_preempt_disabled+0x14/0x20) from [<c000f05c>] (cpu_idle+0xf0/0x104)
<4>[26578.637754] C1 [      swapper/1] [<c000f05c>] (cpu_idle+0xf0/0x104) from [<c05e98e0>] (cpu_die+0x2c/0x5c)
<3>[26578.648069] C1 [      swapper/1] BUG: scheduling while atomic: swapper/1/0/0x00000002


The BUG line has the format comm/pid/preempt_count (here: comm "swapper/1", PID 0, preempt_count 0x00000002). Let's look at the scheduler source to see where it comes from.



/*
 * __schedule() is the main scheduler function.
 */
static void __sched __schedule(void)
{
    struct task_struct *prev, *next;
    unsigned long *switch_count;
    struct rq *rq;
    int cpu;

need_resched:
    preempt_disable();
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
    rcu_note_context_switch(cpu);
    prev = rq->curr;

    schedule_debug(prev);
    ...
}






/*
 * Print scheduling while atomic bug:
 */
static noinline void __schedule_bug(struct task_struct *prev)
{
    if (oops_in_progress)
        return;

    printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
        prev->comm, prev->pid, preempt_count());

    debug_show_held_locks(prev);
    print_modules();
    if (irqs_disabled())
        print_irqtrace_events(prev);

    dump_stack();
}

/*
 * Various schedule()-time debugging checks and statistics:
 */
static inline void schedule_debug(struct task_struct *prev)
{
    /*
     * Test if we are atomic. Since do_exit() needs to call into
     * schedule() atomically, we ignore that path for now.
     * Otherwise, whine if we are scheduling when we should not be.
     */
    if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
        __schedule_bug(prev);
    rcu_sleep_check();

    profile_hit(SCHED_PROFILING, __builtin_return_address(0));

    schedstat_inc(this_rq(), sched_count);
}


As we can see, the error message is printed whenever the following condition holds:



unlikely(in_atomic_preempt_off() && !prev->exit_state)

prev->exit_state being 0 means the current task has not started exiting; it is a normal, live task (e.g. in TASK_RUNNING). If such a task reaches the scheduler while in atomic context (in_atomic_preempt_off() is true), it must not be switched away to another process, so the kernel reports the BUG instead.
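
To tie this back to the log: the trailing 0x00000002 in the BUG line is the raw preempt_count() value printed by __schedule_bug(). Its low byte is the preemption-disable depth, and higher bits record softirq/hardirq nesting; the exact field widths vary by kernel version and architecture, so the masks below are only assumptions for illustration. A minimal userspace sketch of the decoding:

/* Illustrative only: decode a preempt_count value such as the 0x00000002
 * seen in the log above. The layout assumed here (PREEMPT in bits 0-7,
 * SOFTIRQ in bits 8-15, HARDIRQ above that) matches 3.x-era
 * include/linux/hardirq.h, but HARDIRQ's width is version-dependent. */
#include <stdio.h>

#define PREEMPT_MASK  0x000000ffu   /* preempt_disable() nesting depth */
#define SOFTIRQ_MASK  0x0000ff00u   /* softirq nesting / softirqs disabled */
#define HARDIRQ_MASK  0x03ff0000u   /* hardirq nesting (assumed width) */

int main(void)
{
    unsigned int pc = 0x00000002;   /* value printed by __schedule_bug() */

    printf("preempt depth: %u\n", pc & PREEMPT_MASK);            /* -> 2 */
    printf("in softirq:    %s\n", (pc & SOFTIRQ_MASK) ? "yes" : "no");
    printf("in hardirq:    %s\n", (pc & HARDIRQ_MASK) ? "yes" : "no");
    return 0;
}

Since __schedule() has already done its own preempt_disable() by the time schedule_debug() runs, a depth of 2 means one extra level of preemption-disable was still held when the scheduler was entered, which is exactly why in_atomic_preempt_off() fired here.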





Linux/include/linux/sched.h:

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define __TASK_STOPPED          4
#define __TASK_TRACED           8
/* in tsk->exit_state */
#define EXIT_ZOMBIE             16
#define EXIT_DEAD               32
/* in tsk->state again */
#define TASK_DEAD               64
#define TASK_WAKEKILL           128
#define TASK_WAKING             256
#define TASK_STATE_MAX          512






kernel/include/linux/hardirq.h:

#if defined(CONFIG_PREEMPT_COUNT)
# define PREEMPT_CHECK_OFFSET 1
#else
# define PREEMPT_CHECK_OFFSET 0
#endif

/*
 * Are we running in atomic context?  WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels.  Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.
 */
#define in_atomic()     ((preempt_count() & ~PREEMPT_ACTIVE) != 0)

/*
 * Check whether we were atomic before we did preempt_disable():
 * (used by the scheduler, *after* releasing the kernel lock)
 */
#define in_atomic_preempt_off() \
        ((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
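
To make the arithmetic concrete, here is a small userspace mock of these two macros. The PREEMPT_ACTIVE value is architecture-dependent and the one below is only an assumption; CONFIG_PREEMPT_COUNT=y is assumed so that PREEMPT_CHECK_OFFSET is 1.

/* Userspace mock, illustration only. */
#include <stdbool.h>
#include <stdio.h>

#define PREEMPT_ACTIVE        0x10000000u  /* arch-dependent; assumed value */
#define PREEMPT_CHECK_OFFSET  1            /* CONFIG_PREEMPT_COUNT=y        */

static bool mock_in_atomic(unsigned int preempt_count)
{
    return (preempt_count & ~PREEMPT_ACTIVE) != 0;
}

static bool mock_in_atomic_preempt_off(unsigned int preempt_count)
{
    /* __schedule() has already added 1 via its own preempt_disable(),
     * so a single level of disable is the expected, legal value here. */
    return (preempt_count & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET;
}

int main(void)
{
    printf("count=1: in_atomic_preempt_off=%d (normal schedule)\n",
           mock_in_atomic_preempt_off(1));
    printf("count=2: in_atomic_preempt_off=%d (the BUG case above)\n",
           mock_in_atomic_preempt_off(2));
    printf("count=2: in_atomic=%d\n",
           mock_in_atomic(2));
    return 0;
}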


Summary

When the Linux kernel prints "BUG: scheduling while atomic" or "bad: scheduling from the idle thread", the usual cause is that a function which may sleep was called from an interrupt handler: semaphore or mutex operations, sleep/msleep, and the like. During interrupt handling the kernel allows neither scheduling nor preemption; nothing else can run until the handler has finished. For that reason, keep interrupt handlers short and never let them block.
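
As an illustration of that rule, here is a hedged sketch (the demo_* names are invented): the first handler sleeps in hard-IRQ context and will trigger exactly this BUG; the second defers the sleeping work to a workqueue, which runs in process context.

#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(demo_lock);          /* illustrative */
static struct work_struct demo_work;     /* INIT_WORK(&demo_work, demo_work_fn) at probe/init time */

/* WRONG: an IRQ handler runs in atomic context; mutex_lock() may sleep
 * and will end up in __schedule(), producing "BUG: scheduling while atomic". */
static irqreturn_t demo_irq_bad(int irq, void *dev_id)
{
    mutex_lock(&demo_lock);              /* may sleep -> BUG */
    /* ... touch shared state ... */
    mutex_unlock(&demo_lock);
    return IRQ_HANDLED;
}

/* RIGHT: acknowledge the interrupt quickly and push the sleeping work
 * into process context via a workqueue. */
static void demo_work_fn(struct work_struct *work)
{
    mutex_lock(&demo_lock);              /* fine here: we are allowed to sleep */
    /* ... slow or sleeping work ... */
    mutex_unlock(&demo_lock);
}

static irqreturn_t demo_irq_good(int irq, void *dev_id)
{
    schedule_work(&demo_work);           /* safe to call from atomic context */
    return IRQ_HANDLED;
}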

Another way to produce this problem is doing something in the idle thread that does not belong there. Linux now runs on many handheld devices, and to save power it is common to lower the CPU or RAM frequency or power down peripherals from the idle path. Those actions run in atomic context, so they must not sleep or schedule; only then can you be sure not to hit a "bad: scheduling from the idle thread" error.


Note that disabling kernel preemption only means the kernel will not preempt your process on its own initiative; if your own code goes on to call schedule() (directly or via a sleeping function), the kernel cannot stop you from doing so.

"Scheduling while atomic" means that a thread has called schedule() during an operation which is supposed to be atomic (i.e. uninterrupted).
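
A minimal, purely illustrative way to reproduce the message from ordinary process context on a CONFIG_PREEMPT kernel: spin_lock() raises preempt_count to 1, msleep() then calls into __schedule(), which adds its own preempt_disable(), so schedule_debug() sees a count of 2 and prints the same kind of 0x00000002 value as in the log above.

#include <linux/spinlock.h>
#include <linux/delay.h>

static DEFINE_SPINLOCK(demo_slock);      /* illustrative only, do not ship this */

static void demo_trigger_bug(void)
{
    spin_lock(&demo_slock);              /* preempt_count: 0 -> 1 on CONFIG_PREEMPT */
    msleep(10);                          /* sleeps -> __schedule() -> BUG printed    */
    spin_unlock(&demo_slock);
}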





NOTE: ***** WARNING *****
NEVER SLEEP IN A COMPLETION HANDLER. These are normally called
during hardware interrupt processing. If you can, defer substantial
work to a tasklet (bottom half) to keep system latencies low. You'll
probably need to use spinlocks to protect data structures you manipulate
in completion handlers.
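
A hedged sketch of what that warning asks for (the demo_* names are invented and error handling is omitted): keep the URB completion handler tiny and hand the real work to a bottom half.

#include <linux/usb.h>
#include <linux/interrupt.h>

static void demo_tasklet_fn(unsigned long data);
static DECLARE_TASKLET(demo_tasklet, demo_tasklet_fn, 0);   /* illustrative */

/* Completion handlers run in atomic (interrupt) context: no sleeping,
 * no GFP_KERNEL allocations, keep them short. */
static void demo_urb_complete(struct urb *urb)
{
    if (urb->status == 0)
        tasklet_schedule(&demo_tasklet);     /* defer the real work */

    /* Resubmitting from here must use GFP_ATOMIC, never GFP_KERNEL. */
    usb_submit_urb(urb, GFP_ATOMIC);
}

static void demo_tasklet_fn(unsigned long data)
{
    /* Bottom half: still atomic context (tasklets cannot sleep either),
     * so protect shared data with spinlocks and hand anything that truly
     * must sleep further down to a workqueue. */
}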






GFP_ATOMIC is used when:
(a) you are inside a completion handler, an interrupt, bottom half, tasklet or timer, or
(b) you are holding a spinlock or rwlock (does not apply to semaphores), or
(c) current->state != TASK_RUNNING; this is the case only after you've changed it.
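
For rule (b), a short illustrative sketch (names invented): while the spinlock is held, allocate with GFP_ATOMIC; GFP_KERNEL may sleep waiting for memory and would trigger exactly this BUG.

#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_alloc_lock);     /* illustrative */

static void *demo_alloc_under_lock(size_t len)
{
    void *buf;

    spin_lock(&demo_alloc_lock);
    /* GFP_KERNEL could sleep here -> scheduling while atomic.
     * GFP_ATOMIC never sleeps but can fail, so the caller must check. */
    buf = kmalloc(len, GFP_ATOMIC);
    spin_unlock(&demo_alloc_lock);

    return buf;                              /* may be NULL */
}

GFP_ATOMIC fails more easily under memory pressure, so when the code structure allows it, doing a GFP_KERNEL allocation before taking the lock is usually the better design.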