Friday, May 23, 2014

ftrace - The kernel function tracer - 2


In this post, we discuss about function and function_graph scenario of ftrace usage.
To find out which tracers are available, simply cat the available_tracers file in the tracing directory:
# cat available_tracers 
blk function_graph mmiotrace wakeup_rt wakeup function nop
By default, no tracer will be set.
# cat current_tracer 
nop
Let set current tracer as function
# echo function > current_tracer 
# cat current_tracer 
function
Enable the tracing
# echo 1 > tracing_on
Viewing trace log
# cat trace | head -50
# tracer: function
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
        metacity-1920  [000]  5127.451220: __pollwait <-unix_poll
        metacity-1920  [000]  5127.451220: add_wait_queue <-__pollwait
        metacity-1920  [000]  5127.451220: _raw_spin_lock_irqsave <-add_wait_queue
        metacity-1920  [000]  5127.451221: _raw_spin_unlock_irqrestore <-add_wait_queue
        metacity-1920  [000]  5127.451221: fput <-do_poll.isra.5
        metacity-1920  [000]  5127.451221: poll_freewait <-do_sys_poll
        metacity-1920  [000]  5127.451221: remove_wait_queue <-poll_freewait
        metacity-1920  [000]  5127.451222: _raw_spin_lock_irqsave <-remove_wait_queue
        metacity-1920  [000]  5127.451222: _raw_spin_unlock_irqrestore <-remove_wait_queue
        metacity-1920  [000]  5127.451222: fput <-poll_freewait
        metacity-1920  [000]  5127.451223: sys_writev <-syscall_call
        metacity-1920  [000]  5127.451223: fget_light <-sys_writev
        metacity-1920  [000]  5127.451223: vfs_writev <-sys_writev
        metacity-1920  [000]  5127.451224: do_readv_writev <-vfs_writev
        metacity-1920  [000]  5127.451224: rw_copy_check_uvector <-do_readv_writev
In the above example, sample call sequence.
unix_poll --> __pollwait --> add_wait_queue --> _raw_spin_lock_irqsave 
                                                _raw_spin_unlock_irqrestore 
Do you have trouble in finding calling sequence??? set function_graph as current tracer.
# echo function_graph > current_tracer
Function graph with timing information.
# cat trace | head -50
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 2)   0.104 us    |            _raw_spin_lock_irqsave();
 2)   0.104 us    |            _raw_spin_unlock_irqrestore();
 2)               |            ep_poll_readyevents_proc() {
 2)               |              ep_scan_ready_list.isra.8() {
 2)               |                mutex_lock() {
 2)   0.065 us    |                  _cond_resched();
 2)   0.602 us    |                }
 2)   0.070 us    |                _raw_spin_lock_irqsave();
 2)   0.107 us    |                _raw_spin_unlock_irqrestore();
 2)   0.068 us    |                ep_read_events_proc();
 2)   0.070 us    |                _raw_spin_lock_irqsave();
 2)   0.106 us    |                _raw_spin_unlock_irqrestore();
 2)   0.068 us    |                mutex_unlock();
 2)   4.642 us    |              }
 2)   5.244 us    |            }
Now I'm very much interested in listing kernel functions used by ls command. There are four commands executed in single line.
# echo > trace && echo 1 > tracing_on && ls && echo 0 > tracing_on
  ------------    -------------------    --    -------------------
    |                   |                              | 
    |                   |                              |
    |                   V                              V
    |              Enable Tracing               Disable tracing after "ls"  
    V                 
 Clear trace Log 
Lets see the log.
# vim trace
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 1)   0.268 us    |  __fsnotify_parent();
 1)               |  fsnotify() {
 1)   0.088 us    |    __srcu_read_lock();
 1)   0.064 us    |    __srcu_read_unlock();
 1)   1.354 us    |  }
 .....
 .....
 1)               |            mem_cgroup_charge_common() {
 1)   0.066 us    |              lookup_page_cgroup();
 ------------------------------------------
 2)    <idle>-0    =>    ls-2556                          <-- Here starts ls functions 
 ------------------------------------------

 2)               |  schedule_tail() {
 1)   0.101 us    |              __mem_cgroup_try_charge();
 2)   0.252 us    |    finish_task_switch();
 1)               |              __mem_cgroup_commit_charge() {
 2)   0.081 us    |    _cond_resched();
 .....
 .....
 1)   0.065 us    |            native_pte_clear();
 2)   0.109 us    |                native_load_sp0();
 1)   0.447 us    |            native_flush_tlb_single();
 2)   0.089 us    |                native_load_tls();
 ------------------------------------------
 2)    ls-2556     =>    <idle>-0                 <-- Context switch to idle task
 ------------------------------------------    
 2)   0.187 us    |      finish_task_switch();
 1)   0.359 us    |    fget_light();
 ....
 .... 
 ------------------------------------------
 2)    <idle>-0    =>    ls-2556               <-- Context switch to "ls"
 ------------------------------------------

 2)   0.188 us    |                finish_task_switch();
 1)               |    security_file_ioctl() {
 1)   0.066 us    |      cap_file_ioctl();
 ....
 ....
 0)               |        __copy_from_user_ll() {
 0)   0.059 us    |          __copy_from_user_ll.part.1();
 0)   0.399 us    |        }
 0)   1.050 us    |      }
It is easier to understand the implementation using it's code flow. The timing information is very useful for optimizing our code.

Thursday, May 22, 2014

Platform Device Driver - a practical approach - 1


Platform devices are inherently not discover-able, i.e. the hardware cannot say "Hey! I'm present!" to the software. For example PCI and USB are self discover-able, but I2C is not.
In the embedded and system-on-chip world, non-discoverable devices are, if anything, increasing in number. So the kernel still needs to provide ways to be told about the hardware that is actually present. Platform devices have long been used in this role in the kernel.
There are two important components here.
  1. Platform Driver
  2. Platform Device
Platform Driver - Set of operation done on device
Linux kernel defines set of standard operations which will be performed on a platform device.
Refer http://lxr.free-electrons.com/source/include/linux/platform_device.h#L173
At a minimum, the probe() and remove() callbacks must be supplied; the other callbacks have to do with power management and should be provided if they are relevant.
static int sample_drv_probe(struct platform_device *pdev)
{
   //Empty Probe function.
}
static int sample_drv_remove(struct platform_device *pdev)
{
  //Empty remove function.
}

static struct platform_driver sample_pldriver = {
    .probe          = sample_drv_probe,
    .remove         = sample_drv_remove,
    .driver = {
            .name  = DRIVER_NAME,
    },
};
In the above code just make a note of DRIVER_NAME. We discuss bit later.
So now a platform driver with two operations(probe and remove) is ready. This driver should be register with kernel.
NOTE:: I guess, you know that every loadable device driver is basically kernel module. For making our code complete, moving platform driver in to helloworld kernel module.
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

int ourinitmodule(void)
{
    printk(KERN_ALERT "\n Welcome to sample Platform driver.... \n");
    return 0;
}

void ourcleanupmodule(void)
{
    printk(KERN_ALERT "\n Thanks....Exiting sample Platform driver... \n");
    return;
}

module_init(ourinitmodule);
module_exit(ourcleanupmodule);
Now platform driver related stuff in kernel module.
#include <linux/module.h>
#include <linux/kernel.h>

//for platform drivers....
#include <linux/platform_device.h>

#define DRIVER_NAME "Sample_Pldrv"

MODULE_LICENSE("GPL");

/**************/ 
static int sample_drv_probe(struct platform_device *pdev){
}
static int sample_drv_remove(struct platform_device *pdev){
}

static struct platform_driver sample_pldriver = {
    .probe          = sample_drv_probe,
    .remove         = sample_drv_remove,
    .driver = {
            .name  = DRIVER_NAME,
    },
};
/**************/  

int ourinitmodule(void)
{
    printk(KERN_ALERT "\n Welcome to sample Platform driver.... \n");

    /* Registering with Kernel */
    platform_driver_register(&sample_pldriver);

    return 0;
}

void ourcleanupmodule(void)
{
    printk(KERN_ALERT "\n Thanks....Exiting sample Platform driver... \n");

    /* Unregistering from Kernel */
    platform_driver_unregister(&sample_pldriver);

    return;
}

module_init(ourinitmodule);
module_exit(ourcleanupmodule);
Now driver named "Sample_Pldrv" is ready with two empty functions.
Platform Device - Information about device
Kernel knows about device's information like IRQ number, memory locations, etc by registering platform device. To operate on this device, we early wrote platform driver right ?? If you want to bind the platform device to a driver, then device must be registered with same name which driver is registered. In our case "Sample_Pldrv".
/* Specifying my resources information */
static struct resource sample_resources[] = {
        {
                .start          = RESOURCE1_START_ADDRESS,
                .end            = RESOURCE1_END_ADDRESS,
                .flags          = IORESOURCE_MEM,
        },
        {
                .start          = RESOURCE2_START_ADDRESS,
                .end            = RESOURCE2_END_ADDRESS,
                .flags          = IORESOURCE_MEM,
        },
    {
                .start          = SAMPLE_DEV_IRQNUM,
                .end            = SAMPLE_DEV_IRQNUM,
                .flags          = IORESOURCE_IRQ,
        }

    };    

static struct platform_device sample_device = {
        .name           = DRIVER_NAME,
        .id             = -1,
        .num_resources  = ARRAY_SIZE(sample_resources),
        .resource       = sample_resources,
};
Here, there are two memory related information and one IRQ information is mentioned. Now you got the role of DRIVER_NAME ??
Either device and driver can be in single module or as separate module. It's your choice. But probe() will be called when both device and driver is available.
Say you loaded only driver. when the device with same name is loaded, the probe() will be called.
Refer https://github.com/jeyaramvrp/kernel-module-programming/tree/master/sample-platform-driver for simple platform driver and device code written as separate kernel module.

Tuesday, May 20, 2014

ftrace - The kernel function tracer - 1



1. Introduction
  • Ftrace is a tracing utility built directly into the Linux kernel.
  • Designed to help out developers and designers of systems to find what is going on inside the kernel.
  • Below mentioned output is executed in ubuntu 12.04 virtual machine.
2. Build kernel with the following configuration(Ubuntu already Configured)
CONFIG_FUNCTION_GRAPH_TRACER
CONFIG_STACK_TRACER
CONFIG_DYNAMIC_FTRACE
CONFIG_FUNCTION_TRACER
3. Mount the debugfs
#mount -t debugfs nodev /sys/kernel/debug
#mount |grep debugfs
5:none on /sys/kernel/debug type debugfs (rw)
4. Enter in to root shell
$ sudo  -i 
You will be asked for your password, then be given a root shell. In that shell, you can cd to /sys/kernel/debug.
5. Sample Execution

# pwd
/sys/kernel/debug/tracing

# cat available_tracers 
blk function_graph mmiotrace wakeup_rt wakeup function nop

# echo function_graph > current_tracer

# cat current_tracer 
function_graph

# cat trace | head -20
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0) ! 164785.8 us |      } /* native_safe_halt */
 0) ! 164786.9 us |    } /* default_idle.part.4 */
 0) ! 164787.6 us |  } /* default_idle */
 0)   0.062 us    |  local_touch_nmi();
 0)               |  cpuidle_idle_call() {
 0)   0.094 us    |    cpuidle_get_driver();
 0)   0.690 us    |  }
 0)               |  default_idle() {
 0)               |    default_idle.part.4() {
 0)               |      native_safe_halt() {
 0)   ==========> |
 0)               |        do_IRQ() {
 0)               |          irq_enter() {
 0)               |            rcu_irq_enter() {
 0)   0.231 us    |              rcu_exit_nohz();
 0)   0.995 us    |            }