extracting fields in shell


Thanks for visiting! If you're new here, you may want to subscribe to my RSS feed. This blog posts regular information about web development, unix/linux, How-tos and patches. Go ahead, subscribe to my feed! You can also receive updates via email, instant messenger, skype or tweeter.

A lot of shell scripts require processing some kind of data structured in fields or columns separated by special characters ( space, coma, semi colon, etc... )

This is a short tutorial that shows you how you can extract the fields in a stream of data. There are several ways of doing this and each has it's advantages of disadvantages.

Here is what I use:

  1. Using cut

    The 'cut' program will allow you to extract the fields separated by one character. you can specify which field to extract, and what is the field separator.
    Example: echo "a:b:c" | cut -f2 -d':' will output b
    The cut program has the advantage that it is simple to use, almost ( all ) Unix flavors have it included in the base distribution and is relatively lightweight ( ~33Kb with no library dependency other then libc on my gentoo Linux )
    The problem with cut is that the field separator can only be a single character.

  2. Using awk

    awk is a pattern scanning and processing language somehow similar perl. Actually it is believed that perl was inspired by languages like awk, perl, C, and some others. Awk is a lot more flexible then cur and can do a lot more. You can actually specify a regular expression for the field separator.
    Here is an example for extracting the fields separated by one or more spaces:
    echo "a b c"|awk '{print $2}' - this will print the second field. As you can see I have not specified any separator because awk uses <space> as the default separator. <space> means any number of spaces here.
    You can specify a different field separator by using the -F parameter.

  3. Using a shell function

    this may be the simplest and fastest solution but will only work if the field separator is composed of spaces or tabs only. As you may know the parameters are passed to a shell function separated by spaces. so you can just make a function that has the sole purpose of returning the field ( parameter ) you want.
    If I want to get the third field from a line I would do a function like this

     
    function getfield()
    {
    echo $3
    }

    getfield a b ccc ddd would display 'ccc' . This is more useful in a script where you need to get a field value from a variable containing some text but not so mush with whole files.

Do you know any other/better method ? Feel free to share them in the comments

  • Digg
  • Reddit
  • del.icio.us
  • Slashdot
  • Spurl
  • StumbleUpon
  • Furl
  • description
  • Netscape
  • NewsVine
  • Technorati
  • YahooMyWeb
  • Simpy
If you enjoyed this post, you should subscribe to my full RSS Feeds

close Reblog this comment
blog comments powered by Disqus

Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License

Technology Blogs - Blog Top Sites Search For Blogs, Submit Blogs, The Ultimate Blog Directory Blogarama - The Blog Directory 5starsblog Computers Blogs - Blog Flare blog search directory gob BlogHop