For Loop for Reading a File in Shell Script

For and Read-While Loops in Bash

The loop is one of the most fundamental and powerful constructs in computing, because it allows us to repeat a set of commands, as many times as we want, upon a list of items of our choosing. Much of computational thinking involves taking one task and solving it in a way that can be applied repeatedly to all other similar tasks, and the for loop is how we make the computer do that repetitive work:

    for item in $items
    do
      task $item
    done

Unlike most of the code we've written so far at the interactive prompt, a for-loop doesn't execute as soon as we hit Enter:

    user@host:~$ for item in $items

We can write out as many commands as we want in the block between the do and done keywords:

    do
      command_1
      command_2
      # another for loop just for fun
      for a in $things; do command_3 $a; done
      command_4
    done

Only when we reach done, and hit Enter, does the for-loop do its work.

This is fundamentally different from the line-by-line command-and-response we've experienced so far at the prompt. And it presages how we will be programming further on: less emphasis on executing commands with each line, and more emphasis on planning the functionality of a program, and then executing it later.

Basic syntax

The syntax for for loops can be confusing, so here are some basic examples to prep/refresh your comprehension of them:

    for animal in dog cat 'fruit bat' elephant ostrich
    do
      echo "I want a $animal for a pet"
    done

Here's a more elaborate version using variables:

    for thing in $collection_of_things
    do
      some_program $thing
      another_program $thing >> data.txt
      # as many commands as we want
    done

A command substitution can be used to generate the items that the for loop iterates across:

    for var_name in $(seq 1 100); do
      echo "Counting $var_name..."
    done

If you need to read a list of lines from a file, and are absolutely sure that none of the lines contain a space within them:

    for url in $(cat list_of_urls.txt); do
      curl "$url" >> everywebpage_combined.html
    done

A read-while loop is a variation of the above, but is safer for reading lines from a file:

    while read url
    do
      curl "$url" >> everywebpage_combined.html
    done < list_of_urls.txt
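The read-while form above can be tightened a little further. As a minimal sketch, assuming a throwaway sample file (its contents and the print_lines name are made up for illustration), setting IFS= and passing -r to read keeps leading whitespace and backslashes intact:

```shell
# Create a throwaway file with awkward lines (hypothetical sample data).
sample_file=$(mktemp)
printf '%s\n' '  leading spaces' 'back\slash' 'two words' > "$sample_file"

# print_lines FILE: print each line of FILE wrapped in brackets,
# so that any whitespace that survives the read is visible.
print_lines() {
  while IFS= read -r line; do
    printf '[%s]\n' "$line"
  done < "$1"
}

print_lines "$sample_file"
rm -f "$sample_file"
```

With a plain `read line`, the leading spaces would be trimmed and the backslash would be treated as an escape character, so this variant is a reasonable default when lines must be kept exactly as written.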

Constructing a basic for loop

Let's start from the beginning, with a very minimal for loop, and then build it into something more elaborate, to help us get an understanding of their purpose.

The simplest loop

This is about as simple as you can make a for loop:

    user@host:~$ for x in 1
    > do
    >   echo Hi
    > done
    Hi

Did that seem pretty worthless? Yes, it should have. I wrote four lines of code to do what it takes a single line to do, echo 'Hi'.

More elements in the collection

It's hard to tell, but a "loop" did execute. It just executed once. OK, so how do we make it execute more than one time? Add more (space-separated) elements to the right of the in keyword. Let's add three more 1's:

    user@host:~$ for x in 1 1 1 1
    > do
    >   echo Hi
    > done
    Hi
    Hi
    Hi
    Hi

OK, not very exciting, but the program definitely seemed to at least loop: four 1's resulted in four echo commands being executed.

What happens when we replace those four 1's with different numbers? And maybe a couple of words?

    user@host:~$ for x in Q Zebra 999 Smithsonian
    > do
    >   echo Hi
    > done
    Hi
    Hi
    Hi
    Hi

And…nothing. So the loop doesn't automatically do anything specific to the collection of values we gave it. Not yet anyway.

Refer to the loop variable

Let's look to the left of the in keyword, and at that x. What's the point of that x? A lowercase x isn't the name of a keyword or command that we've encountered so far (and executing it alone at the prompt will throw an error). So maybe it's a variable? Let's try referencing it in the echo statement:

    user@host:~$ for x in Q Zebra 999 Smithsonian
    > do
    >   echo Hi $x
    > done
    Hi Q
    Hi Zebra
    Hi 999
    Hi Smithsonian

Bingo. This is pretty much the fundamental workings of a for loop:

- Get a collection of items/values (Q Zebra 999 Smithsonian)
- Pass them into a for loop construct
- Using the loop variable (x) as a placeholder, write commands between the do/done block.
- When the loop executes, the loop variable, x, takes the value of each of the items in the list (Q, Zebra, 999, Smithsonian) and the block of commands between do and done is then executed. This sequence repeats once for every item in the list.
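To make those steps concrete, here is a small sketch (count_items is a made-up helper, not something from the original lesson) that numbers each pass through the do/done block. It also shows that a quoted item such as 'fruit bat' counts as a single element:

```shell
# count_items ITEM...: loop over the arguments and number each pass
# through the do/done block.
count_items() {
  count=0
  for item in "$@"; do
    count=$((count + 1))
    echo "$count: $item"
  done
}

count_items Q Zebra 999 Smithsonian
count_items dog 'fruit bat'
```

The first call prints four numbered lines, one per item. The second prints only two, because the quotes keep fruit bat together as one value for the loop variable.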

The do/done block can contain any sequence of commands, even another for-loop:

    user@host:~$ for x in Q Zebra 999 Smithsonian
    > do
    >   echo Hi $x
    > done
    Hi Q
    Hi Zebra
    Hi 999
    Hi Smithsonian

    user@host:~$ for x in $(seq 1 3); do
    >   for y in A B C; do
    >     echo "$x:$y"
    >   done
    > done
    1:A
    1:B
    1:C
    2:A
    2:B
    2:C
    3:A
    3:B
    3:C

Loops-within-loops is a common construct in programming. For the most part, I'm going to try to avoid assigning problems that would involve this kind of logic, as it can be tricky to untangle during debugging.

Read a file, line-by-line, reliably with read-while

Because cat prints a file line-by-line, the following for loop seems sensible:

    user@host:~$ for line in $(cat list-of-dirs.txt)
    > do
    >   echo "$line"
    > done

However, the shell splits the output of the command substitution on whitespace, so every word becomes its own item. If list-of-dirs.txt contains the following:

    Apples
    Oranges
    Documents and Settings

The output of the for loop will be this:

    Apples
    Oranges
    Documents
    and
    Settings

A read-while loop will preserve the words within a line:

    user@host:~$ while read line
    > do
    >   echo "$line"
    > done < list-of-dirs.txt
    Apples
    Oranges
    Documents and Settings
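One way to see the difference is to count how many times each kind of loop runs over the same file. This is a sketch only: the temporary file stands in for list-of-dirs.txt, and the two function names are made up for illustration.

```shell
# A temporary stand-in for list-of-dirs.txt.
list_file=$(mktemp)
printf '%s\n' 'Apples' 'Oranges' 'Documents and Settings' > "$list_file"

# count_with_for FILE: iterations when the shell word-splits the output of cat.
count_with_for() {
  n=0
  for item in $(cat "$1"); do
    n=$((n + 1))
  done
  echo "$n"
}

# count_with_read FILE: iterations when read consumes one whole line at a time.
count_with_read() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
  done < "$1"
  echo "$n"
}

echo "for loop iterations:   $(count_with_for "$list_file")"
echo "read-while iterations: $(count_with_read "$list_file")"
rm -f "$list_file"
```

The for loop reports 5 iterations (one per word) and the read-while loop reports 3 (one per line), which matches the two outputs shown above.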

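We can also feed the loop from the output of a command by enclosing the command in <( and ), which Bash calls process substitution:

    user@host:~$ while read line
    > do
    >   echo "Word count per line: $(echo "$line" | wc -w)"
    > done < <(cat list-of-dirs.txt)
    Word count per line: 1
    Word count per line: 1
    Word count per line: 3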
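One practical reason to prefer the < <( ) form over a plain pipe is sketched here, assuming Bash with its default settings (the variable names are illustrative). When a command is piped into while, Bash runs the loop in a subshell, so variables changed inside it are lost once the loop ends. With process substitution the loop runs in the current shell:

```shell
# Requires Bash: process substitution is not available in plain POSIX sh.

# Pipe version: the while loop runs in a subshell, so its counter is lost.
piped_count=0
printf 'a\nb\nc\n' | while read -r line; do
  piped_count=$((piped_count + 1))
done

# Process substitution version: the loop runs in the current shell.
subst_count=0
while read -r line; do
  subst_count=$((subst_count + 1))
done < <(printf 'a\nb\nc\n')

echo "after pipe: $piped_count"
echo "after process substitution: $subst_count"
```

Under those assumptions the first counter is still 0 after its loop, while the second is 3. Other shells, or Bash with the lastpipe option enabled, may behave differently for the piped version.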

Pipes and loops

If you're coming from other languages, data streams may be unfamiliar to you. At least they are to me, as the syntax for working with them is far more direct and straightforward in Bash than in Ruby or Python.

However, if you're new to programming in any language, what might also be unclear is how working with data streams is different than working with loops.

For example, the following snippet:

    user@host:~$ echo "hello world i am here" | \
    > tr '[:lower:]' '[:upper:]' | tr ' ' '\n'
    HELLO
    WORLD
    I
    AM
    HERE

– produces the same output as this loop:

    for word in hello world i am here; do
      echo $word | tr '[:lower:]' '[:upper:]'
    done

And depending on your mental model of things, it does seem that in both examples, each word, e.g. hello, world, is passed through a process of translation (via tr) and then echoed.
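As a quick check of that claim, this sketch captures both outputs and compares them (the variable names are made up for illustration):

```shell
# Capture the output of the pipeline version.
pipeline_output=$(echo "hello world i am here" | tr '[:lower:]' '[:upper:]' | tr ' ' '\n')

# Capture the output of the loop version.
loop_output=$(
  for word in hello world i am here; do
    echo "$word" | tr '[:lower:]' '[:upper:]'
  done
)

if [ "$pipeline_output" = "$loop_output" ]; then
  echo "same output"
else
  echo "different output"
fi
```

The text is identical, but the work is organized differently: the pipeline starts tr twice and streams all the words through, while the loop starts a new tr for each of the five words.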

Pipes and filters

Without getting into the fundamentals of the Unix system, in which a pipe operates fundamentally differently than a loop here, let me suggest a mental workaround:

Programs that read from stdin and write to stdout can usually be arranged as filters, in which a stream of data goes into a program, and comes out in a different format:

    # send the stream through a reverse filter
    user@host:~$ echo "hello world i am here" | rev
    ereh ma i dlrow olleh
    # filter out the first 2 characters
    user@host:~$ echo "hello world i am here" | cut -c 3-
    llo world i am here
    # filter out the spaces
    user@host:~$ echo "hello world i am here" | tr -d ' '
    helloworldiamhere
    # filter out words with less than 4 characters
    user@host:~$ echo "hello world i am here" | grep -oE '[a-z]{4,}'
    hello
    world
    here

For tasks that are more than just transforming data, from filter to filter, think about using a loop. What might such a task be? Given a list of URLs, download each, and email the downloaded data, with a customized body and subject:

    user@host:~$ while read url; do
      # download the page
      content=$(curl -Ls "$url")
      # count the words
      num_of_words=$(echo $content | wc -w)
      # extract the title
      title=$(echo $content | grep -oP '(?<=<title>)[^<]+')
      # send an email with the page's title and word count
      echo "$content" | mail whoever@stanford.edu -s "$title: $num_of_words words"
      echo "...Sending: $title: $num_of_words words"
    done < urls.txt

The data input source, each URL in urls.txt, isn't really being filtered here. Instead, a multi-step task is being done for each URL.

Piping into read-while

That said, a loop itself can be implemented as just one more filter among filters. Take this variation of the read-while loop, in which the result of echo | grep is piped, line by line, into the while loop, which prints to stdout using echo and wc, which is redirected to the file named some.txt:

    echo 'hey you' | grep -oE '[a-z]+' | while read line; do
      echo "$line" | wc -c
    done >> some.txt

This is not a construct that you may need to use often, if at all, but hopefully it reinforces pipe usage in Unix.
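If such a loop is needed more than once, one option is to wrap it in a function so it sits in a pipeline like any other filter. This is a sketch under stated assumptions: word_lengths is a made-up name, and it uses the shell's ${#word} length expansion rather than wc -c, so the trailing newline is not counted.

```shell
# word_lengths: read words from stdin, one per line, and write
# "word length" pairs to stdout, so the loop behaves like a filter.
word_lengths() {
  while IFS= read -r word; do
    echo "$word ${#word}"
  done
}

echo 'hey you there' | grep -oE '[a-z]+' | word_lengths
```

Because the function only reads stdin and writes stdout, its output can be piped onward to sort, redirected to a file, or fed into another loop.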

Less interactive programming

The frequent use of for loops, and similar constructs, means that we're moving past the good ol' days of typing in one line of commands and having it execute right after we hit Enter. No matter how many commands we pack inside a for loop, nothing happens until we hit the done keyword.

Write once. Then loop it

With that loss of line-by-line interaction with the shell, we lose the main advantage of the interactive prompt: immediate feedback. And we still have all the disadvantages: if we make a typo earlier in the block of commands between do and done, we have to start all over.

So here's how we mitigate that:

Test your code, one case at a time

One of the biggest mistakes novices make with for loops is they think a for loop immediately solves their problem. So, if what they have to do is download 10,000 URLs, but they can't properly download just one URL, they think putting their flawed commands into a for loop is a step in the right direction.

Besides this being a fundamental misunderstanding of a for loop, the practical problem is that you are now running your broken code 10,000 times, which means you have to wait 10,000 times as long to find out that your code is, alas, still broken.

So pretend you've never heard of for loops. Pretend you have to download all 10,000 URLs, one command at a time. Can you write the command to do it for the first URL? How about the second? Once you're reasonably confident that no minor syntax errors are tripping you up, then it's time to think about how to find a general pattern for the 9,998 other URLs.
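One way to practice this habit is to wrap the single-case command in a function and give it a dry-run switch. The following is only a sketch: fetch_one, the DRY_RUN variable, the page- filename prefix, and the example.com URLs are all hypothetical.

```shell
# DRY_RUN=1 (a hypothetical switch) prints each command instead of running it.
DRY_RUN=1

# fetch_one URL: the single-case command, wrapped so it can be tested alone.
fetch_one() {
  url=$1
  out="page-$(basename "$url").html"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "curl -sL $url > $out"
  else
    curl -sL "$url" > "$out"
  fi
}

# Step 1: check one case by eye.
fetch_one http://example.com/first

# Step 2: only once that looks right, repeat it for the rest of the list.
for url in http://example.com/second http://example.com/third; do
  fetch_one "$url"
done
```

Once the printed commands look right, setting DRY_RUN=0 runs the same loop for real, so the loop is built on a command that has already been checked.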

Write scripts

The interactive command-line is great. It was fun to start out with, and it'll be fun throughout your computing career. But when you have a big task in front of you, involving more than ten lines of code, then it's time to put that code into a shell script. Don't trust your fallible human fingers to flawlessly retype code.


Use nano to work on loops and save them as shell scripts. For longer files, I'll work on my computer's text editor (Sublime Text) and then upload to the server.
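As a minimal sketch of what such a saved script might look like (the file name greet-loop.sh is made up, and the here-document simply stands in for typing the file in an editor), a loop stored in a file can be rerun with different arguments without retyping it:

```shell
# Write a small script to disk; in practice you would create it in an editor.
cat > greet-loop.sh <<'EOF'
#!/usr/bin/env bash
# Usage: bash greet-loop.sh NAME...
for name in "$@"; do
  echo "Hi $name"
done
EOF

# Run the saved loop as often as needed, with whatever items you like.
bash greet-loop.sh Q Zebra
```

If a typo turns up, the fix is a one-line edit to the file instead of retyping the whole loop at the prompt.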

Practice with web scraping

Just to ground the syntax and workings of the for-loop, here's the thought process for turning a routine task into a loop:

For the numbers 1 through 10, use curl to download the Wikipedia entry for each number, and save it to a file named "wiki-number-(whatever the number is).html"

The old fashioned way

With just 10 URLs, we could copy-and-paste a curl command 10 times, making changes to each line:

    user@host:~$ curl http://en.wikipedia.org/wiki/1 > 'wiki-number-1.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/2 > 'wiki-number-2.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/3 > 'wiki-number-3.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/4 > 'wiki-number-4.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/5 > 'wiki-number-5.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/6 > 'wiki-number-6.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/7 > 'wiki-number-7.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/8 > 'wiki-number-8.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/9 > 'wiki-number-9.html'
    user@host:~$ curl http://en.wikipedia.org/wiki/10 > 'wiki-number-10.html'

And guess what? It works. For 10 URLs, it's not a bad solution, and it's significantly faster than doing it the old old-fashioned way (doing it from your web browser).

Reducing repetition

Even without thinking about a loop, we can still reduce repetition using variables: the base URL, http://en.wikipedia.org/wiki/, and the base-filename never change, so let's assign those values to variables that can be reused:

    user@host:~$ base_url=http://en.wikipedia.org/wiki
    user@host:~$ fname='wiki-number'
    user@host:~$ curl "$base_url/1" > "$fname-1"
    user@host:~$ curl "$base_url/2" > "$fname-2"
    user@host:~$ curl "$base_url/3" > "$fname-3"
    user@host:~$ curl "$base_url/4" > "$fname-4"
    user@host:~$ curl "$base_url/5" > "$fname-5"
    user@host:~$ curl "$base_url/6" > "$fname-6"
    user@host:~$ curl "$base_url/7" > "$fname-7"
    user@host:~$ curl "$base_url/8" > "$fname-8"
    user@host:~$ curl "$base_url/9" > "$fname-9"
    user@host:~$ curl "$base_url/10" > "$fname-10"

Applying the for-loop

At this point, we've simplified the pattern so far that we can see how little changes with each separate task. After learning about the for-loop, we can apply it without much thinking (we also add a sleep command so that we pause between web requests):

    user@host:~$ base_url=http://en.wikipedia.org/wiki
    user@host:~$ fname='wiki-number'
    user@host:~$ for x in 1 2 3 4 5 6 7 8 9 10
    > do
    >   curl "$base_url/$x" > "$fname-$x"
    >   sleep 2
    > done

Generating a list

In most situations, creating a for-loop is easy; it's the creation of the list that can be the hard work. What if we wanted to collect the pages for numbers 1 through 100? That's a lot of typing.

But if we let our laziness dictate our thinking, we can imagine that counting from x to y seems like an inherently computational task. And it is, and Unix has the seq utility for this:

    user@host:~$ base_url=http://en.wikipedia.org/wiki
    user@host:~$ fname='wiki-number'
    user@host:~$ for x in $(seq 1 100)
    > do
    >   curl "$base_url/$x" > "wiki-number-$x"
    >   sleep 2
    > done
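seq is not the only way to produce a run of numbers. As a sketch, assuming Bash (neither alternative is part of plain POSIX sh), brace expansion and the arithmetic form of for generate the same list without calling an external program:

```shell
# Requires Bash: brace expansion and arithmetic for-loops are Bash features.
with_seq=$(for x in $(seq 1 5); do printf '%s ' "$x"; done)
with_braces=$(for x in {1..5}; do printf '%s ' "$x"; done)
with_arith=$(for ((x = 1; x <= 5; x++)); do printf '%s ' "$x"; done)

echo "seq:        $with_seq"
echo "braces:     $with_braces"
echo "arithmetic: $with_arith"
```

All three lines list 1 through 5. One caveat: brace expansion happens before variables are expanded, so {1..$n} does not count up to the value of n. When the upper bound lives in a variable, seq or the arithmetic form is the safer choice.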

Generating a list of non-numbers for iteration

Many repetitive tasks aren't as simple as counting from x to y, and so the problem becomes how to generate a non-linear list of items. This is basically the art of data collection and management. But let's make a simple scenario for ourselves:

For ten of the 10-letter (or more) words that appear at least once in a headline on the current NYTimes.com front page, fetch the Wiktionary page for that word

We break this task into two parts:

  1. Fetch a listing of 10 10+-letter words from nytimes.com headlines
  2. Pass those words to our for-loop

Step 1: Using the pup utility (or command-line HTML parser of your choice):

    user@host:~$ words=$(curl -s http://www.nytimes.com | \
    > pup 'h2.story-heading text{}' | \
    > grep -oE '[[:alpha:]]{10,}' | sort | \
    > uniq | head -n 10)

Step 2 (assuming the words variable is being passed along):

    user@host:~$ base_url='https://en.wiktionary.org/wiki/'
    user@host:~$ fname='wiktionary-'
    user@host:~$ for word in $words
    > do
    >   echo $word
    >   curl -sL "$base_url$word" > "$fname$word.html"
    >   sleep 2
    > done
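The two steps can be rehearsed offline before any request goes out. In this sketch the headline text is made-up sample data standing in for the live front page, and plan_downloads is a hypothetical helper that prints the URL and filename each pass would use instead of calling curl:

```shell
# plan_downloads: read words from stdin, one per line, and print the URL and
# the filename the loop would use. Nothing is downloaded in this sketch.
plan_downloads() {
  base_url='https://en.wiktionary.org/wiki/'
  fname='wiktionary-'
  while IFS= read -r word; do
    echo "$base_url$word > $fname$word.html"
  done
}

# Made-up headline text standing in for the parsed front page.
headlines='Administration Considers Restrictions
Restrictions Prompt Investigation'

echo "$headlines" | grep -oE '[[:alpha:]]{10,}' | sort | uniq | head -n 10 | plan_downloads
```

With this sample text, three words survive the filters (Administration, Investigation, Restrictions), and the repeated word appears only once thanks to sort and uniq. Swapping the echo inside the loop for the real curl and sleep commands turns the rehearsal into the actual download.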

Check out Software Carpentry's excellent guide to for-loops in Bash.


Source: http://www.compciv.org/topics/bash/loops
