Home Downloading data using curl
Post
Cancel

Downloading data using curl

  • What is curl?

    curl:

    • is short for Client for URLs
    • is a Unix command line tool
    • transfer data to and from a server
    • is used to download data from HTTP(S) sites and FTP servers
  • Checking curl installation

    Check curl installation:

    1
    
    man curl
    

    If curl has not been installed, you will see:

    curl command not found.

  • Learning curl Syntax

    Basic curl syntax:

    1
    
    curl [option flags] [URL]
    

    URL is required. curl also supports HTTP, HTTPS, FTP, and SFTP. For a full list of the options available:

    1
    
    curl --help
    
  • Downloading a Single File

    Example: A single file is stored at:

    https://websitename.com/datafilename.txt

    Use the optional flag -0 to save the file with its original name:

    curl -0 https://websitename.com/datafilename.txt

    To rename the file, use the lower case -o + new filename:

    curl -o renamedatafilename.txt https://websitename.com/datafilename.txt

  • Downloading Multiple Files using Wildcards

    Oftentimes, a server will host multiple data files, with similar filenames:

    1
    2
    3
    4
    5
    6
    
          https://websitename.com/datafilename001.txt
          https://websitename.com/datafilename002.txt
          .
          .
          .
          https://websitename.com/datafilename100.txt
    

    Using Wildcards(*)

    Download every file hosted on https://websitename.com/ that starts with datafilename and end in .txt:

    1
    
          curl -O https://websitename.com/datafilename*.txt
    
  • Downloading Multiple Files using Globbing Parset

    Continuing with the previous example:

    1
    2
    3
    4
    
          https://websitename.com/datafilename001.txt
          https://websitename.com/datafilename002.txt
          ...
          https://websitename.com/datafilename100.txt
    

    Using Globbing Parser

    The following will download every file sequentially starting with datafilename001.txt and ending with datafilename100.txt.

    1
    
          curl  -O https://websitename.com/datafilename[001-100].txt
    

    Increment through the files and download every Nth file (e.g.datafilename001.txt, datafilename020.txt,…, datafilename100.txt)

    1
    
          curl -0 https://websitename.com/datafilename[001-100:10].txt
    
  • Preemptive Troubleshooting

    curl has two particularly useful option flags in case of timeouts during download:

    • -L : Redirects the HTTP URL if a 300 error code occurs.
    • -C : Resumes a previous file transfer if it times out before completion.

    Putting everythin together:

    1
    
          curl -L -O -C https://websitename.com/datafilename[001-100].txt
    
    • All option flags come before the URL
    • Order of the flags does not matter. (e.g: -L -C -O is fine)

Downloading data using Wget

  • What is Wget?

    Wget:

    • Derives its name from World Wide Web and get
    • Native to Linux but compatible for all operating systems
    • used to download data from HTTP(s) and FTP
    • better than curl at downloading multiple files recursively
  • Learning Wget Syntax

    Basic Wget syntax:

    1
    
          wget [option flags][URL]
    

    URL is required Wget also supports HTTP, HTTPS, FTP, and SFTP. For a full list of the option flags available, see:

    1
    
          wget -- help
    
  • Downloading a Single File

    Option flags unique to Wget: -b: Go to background immediately after startup -q: Turn off the Wget output -c: Resume broken download (i.e continue getting a partially-downloaded file)

    1
    
          wget -bqc https://websitename.com/datafilename.txt
    

Advanced downloading using Wget

  • Multiple file downloading with Wget

    Save a list of file locations in a text file.

    1
    2
    3
    4
    5
    6
    
          cat url_list.txt
    
          Returns:
              https://websitename.com/datafilename001.txt
              https://websitename.com/datafilename002.txt
              ...
    

    Download from the URL locations stored within the file url_list.txt using -i.

    1
    
          wget -i url_list.txt
    
  • Setting download constraints for large files

    Set upper download bandwidth limit (by defaul in bytes per with second) with --limit-rate. Syntax:

    1
    
          wget --limit-rate={rate}k {file_location}
    

    Example:

    1
    
          wget --limit-rate=200k -i url_list.txt
    
  • Curl versus Wget

    curl advantages:

    • Can be used for downloading and uploading files from 20+ protocols.
    • Easier to install across all operating systems. Wget advantages:
    • Has many built-in functionalities for handling multiple file downloads.
    • Can handle various file formats for download (e.g. file directory, HTML page)
This post is licensed under CC BY 4.0 by the author.