Python Concurrent Image Downloader

Downloading many images or files at once is one of the best use cases for multithreading. Network I/O is blocking: a thread spends most of its time waiting for the server to respond, and other threads can make progress in the meantime.

We are going to retrieve 10 different images from https://picsum.photos/200/300, which is a free API that delivers a different image every time you hit that link. We’ll then store these 10 images in a temp folder. The temp folder must exist before you run the scripts, because urlretrieve does not create missing directories.

Sequential Download

First, we should have some form of a baseline against which we can measure the performance gains. To do this, we’ll write a quick program that will download these 10 images sequentially, as follows:

import urllib.request

def downloadImage(imgPath, fileName):
    print("Downloading Image from ", imgPath)
    urllib.request.urlretrieve(imgPath, fileName)
  
def main():
    url = "https://picsum.photos/200/300"
    for i in range(10):
        imgName = "temp/image-" + str(i) + ".jpg"
        downloadImage(url, imgName)
    print("Downloading finished")

if __name__ == '__main__':
    main()

In the preceding code, we begin by importing urllib.request. This will act as our medium for performing HTTP requests for the images that we want. We then define a new function called downloadImage, which takes in two parameters, imgPath and fileName. imgPath represents the URL image path that we wish to download. fileName represents the name of the file that we wish to use to save this image locally.
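
One limitation of urlretrieve is that it takes no timeout argument, so a stalled connection can block a download indefinitely. A minimal sketch of an alternative, assuming you are happy to use urllib.request.urlopen instead: the function name download_with_timeout and the 10-second default are hypothetical, and a local file:// URL stands in for the image URL so that the example runs offline.

```python
import pathlib
import shutil
import tempfile
import urllib.request

def download_with_timeout(url, file_name, timeout=10):
    # Hypothetical helper: urlopen accepts a timeout in seconds,
    # whereas urlretrieve has no such parameter.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(file_name, "wb") as out_file:
            # Stream the response body to disk in chunks.
            shutil.copyfileobj(response, out_file)

# Offline demonstration: a file:// URL stands in for the image URL.
work_dir = pathlib.Path(tempfile.mkdtemp())
source = work_dir / "source.jpg"
source.write_bytes(b"not really a jpeg")
target = work_dir / "copy.jpg"

download_with_timeout(source.as_uri(), str(target))
print(target.read_bytes())
```

With a real image URL, a request that exceeds the timeout raises an exception instead of hanging, which matters later when a main thread is waiting on worker threads.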

In the main function, we start a for loop. On each iteration we build imgName from the temp/ directory, a string representation of the current iteration number, str(i), and the file extension .jpg. We then call the downloadImage function, passing the url variable, which returns a random image on each request, along with our newly generated imgName.
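
The script writes into temp/ and fails with a FileNotFoundError if that directory is missing. The following sketch shows one way to build the file name and create the directory on demand. The helper name build_image_path is an assumption for illustration, and the demonstration writes into a throwaway directory from tempfile rather than your working directory.

```python
import os
import tempfile

def build_image_path(directory, index):
    # Hypothetical helper: create the target directory if needed,
    # then return a path such as <directory>/image-3.jpg.
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, "image-{}.jpg".format(index))

# Demonstration inside a throwaway base directory.
base = tempfile.mkdtemp()
target_dir = os.path.join(base, "temp")

path = build_image_path(target_dir, 3)
# Calling it again is safe because exist_ok=True ignores an existing directory.
same_path = build_image_path(target_dir, 3)
print(path)
```

In the downloader, the loop body would call this helper in place of the string concatenation.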

Upon running this script, you should see your temp directory sequentially fill up with 10 distinct images.

$ python sequentialImageDownloader.py
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading finished
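
A baseline is only useful for comparison once it has been timed. Here is a small sketch of how the sequential run could be measured. The helper time_call and the stand-in fake_sequential_download are hypothetical names, and the stand-in sleeps instead of touching the network so that the example runs anywhere.

```python
import time

def time_call(func, *args):
    # Hypothetical helper: run func(*args) and return
    # a tuple of (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def fake_sequential_download(count, delay):
    # Stand-in for the loop in main(): each "download" just sleeps.
    for _ in range(count):
        time.sleep(delay)
    return count

result, elapsed = time_call(fake_sequential_download, 5, 0.01)
print("Finished {} downloads in {:.3f}s".format(result, elapsed))
```

To time the real script, you could pass its main function to time_call in the same way.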

Concurrent Download

Now that we have our baseline, it’s time to write a quick program that downloads all the images concurrently. We’ll go over creating and starting threads. The key point is to see the performance gains that concurrency can bring to an I/O-bound program:

import threading
import urllib.request
import time

def downloadImage(imgPath, fileName):
    print("Downloading Image from ", imgPath)
    urllib.request.urlretrieve(imgPath, fileName)
    print("Completed Download")

def createThread(i,url):
    imgName = "temp/image-" + str(i) + ".jpg"
    downloadImage(url,imgName)
  
def main():
    url = "https://picsum.photos/200/300"
    t = time.time()
    # create a list which will store a reference to
    # all of our threads
    threads = []

    # create 10 threads, append them to our list of threads
    # and start them off
    for i in range(10):
        thread = threading.Thread(target=createThread, args=(i,url,))
        threads.append(thread)
        thread.start()
    
    # ensure that all the threads in our list have completed
    # their execution before we log the total time to complete
    for i in threads:
        i.join()
    
    # calculate the total execution time (end time minus start time)
    t1 = time.time()
    totalTime = t1 - t
    print("Total Execution Time {}".format(totalTime))

if __name__ == '__main__':
    main()

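On the first line of the modified program, we now import the threading module, and we also import time so that we can measure the run. We then move the filename generation and the call to downloadImage into a separate function, createThread. Despite its name, createThread does not create a thread; it is the function that each thread runs.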

Within the main function, we first create an empty list of threads, and then iterate 10 times, creating a new thread object, appending it to the list, and starting it.

Finally, we iterate through our list of threads with for i in threads and call the join method on each one. This ensures that the rest of main does not run until every thread has finished downloading its image.
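
The start-then-join pattern can be tried in isolation. In this sketch, the function fake_download is a hypothetical stand-in that sleeps instead of fetching an image, and a lock protects the shared results list. Once every join call has returned, all the workers are guaranteed to have finished.

```python
import threading
import time

results = []
results_lock = threading.Lock()

def fake_download(index):
    # Stand-in for downloadImage: wait briefly, then record completion.
    time.sleep(0.05)
    with results_lock:
        results.append(index)

# Create and start every thread first so that they all run together.
threads = []
for i in range(5):
    thread = threading.Thread(target=fake_download, args=(i,))
    threads.append(thread)
    thread.start()

# Block until each thread has finished.
for thread in threads:
    thread.join()

print(sorted(results))
```

Calling join inside the first loop, immediately after start, would make each thread finish before the next one begins and would remove the benefit of threading.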

If you execute this on your machine, you should see that it almost instantaneously starts the download of the 10 different images. When the downloads finish, it again prints out that it has successfully completed, and you should see the temp folder being populated with these images.

$ python concurrentImageDownloader.py
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Downloading Image from  https://picsum.photos/200/300
Completed Download
Completed Download
Completed Download
Completed Download
Completed Download
Completed Download
Completed Download
Completed Download
Completed Download
Completed Download
Total Execution Time 1.1606624126434326

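Both of the preceding scripts perform exactly the same task using the same urllib.request module, but the concurrent script overlaps the waiting time of the 10 requests instead of paying for each one in turn. If you time the sequential script in the same way, you should see the concurrent script fetch all 10 images several times faster, and possibly close to an order of magnitude faster. The exact speedup depends on your network connection and on the server.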
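
The comparison can be reproduced without a network connection. The sketch below swaps in a different technique from the one used above: concurrent.futures.ThreadPoolExecutor from the standard library, which manages the threads for you. The names fake_download, run_sequential and run_pooled are hypothetical, and each "download" simply sleeps for a fixed delay, so the timings illustrate the effect rather than real download speeds.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(index, delay=0.05):
    # Stand-in for a blocking network request.
    time.sleep(delay)
    return "temp/image-{}.jpg".format(index)

def run_sequential(count):
    # One "download" after another, as in the baseline script.
    start = time.perf_counter()
    names = [fake_download(i) for i in range(count)]
    return names, time.perf_counter() - start

def run_pooled(count, workers):
    # The pool runs up to `workers` downloads at the same time;
    # map returns results in the order of the inputs.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names = list(pool.map(fake_download, range(count)))
    return names, time.perf_counter() - start

seq_names, seq_time = run_sequential(10)
pool_names, pool_time = run_pooled(10, workers=10)
print("sequential: {:.2f}s, pooled: {:.2f}s".format(seq_time, pool_time))
```

Because the waits overlap, the pooled run takes roughly as long as a single delay, while the sequential run takes about ten of them.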
