Python zipdl (download a single file from a ZIP via HTTP) (last update: 2013-05-13, created: 2013-05-13)
In short: a Python script that lets you download a single file from a ZIP archive hosted on an HTTP server that supports sending partial content (i.e. Range requests). See the comment in the code for more info :)

Please note there is absolutely no error checking.
Also, on Python 2.5 and older this code is vulnerable to path traversal when saving the target file, so it's best to use it only on trusted archives (or with Python 2.6+).
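
If you want a belt-and-braces check regardless of the Python version, something like the sketch below could be run on the requested name before saving anything. Note that `is_safe_member_name` is a hypothetical helper made up for this illustration (it's not part of the script and not a zipfile API), and it only covers the obvious cases: absolute paths, drive prefixes and ".." components.

```python
def is_safe_member_name(name):
    """Return True if a ZIP member name stays inside the target directory.

    Hypothetical helper for illustration only; not part of the original script.
    """
    # ZIP names use forward slashes, but treat backslashes the same way so
    # Windows-style separators cannot sneak a traversal past the check.
    normalized = name.replace("\\", "/")
    if not normalized:
        return False
    parts = normalized.split("/")
    # Reject absolute paths ("/etc/passwd") and drive prefixes ("C:/...").
    if normalized.startswith("/") or ":" in parts[0]:
        return False
    # Reject any ".." path component.
    return ".." not in parts
```

It would be called right before the extraction step, refusing to continue when it returns False.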

You can set DEBUG to True to see how many reads actually happen (each read is a separate HTTP request). It seems to be a sane amount: normally one read per header, one (or more?) for the variable-length fields, and another one for the data (so it took 3 reads to list the file in the example below, and another 3 to get the data).
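
The same kind of counting can be tried offline, without any server. The sketch below is written for Python 3 (the script itself is Python 2) and swaps the HTTP wrapper for an in-memory stand-in; `CountingReader`, `build_sample_zip` and `count_reads` are names invented for this example. The exact numbers depend on the zipfile version, so none are promised here.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory stand-in for the HTTP wrapper that counts read() calls."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        # Every call here would be one HTTP request in the real wrapper.
        self.read_calls += 1
        return super().read(size)


def build_sample_zip():
    """Create a tiny archive in memory (assumed test data, 231 bytes inside)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("readme_EndFirst.txt", "x" * 231)
    return buf.getvalue()


def count_reads(data, member=None):
    """Open the archive and either list it or read one member; return read count."""
    f = CountingReader(data)
    with zipfile.ZipFile(f, "r") as z:
        if member is None:
            z.namelist()
        else:
            z.read(member)
    return f.read_calls
```

Calling `count_reads(build_sample_zip())` gives the number of reads needed just to list the archive, and passing a member name as the second argument shows how many more are needed to fetch that file.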
I guess this could be optimized in zipfile a little more (e.g. all the central directory entries could be read in one shot, since the size of the central directory is stored in the end-of-central-directory record; also, the file data could be read together with the file name / extra field headers), but this only makes sense if each "read" is really slow.
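
To make the "one shot" idea a bit more concrete, here is a rough Python 3 sketch that pulls the central directory's offset and size out of the end-of-central-directory (EOCD) record. It assumes a plain archive: no ZIP64, and no archive comment containing the EOCD signature. `locate_central_directory` and `build_demo_archive` are hypothetical names used only here.

```python
import io
import struct
import zipfile

# Fixed-size part of the EOCD record: signature, four 16-bit fields,
# central directory size and offset (32-bit each), comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
EOCD_SIGNATURE = b"PK\x05\x06"


def locate_central_directory(tail):
    """Parse the EOCD record found in `tail` (the last bytes of an archive).

    Returns (offset, size, entry_count) of the central directory.
    """
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(tail):
        raise ValueError("end-of-central-directory record not found")
    (_sig, _disk, _cd_disk, _entries_here, entries,
     cd_size, cd_offset, _comment_len) = struct.unpack(
        EOCD_FORMAT, tail[pos:pos + EOCD_SIZE])
    return cd_offset, cd_size, entries


def build_demo_archive():
    """Build a small two-file archive in memory (assumed demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("a.txt", "hello")
        z.writestr("b.txt", "world")
    return buf.getvalue()
```

With the offset and size known, the whole central directory is the single byte range `offset` to `offset + size - 1`, i.e. one ranged read over HTTP instead of one per entry.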
Same goes for my code - there is no need to disconnect/reconnect each time.
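
For the reconnect part, a minimal Python 3 sketch of keeping one connection open for all reads could look like this. `RangeReader` and its `connection_factory` parameter are made up for this example (the parameter only exists so a fake connection can be plugged in for testing), and it assumes the server keeps the connection alive and answers with 206.

```python
import http.client
from urllib.parse import urlsplit


class RangeReader:
    """Fetch byte ranges of one URL over a single, reused HTTP connection."""

    def __init__(self, url, connection_factory=http.client.HTTPConnection):
        u = urlsplit(url)
        self.path = u.path or "/"
        if u.query:
            self.path += "?" + u.query
        # One connection for the whole session instead of one per read.
        self.conn = connection_factory(u.netloc)

    def read_range(self, first, last):
        """Return bytes first..last (inclusive) of the remote file."""
        self.conn.request("GET", self.path, headers={
            "Range": "bytes=%u-%u" % (first, last),
        })
        res = self.conn.getresponse()
        # The body has to be read completely before the connection
        # can be used for the next request.
        data = res.read()
        if res.status != 206:
            raise IOError("expected 206 Partial Content, got %d" % res.status)
        return data

    def close(self):
        self.conn.close()
```

A file-like wrapper would then create one `RangeReader` in its constructor and call `read_range` from `read`, instead of opening a new connection every time.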

Well, but this was supposed to be a small experiment anyway :)

Some output / example of usage:

14:44:47 gynvael> python
File Name                                             Modified             Size
readme_EndFirst.txt                            2013-05-13 14:30:34          231
14:44:55 gynvael> python readme_EndFirst.txt
14:45:39 gynvael> ls -la readme_EndFirst.txt
-rw-r----- 1 gynvael gynvael 231 May 13 14:45 readme_EndFirst.txt
14:45:42 gynvael> 

And the code itself:


# A small PoC of making an HTTP-backed file-like object. In this case it's
# used by the zipfile library, so you can basically list all the files in
# a ZIP archive that's placed on a server that supports partial downloads.
# You can also download just a single specific file from that archive.
# This might be useful for huge archives where you need only a couple of
# smaller files :)

# Consider this public domain, no magic is here.
# Initially written by gynvael.coldwind//vx (2013)

import zipfile
import os
import sys
import httplib
import urlparse

# Set to True to print every seek/tell/read issued by zipfile.
DEBUG = False


def HTTPGetFileSize(url):
  u = urlparse.urlsplit(url)
  conn = httplib.HTTPConnection(u.netloc)

  path = u.path
  if len(u.query) > 0:
    path += "?" + u.query

  conn.request("HEAD", path)
  res = conn.getresponse()

  if res.status != 200:
    print res.status, res.reason
    return False

  data = res.getheader("Content-Length")
  return int(data)

def HTTPGetPartialData(url, f, t):
  u = urlparse.urlsplit(url)
  conn = httplib.HTTPConnection(u.netloc)

  path = u.path
  if len(u.query) > 0:
    path += "?" + u.query

  conn.request("GET", path, "", {
    "Range": "bytes=%u-%u" % (f, t)
  })
  res = conn.getresponse()

  if res.status not in [200, 206]:
    print res.status, res.reason
    return False

  data = res.read()

  return data

class MyFileWrapper:
  def __init__(self, url):
    self.url = url
    self.position = 0
    self.total_size = HTTPGetFileSize(url)

    # "is False" so that a legitimate size of 0 is not treated as an error.
    if self.total_size is False:
      raise Exception("file not found or sth like that")

  def seek(self, offset, whence=0):

    if whence == 0:
      self.position = offset
    elif whence == 1:
      self.position += offset
    elif whence == 2:
      self.position = self.total_size + offset

    if DEBUG:
      print "seek: (%u) %u -> %u" % (whence, offset, self.position)

  def tell(self):
    if DEBUG:
      print "tell: -> %u" % self.position
    return self.position

  def read(self, amount=-1):

    if amount == -1:
      amount = self.total_size - self.position

    d = HTTPGetPartialData(self.url, self.position, self.position + amount - 1)
    self.position += len(d)

    if DEBUG:
      print "read: %u %u -> %u" % (self.position - len(d), amount, self.position)

    return d

# Let's start the code.
if len(sys.argv) not in [2, 3]:
  print "usage: <URL-to-zip> [<filename-to-extract>]"
  sys.exit(1)

f = MyFileWrapper(sys.argv[1])
z = zipfile.ZipFile(f, "r")

if len(sys.argv) == 2:
  # Only the URL was given: list the archive contents.
  z.printdir()
else:
  # Note, running this on Python 2.5 is shooting yourself in the foot
  # since there are no anti-path-traversal measures in <2.6.
  z.extract(sys.argv[2])