Notes - gynvael.coldwind//vx

Python zipdl (download a single file from a ZIP via HTTP) (last update: 2013-05-13, created: 2013-05-13)

In short: a Python script that lets you download a single file from a ZIP archive placed on an HTTP server that supports sending partial content (HTTP Range requests answered with 206 Partial Content). See the comment in the code for more info :)
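The core trick can be sketched in Python 3 (the script below is Python 2). zipfile only needs a seekable object with seek(), tell() and read(); recent Python 3 versions also call seekable(). Each read(n) can then become one Range request for n bytes. RangeFile, http_fetch_range and the fetch(start, end) callback are hypothetical names used only for this sketch, and the Range end offset is inclusive, as in HTTP.

```python
import io
import urllib.request
import zipfile


def http_fetch_range(url, start, end):
    """Return bytes start..end (inclusive) of `url` via an HTTP Range request.

    Hypothetical helper. Fails loudly if the server ignored the Range header
    (a plain 200 instead of 206 Partial Content).
    """
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise OSError("server ignored Range (HTTP %d)" % resp.status)
        return resp.read()


class RangeFile:
    """Read-only, seekable file-like object on top of fetch(start, end).

    `fetch` returns bytes start..end inclusive; `size` is the total length
    (e.g. Content-Length from a HEAD request). Each read() is one fetch.
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if pos < 0:
            # Like a real file; zipfile catches OSError when probing near the start.
            raise OSError("negative seek position")
        self._pos = pos
        return pos

    def read(self, amount=-1):
        remaining = max(self._size - self._pos, 0)
        if amount is None or amount < 0 or amount > remaining:
            amount = remaining
        if amount == 0:
            return b""  # "bytes=N-(N-1)" would be an invalid range
        data = self._fetch(self._pos, self._pos + amount - 1)
        self._pos += len(data)
        return data
```

With a real server, this would be used roughly as zipfile.ZipFile(RangeFile(lambda s, e: http_fetch_range(url, s, e), size)), where size comes from a HEAD request, as in HTTPGetFileSize in the script.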

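Please note there is no error checking at all.
Also, the target file is saved with ZipFile.extract. That method only exists since Python 2.6, and it only strips ".." and absolute path components from member names since Python 2.7.4. On older versions the code is therefore open to path traversal, so it's best to use it only on trusted archives (or with Python 2.7.4+).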
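The usual defence against path traversal is to check where each member would land before writing it. Below is a minimal Python 3 sketch. safe_extract_path is a hypothetical helper, not part of zipfile. Recent zipfile.extract versions already sanitize names, so this mainly matters for code that writes members itself (e.g. via ZipFile.open).

```python
import os


def safe_extract_path(dest_dir, member_name):
    """Return the path `member_name` would be written to under `dest_dir`.

    Raises ValueError when the member name is absolute or uses ".." to
    climb out of dest_dir. Uses abspath (not realpath), so symlinks that
    already exist inside dest_dir are not taken into account.
    """
    base = os.path.abspath(dest_dir)
    target = os.path.abspath(os.path.join(base, member_name))
    # commonpath itself raises ValueError for paths on different Windows
    # drives, which also counts as a rejection.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("member %r would be written outside %s" % (member_name, base))
    return target
```

Names like "docs/../a.txt" stay allowed because they still resolve inside the destination directory.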

You can set DEBUG to True to see how many reads (i.e. HTTP requests) actually happen. It seems to be a sane amount: normally one read per header, one (or more?) for the variable-length fields, and another one for the data (so it took 3 reads to list the file in the example below, and another 3 to get the data).
I guess zipfile's access pattern could be optimized a little more (e.g. all the central directory entries could be read in one shot, since the size of the central directory is stored in the end-of-central-directory record; also, the file data could be read together with the file name / extra field), but this only makes sense if each "read" is really slow.
The same goes for my code: there is no need to disconnect and reconnect for every read.
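Here is a Python 3 sketch of the "read the whole central directory in one shot" idea. It fetches the 22-byte end-of-central-directory (EOCD) record from the end of the file, takes the central directory's size and offset from it, and fetches the directory with a single request. list_central_directory and fetch(start, end) are hypothetical names. The sketch assumes the archive has no comment (so the EOCD is exactly the last 22 bytes) and no ZIP64 records.

```python
import struct

EOCD_FORMAT = "<4s4H2LH"      # end-of-central-directory record (22 bytes)
CDIR_FORMAT = "<4s6H3L5H2L"   # central directory file header (46 bytes)
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)
CDIR_SIZE = struct.calcsize(CDIR_FORMAT)


def list_central_directory(fetch, total_size):
    """Return [(name, uncompressed_size, local_header_offset), ...].

    Uses exactly two fetch(start, end) calls (end inclusive) for a
    non-empty archive. Assumes no archive comment and no ZIP64 records.
    """
    eocd = fetch(total_size - EOCD_SIZE, total_size - 1)
    (sig, _disk, _cd_disk, _disk_entries, entries,
     cd_size, cd_offset, _comment_len) = struct.unpack(EOCD_FORMAT, eocd)
    if sig != b"PK\x05\x06":
        raise ValueError("no EOCD record in the last 22 bytes")
    if entries == 0:
        return []

    # The EOCD gives the directory's size, so one request fetches all entries.
    cd = fetch(cd_offset, cd_offset + cd_size - 1)
    result = []
    pos = 0
    for _ in range(entries):
        fields = struct.unpack_from(CDIR_FORMAT, cd, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory entry at offset %d" % pos)
        flags, usize = fields[3], fields[9]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        raw_name = cd[pos + CDIR_SIZE:pos + CDIR_SIZE + name_len]
        # Flag bit 11 marks UTF-8 names; otherwise the spec says CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        result.append((name, usize, fields[16]))
        pos += CDIR_SIZE + name_len + extra_len + comment_len
    return result
```

If the archive has a comment, the EOCD has to be searched for in roughly the last 64 KiB + 22 bytes instead, which is what zipfile itself does.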

Well, but this was supposed to be a small experiment anyway :)

Some output / example of usage:

14:44:47 gynvael> python zipdl.py http://gynvael.vexillium.org/dump/example.zip
File Name                                             Modified             Size
readme_EndFirst.txt                            2013-05-13 14:30:34          231
14:44:55 gynvael> python zipdl.py http://gynvael.vexillium.org/dump/example.zip readme_EndFirst.txt
14:45:39 gynvael> ls -la readme_EndFirst.txt
-rw-r----- 1 gynvael gynvael 231 May 13 14:45 readme_EndFirst.txt
14:45:42 gynvael>
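To look at the read pattern that the DEBUG flag reveals without any server, one can wrap an in-memory archive in a logging reader. This is a Python 3 sketch with hypothetical names (LoggingReader, count_reads). In zipdl.py, each logged read corresponds to one HTTP request. The exact counts depend on the Python version's zipfile, so they need not match the 3 + 3 reads mentioned in the notes.

```python
import io
import zipfile


class LoggingReader:
    """Wraps a seekable binary file and records every read() call.

    Plays the role of the script's DEBUG output: each log entry is
    (offset, requested_size, returned_size). Hypothetical helper.
    """

    def __init__(self, fileobj):
        self._f = fileobj
        self.log = []

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        return self._f.seek(offset, whence)

    def tell(self):
        return self._f.tell()

    def read(self, size=-1):
        offset = self._f.tell()
        data = self._f.read(size)
        self.log.append((offset, size, len(data)))
        return data


def count_reads(archive_bytes, member):
    """Return (reads needed to list the archive, reads needed to get `member`)."""
    reader = LoggingReader(io.BytesIO(archive_bytes))
    with zipfile.ZipFile(reader) as z:
        z.namelist()
        listing_reads = len(reader.log)
        z.read(member)
    return listing_reads, len(reader.log) - listing_reads
```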

And the code itself:

#!/usr/bin/python

# A small PoC of making an HTTP-backed file-like object. In this case it's
# used by the zipfile library, so you can basically list all the files in
# a ZIP archive that's placed on a server that supports partial downloads.
# You can also download just a single specific file from that archive.
# This might be useful for huge archives where you need only a couple of
# smaller files :)

# Consider this public domain, no magic is here.
# Initially written by gynvael.coldwind//vx (2013)

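# Python 2 code (httplib and urlparse are http.client and urllib.parse
# in Python 3).
import zipfile
import os
import sys
import httplib
import urlparse

# Set to True to print every seek/tell/read that zipfile does.
DEBUG = False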

def HTTPGetFileSize(url):
  # Ask for the headers only (HEAD) and return Content-Length as an int,
  # or False if the server did not answer 200 OK.
  u = urlparse.urlsplit(url)
  conn = httplib.HTTPConnection(u.netloc)

  path = u.path
  if len(u.query) > 0:
    path += "?" + u.query

  conn.request("HEAD", path)
  res = conn.getresponse()

  if res.status != 200:
    print res.status, res.reason
    conn.close()
    return False

  data = res.getheader("Content-Length")
  conn.close()
  return int(data)

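def HTTPGetPartialData(url, f, t):
  # Fetch bytes f..t (both inclusive) of the resource with a Range request.
  u = urlparse.urlsplit(url)
  conn = httplib.HTTPConnection(u.netloc)

  path = u.path
  if len(u.query) > 0:
    path += "?" + u.query

  conn.request("GET", path, "", {
    "Range": "bytes=%u-%u" % (f, t)
    })
  res = conn.getresponse()

  if res.status not in [200, 206]:
    print res.status, res.reason
    conn.close()
    return False

  data = res.read()
  conn.close()

  # 206 means the server honoured the Range header; a plain 200 means it
  # ignored it and sent the whole file, so cut out the requested part.
  if res.status == 200:
    data = data[f:t + 1]

  return data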

class MyFileWrapper:
  # Minimal read-only file-like object (seek/tell/read) that zipfile can
  # use; every read() becomes a separate HTTP Range request.
  def __init__(self, url):
    self.url = url
    self.position = 0
    self.total_size = HTTPGetFileSize(url)

    if self.total_size is False:
      raise Exception("file not found or sth like that")

  def seek(self, offset, whence=0):
    if whence == 0:
      self.position = offset
    elif whence == 1:
      self.position += offset
    elif whence == 2:
      self.position = self.total_size + offset

    if DEBUG:
      print "seek: (%u) %d -> %u" % (whence, offset, self.position)

  def tell(self):
    if DEBUG:
      print "tell: -> %u" % self.position
    return self.position

  def read(self, amount=-1):
    if amount == -1:
      amount = self.total_size - self.position

    # Never ask for bytes past the end of the file, and don't send a request
    # for a zero-length read at all ("bytes=N-(N-1)" is not a valid range).
    amount = min(amount, self.total_size - self.position)
    if amount <= 0:
      return ""

    d = HTTPGetPartialData(self.url, self.position, self.position + amount - 1)
    self.position += len(d)

    if DEBUG:
      print "read: %u %u -> %u" % (self.position - len(d), amount, self.position)

    return d

# Let's start the code.
if len(sys.argv) not in [2, 3]:
  print "usage: zipdl.py <URL-to-zip> [<filename-to-extract>]"
  sys.exit(1)

f = MyFileWrapper(sys.argv[1])
z = zipfile.ZipFile(f, "r")

if len(sys.argv) == 2:
  # Only the URL was given: list the archive contents.
  z.printdir()
else:
  # Note: running this on an old Python is shooting yourself in the foot.
  # ZipFile.extract only exists since 2.6, and it strips ".." and absolute
  # path components from member names only since 2.7.4 (path traversal).
  z.extract(sys.argv[2])
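HTTPGetPartialData opens and closes a connection for every read, which isn't necessary. Below is a Python 3 sketch that sends all Range requests over one persistent HTTP/1.1 connection using http.client. KeepAliveRangeFetcher and serve_blob are hypothetical names. The local server is only a throwaway stand-in for a real range-capable server, and the fetcher does not retry if the server drops an idle connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveRangeFetcher:
    """Sends every Range request over one persistent HTTP/1.1 connection.

    Hypothetical helper; expects 206 Partial Content for every request and
    does not reconnect if the server closes the connection.
    """

    def __init__(self, host, port, path):
        # http.client connects lazily and keeps the socket open between
        # requests as long as the server allows keep-alive.
        self._conn = http.client.HTTPConnection(host, port, timeout=10)
        self._path = path

    def fetch(self, start, end):
        self._conn.request("GET", self._path,
                           headers={"Range": "bytes=%d-%d" % (start, end)})
        resp = self._conn.getresponse()
        data = resp.read()  # drain the body so the connection can be reused
        if resp.status != 206:
            raise OSError("expected 206, got %d %s" % (resp.status, resp.reason))
        return data

    def close(self):
        self._conn.close()


def serve_blob(blob):
    """Start a throwaway local server answering "bytes=A-B" requests.

    Returns (server, stats); stats["connections"] counts accepted TCP
    connections. Demo only: the Range header is not validated.
    """
    stats = {"connections": 0}

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # HTTP/1.0 would close after each reply

        def setup(self):
            super().setup()
            stats["connections"] += 1

        def do_GET(self):
            start, end = map(int, self.headers["Range"][len("bytes="):].split("-"))
            body = blob[start:end + 1]
            self.send_response(206)
            self.send_header("Content-Range", "bytes %d-%d/%d"
                             % (start, start + len(body) - 1, len(blob)))
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, stats
```

Several fetches through one KeepAliveRangeFetcher should show up as a single accepted connection on the server side. The fetcher's close() should be called before the server is shut down.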
