rebol [
    Title: "batch-download"
    File: %batch-download.r
    Date: 5-Sep-2006
    Version: 1.0.4
    Progress: 0.5
    Status: "working, looking fairly solid, tested on Rebol/View 1.3.2.3.1 and 1.3.1.3.1"
    Needs: [View]
    Author: "Anton Rolls"
    Language: 'English
    Purpose: {Download a list of files from the internet, one by one, a chunk at a time. Interruptions are auto-resumed.}
    Usage: {see demo-batch-download.r}
    History: [
        1.0.0 [8-Jan-2004 {First version} "Anton"]
        1.0.1 [14-May-2006 {integrating into a single function} "Anton"]
        1.0.2 [15-May-2006 {integration fairly complete, working well} "Anton"]
        1.0.3 [16-May-2006 {looking good, downloads all 10 of Graham's EMR files without problem,
            saved a snapshot of version 1.0.3 to the freezer,
            removed old batch-download function and support functions,
            increased width of window from 400 to 500, added /hold option,
            "Abort" button text changes to "Close"} "Anton"]
        1.0.4 [5-Sep-2006 {added support for FTP urls, ports are explicitly opened in binary mode to support this} "Anton"]
    ]
    ToDo: {
    - long urls (such as below) don't entirely fit in the field
        - resize proportionally with window ?
    - some urls (eg. with a query) don't convert to filenames well, so cause an error, eg.
        Richard Dawkins - Religion, The Root Of All Evil:
        http://vp.video.google.com/videodownload?version=0&secureurl=uAAAAMCAoA8uyBH3dnmZrI8dl19B972oujZrC2Pf9AULfMxriXVovYQvUcoV6ItQFge81S5Y2uV5j9pcTXPF-q1Oft2kYH5otgoJAdtEOvsGLsDZkMZhmLwP6XiZkC305yAqMwSgAiL-VlxrWgBP5vziJykDnjmQHM7J7asIonhhxZtcIQ6pJOsysmwEs_95NIjsTn_Al-m_m3lnIcCEOcTZOuVKidtI1h-t_7M3Opx10kbtUTs7jBa6pwanM5G8yzzAZw&sigh=BAzdQZhCO9vrzkDoLaaYxlkdo_o&begin=0&len=2878800&docid=-2439999165547892433
    - log-area style for showing connection/restart etc. messages
    - graph traffic for visual feedback.
        - Maybe one graph for the total time, and one graph for the last 5 minutes, say.
    - study SurfNET updater
    - resizeable window (some urls were slightly too long)
    - get IE / Firefox spoof strings ready (in anticipation of those difficult servers)
    - Change this line so it says whether the download was resumed or started from the beginning.
        set-status "Connected. Starting download..."
    - append .part to partial files' filenames ?
    - improve the gui for multiple files
        - consider scrolling list of files using percent-progress style (but that will bloat out a fair bit)
        - prescan all urls to check availability and get total size of all the files (more bloat)
          but then could have a single progress bar for the entire process.
    - total-size: to-integer port/locals/headers/content-length ; <-- surely using port/size is the same ?
        No, it actually isn't the same, port/size is 0 for net ports, for some rebol versions.
        - check which versions
        - port/locals/headers (trying to get the length) was invalid path for this url:
            ftp://aiedownload.intel.com/df-support/4666/eng/win9xe67.exe
    - DONE support FTP, which currently breaks batch-download
        Example urls from Graham:
            ftp://mirror.cs.wisc.edu/pub/mirrors/ghost/AFPL/gs853/gs853w32.exe
            ftp://mirror.cs.wisc.edu/pub/mirrors/ghost/ghostgum/gsv48w32.exe
        Error message:
            "unable to connect at resume position"
            ** Script Error: Invalid path value: headers
            ** Where: connect
            ** Near: total-size: to-integer port/locals/headers/content-length
            if size
        - use port/size for FTP (already using port/locals/headers/content-length for HTTP)
    - DONE handle FTP error "cannot skip a not binary file port" on open/direct/skip url length
        - open/direct/binary/skip <--- this works
          (port/size always reflects the full size of the FTP file, not affected by skip position)
        - I think one or both of SKIP and DIRECT mode implies BINARY mode, but if you try to open an FTP port like this:
          So this might be a problem of the FTP scheme, so look into fixing it ?
        - I noticed FTP can be resumed right at the end of the file (COPY returns NONE straight away, of course)
          This looks like it is handled ok.
    - IMPROVE THIS: when can't restart from resume position, print to the console the size reached before restarting
        - well... should go into log-area
    - do something with this error message when trying to resume a completed HTTP file ?
        Unable to connect at resume position. Size reached: 332288
        Error. Target url: http://www.rops.org/download/freescript53.exe could not be retrieved. Server response: HTTP/1.1 416 Requested Range Not Satisfiable
        near: [port: open/binary/direct/skip url size
        - careful don't write code only for HTTP
    - error message really needs to be sent to a log-area style, so the user has a chance to understand
      situations handled poorly by the code.
    - need some say in where local files are stored, accept a block of blocks ?
    - check result of read-net-chunk
    - investigate if changing buffer size makes any difference
    - auto-resume if download fails, try again after 1 second, 2 secs, 4 secs, 8 secs etc..
        Actually, make it proportional to the square of the number of retries in the last 5 minutes.
        Eg. if there has been 1 retry, then wait 1 second, 2 retries -> 4 seconds, 3 retries -> 9 seconds etc.
        - send the info that it stalled via callback
        - if no progress has been made 5 minutes after a stall, then stop trying (and call a callback to report that)
    - read-thru-chunk: when resuming, snip off last little bit of the file in case it's corrupted ?
    - start time, bytes since start time, Est. time to completion
        - keep all such time data associated with files in a serialised file (like a download manager)
    - round percentage progress to two decimal places ?
    - format bytes and total, putting commas at thousands mark
    - doing size? url is wasteful, makes me think I need to rework the lot (pretty easy),
      anyway, catch the error when this web access fails (read-thru-chunk)
        - if doing a start/stop anytime serialisable download manager, this is less of a problem,
          because can use the size stored previously in the file
          No, looks like still have to connect twice, because the total-size is what's left after the skip position.
    - could use freezer/form-error.r
    - how to detect if resume is not available ? some web servers do not support it - examples ?
    - option to change default 60 second timeout (way down the bottom in read-net-chunk)
    - allow adding downloads while running ? (by scanning a directory ?)
    }
    Notes: {
        batch-download downloads files one by one, strictly from start to finish.
        As such, it tries to ensure that each file completes before continuing to the next one.
        If you leave it alone, it should complete each file in order.
        If a connection fails, then it will keep retrying until success, or until forever is reached :)
        Pressing "Skip" or "Abort" can leave files only partially downloaded, of course.
    }
    Public-Functions: [batch-download]
]

context [ ; required by include framework

    ; new integrated version (squeezed all the functionality of the above three functions in)

    batch-download: func [
        {Download some files from the internet, one at a time, chunk by chunk, auto-resuming when interrupted.
        When finished, returns a block of the urls that downloaded completely.
        (Note if the user Aborts early, complete files which haven't been examined yet will not be present in the returned block.)}
        urls [block!] "files to download"
        /to-dir dest-dir [file!]
        /hold "Leave window open when finished"
        /local
            wait-time initial-wait-time increase-wait-time
            completed window title url-txt prog stat
            set-status show-progress connect disconnect
            work paused? skip? abort? complete?
            n-url url file loc-path port total-size buffer buffer-size size response data
            err ; holds the error caught in CONNECT (kept local so it does not leak into the global context)
    ][
        wait-time: initial-wait-time: 00:01:00 / 32 ; double it five times to reach the maximum of 1 minute.
        increase-wait-time: does [wait-time: min 0:01:00 wait-time * 2]

        completed: copy [] ; urls of complete downloads

        view/new center-face window: layout [
            space 10x8
            title: vh2 500 "Downloading File" ;"in chunks:"
            url-txt: text center 500 "url"
            prog: progress 500
            across
            tog "Pause" [work: either value ['pause]['unpause]]
            btn "Skip" [if n-url <= length? urls [skip?: true set-status "Skipping..."]]
            stat: text 300x24 as-is middle ; <- middle doesn't work! better right align anyway I think
            btn 70 "Abort" [abort?: true set-status "Aborting..." unview/only window] ; (unview in case all finished)
        ]

        window/feel: make window/feel [
            detect: func [face event][ ; <- no key handler
                if event/type = 'close [abort?: true set-status "Aborting..."]
                event
            ]
        ]

        set-status: func [text][stat/text: text show stat]

        show-progress: does [
            prog/data: size / (max 1 total-size)
            set-status reform [size "of" total-size "bytes" join "^/" round/to 100 * min 1 prog/data 0.01 "%"]
            show [prog stat]
        ]

        connect: does [
            ;?? file
            size: any [size? file 0]
            ;?? size
            if port [print "Error!! port looks already open." exit]
            set-status "Opening port..."
            ;?? url
            if error? set/any 'err try [
                ;port: open/direct/skip url size ; resume position <- this can fail when file is complete already
                port: open/binary/direct/skip url size ; resume position <- this can fail when file is complete already
                ; (FTP requires BINARY mode with SKIP)
                ;print ["after open/binary/direct/skip, port/size:" mold port/size]
                total-size: any [
                    all [port/size > 0 port/size] ; as used by FTP (port/size is not affected by skip position)
                    all [
                        port/locals/headers/content-length ; as used by HTTP
                        size + to-integer port/locals/headers/content-length ; content-length is relative to skip position
                    ]
                ]
            ][
                set-status "Unable to connect at resume position. (Maybe complete ?)"
                ;print ["Unable to connect at resume position. Size reached:" size]
                ;print mold disarm err
                ; The error could be: {HTTP/1.1 416 Requested Range Not Satisfiable}
                if error? try [
                    ;port: open/direct url
                    port: open/binary/direct url
                ][
                    set-status "Unable to connect."
                    exit
                ]
                ;total-size: to-integer port/locals/headers/content-length
                total-size: any [
                    all [port/size > 0 port/size] ; as used by FTP
                    all [
                        port/locals/headers/content-length ; as used by HTTP
                        to-integer port/locals/headers/content-length
                    ]
                ]
                if size <> total-size [ ; did remote file change length ?
                    ;print ["(size:" size ") <> (total-size:" total-size ")"]
                    set-status probe "Error - size mismatch! Restarting from beginning."
                    ; restart this download from 0
                    delete file
                    size: 0
                ]
            ]
            show-progress
            ;?? total-size
            set-status join "Opened port - total size: " total-size
            if size = total-size [set-status "File complete." complete?: skip?: true exit]
            if size > total-size [set-status "Local file is longer than remote file!" skip?: true exit]
            set-modes port/sub-port [lines: false binary: true no-wait: true]
            ;port: skip port size ; resume position
            buffer: make binary! buffer-size: 8000
            set-status "Connected. Starting download..."
            wait-time: initial-wait-time
        ]

        disconnect: does [
            ;if none? port [print "Error, port looks already closed." exit]
            either port [
                set-status "Closing port..."
                close port
                port: none
                set-status "Closed port."
            ][
                set-status "Port is closed."
            ]
        ]

        n-url: 0
        while [(n-url: n-url + 1) <= length? urls][
            url: urls/:n-url
            set-face title reform ["Downloading" either 1 < length? urls [reform [n-url "of" length? urls "files"]]["file"]] ;"in chunks:"]
            file: last split-path url ; <--- very simple, causes problems with url query strings and long urls in general
            ;;file: %richard-dawkins.mp4 ; <---- temporarily
            if to-dir [file: dest-dir/:file]
            loc-path: first split-path file
            if not exists? loc-path [make-dir/deep loc-path]
            set-face url-txt url
            complete?: false
            until [
                if all [none? port not paused?][connect]
                if skip? [skip?: false break]
                either any [paused? none? port][
                    ;?? wait-time
                    ;probe type? port
                    response: wait wait-time ; some time to handle gui events
                    ;?? response ; <--- response type is port! after failure to connect then Abort (that's weird)
                ][
                    response: wait [30 port/sub-port] ; 30 second timeout <-- also means slow reaction to gui actions when server is slow
                ]
                if abort? [unview/only window break]
                if skip? [skip?: false break]
                either response [ ; port data available ?
                    if not port? response [print "response Not port!!!"]
                    clear buffer
                    if data: copy/part port/sub-port buffer-size [ ; copy a chunk
                        append buffer data
                        write/append/binary file buffer
                        size: size + length? data
                        show-progress
                    ]
                    complete?: none? data ; complete when no more data
                ][ ; wait returned NONE
                    if not paused? [ ; not because we are paused, must be a timeout...
                        set-status "Stalled, reconnecting..."
                        disconnect
                        connect
                        if none? port [increase-wait-time]
                    ]
                ]
                if work = 'pause [
                    disconnect
                    wait-time: initial-wait-time
                    set-status "Paused."
                    paused?: true
                    work: none
                ]
                if work = 'unpause [ ; unpausing ?
                    connect ; reconnect and resume the download
                    either port [
                        paused?: false
                        work: none
                    ][ ; couldn't connect (try again later)
                        increase-wait-time
                    ]
                ]
                complete?
            ]
            disconnect
            if complete? [append completed url]
            if abort? [break]
        ]

        set-status "Finished processing all files."
        if viewed? window [
            either hold [
                set in last window/pane 'text "Close" show window ; change button text from "Abort" to "Close"
                do-events
            ][
                unview/only window
            ]
        ]
        completed
    ]
] ; end of context
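
; Usage sketch. This is an illustration only, not the author's demo (see
; demo-batch-download.r for that). It assumes this file is loaded with DO rather
; than through the include framework, so that DO returns the object made by
; CONTEXT above. The names dl and done and the directory %downloads/ are
; hypothetical; the urls are the FTP examples quoted in the ToDo notes.
; It is left commented out so that loading this file never starts a download.
;
;   dl: do %batch-download.r
;   done: dl/batch-download/to-dir/hold [
;       ftp://mirror.cs.wisc.edu/pub/mirrors/ghost/AFPL/gs853/gs853w32.exe
;       ftp://mirror.cs.wisc.edu/pub/mirrors/ghost/ghostgum/gsv48w32.exe
;   ] %downloads/
;   probe done ; block of the urls that downloaded completely
;
; /to-dir takes a directory (a file! value ending in a slash) and /hold keeps the
; window open until "Close" is pressed; both refinements are optional.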