Intelligent Download Manager Service (project idea)
Technical contact: delphij@
The current FreeBSD ports infrastructure uses fetch(1) to get distfiles. This approach is simple and straightforward but has several shortcomings:
- The system has no way to decide where to get the files from. For instance, a port's distfile may be available from 10 different sites, 2 of which are located in the same country as the user, but the system has to either a) try sites in the order the maintainer chose, b) download in random order, or c) prefer the FreeBSD.org master site, which can be pointed to a local mirror;
- There is no global limit on how many connections one system can make. For instance, one may start parallel downloads by running 'make checksum' in many ports directories at the same time, all hitting the same master site, which can result in sub-optimal performance;
- There is no prevention of two processes downloading the same file at the same time, because all download processes are independent; this happens, for instance, when two independent ports reference the same big distfile;
- There is no speed limiting mechanism. (This is not a goal if this becomes a SoC project.)
- There is no support for modern P2P distribution, which would be useful for very large files, for instance the installer images. (This is not a goal if this becomes a SoC project, but having the service in place enables the possibility of supporting P2P later.)
The intelligent download manager consists of two components. The first is an agent: a command line tool that accepts the information needed to download files (target directory and filename, and URLs for the file, each with an optional priority number, an expected size, and even an expected cryptographic hash). The agent checks whether an instance of the service is running, starts one if not, and then submits this information to the service.
To emulate the existing fetch(1) command, which is script friendly, the agent should be able to work in a blocking manner: after submitting the request to the download manager service, it waits for the response and can optionally query the service for status (which site we are downloading from, how much data has been downloaded, etc.) and present that to the user.
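As a rough illustration of the agent's side of the protocol, the sketch below serializes one download request. The socket path, message format, and field names are all assumptions for the sake of the example; nothing here is a settled wire format.

```python
import json

# Assumed rendezvous point between agent and service (illustrative only).
SOCKET_PATH = "/var/run/dlmgrd.sock"

def build_request(target_dir, filename, urls, size=None, sha256=None):
    """Serialize one download request for the service.

    `urls` is a list of (url, priority) pairs; a lower number means a
    higher priority, mirroring the optional priority number mentioned
    in the text.  `size` and `sha256` are the optional expected size
    and cryptographic hash.
    """
    return json.dumps({
        "op": "download",
        "target": f"{target_dir}/{filename}",
        "urls": [{"url": u, "priority": p} for u, p in urls],
        "size": size,
        "sha256": sha256,
    })

# Example: one distfile with two candidate sites (hypothetical URLs).
req = build_request(
    "/usr/ports/distfiles", "foo-1.0.tar.gz",
    [("https://mirror.example.org/foo-1.0.tar.gz", 0),
     ("https://master.example.net/foo-1.0.tar.gz", 1)],
    size=1048576,
)
```

In blocking mode the agent would send `req` over `SOCKET_PATH` and then read status replies from the same connection until the service reports completion or failure.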
The second component is the download manager service itself. Its design includes the following:
- Mid-term records of sites and their speeds. "Mid-term" means that we maintain a mapping between IPv4/IPv6 addresses and the speeds collected during downloads (e.g. five speed samples for each site, collected at 1-minute intervals, each with a timestamp, using only samples collected within the last week; this should be configurable);
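The mid-term record could look something like the sketch below: per-address speed samples with timestamps, trimmed to a configurable sample count and filtered to a configurable time window (one week by default, per the text). The class and method names are illustrative assumptions.

```python
import time
from collections import defaultdict, deque

WINDOW = 7 * 24 * 3600   # only use samples from the last week (configurable)
MAX_SAMPLES = 5          # e.g. five speed samples per site (configurable)

class SpeedHistory:
    """Mapping from IPv4/IPv6 address to recent speed samples."""

    def __init__(self, window=WINDOW, max_samples=MAX_SAMPLES):
        self.window = window
        self.max_samples = max_samples
        self.samples = defaultdict(deque)   # address -> (timestamp, bytes/s)

    def record(self, address, bytes_per_sec, now=None):
        """Store one sample taken from an actual download."""
        now = time.time() if now is None else now
        q = self.samples[address]
        q.append((now, bytes_per_sec))
        while len(q) > self.max_samples:    # keep only the newest samples
            q.popleft()

    def estimate(self, address, now=None):
        """Mean speed over samples still inside the window, or None."""
        now = time.time() if now is None else now
        recent = [s for t, s in self.samples[address]
                  if now - t <= self.window]
        return sum(recent) / len(recent) if recent else None
```

The estimate is what the scheduler would consult when deciding which site is likely to be fastest for a given task.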
- Each download site is treated as a "worker" that can pick tasks from the download task waiting queue. If there is a waiting task that can be downloaded from this site, the worker grabs that task and reuses the existing TCP connection; if there is no matching task, the worker waits for 15 seconds before quitting;
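A minimal sketch of that worker loop, assuming a shared task queue and a `download(site, task)` callback (both hypothetical). A real service would likely keep per-site queues instead of the naive put-back used here, but the 15-second idle timeout behaves as described above.

```python
import queue

IDLE_TIMEOUT = 15   # seconds a worker waits for a matching task before quitting

def worker_loop(site, task_queue, download, idle_timeout=IDLE_TIMEOUT):
    """One worker per download site.

    Each task is assumed to carry a `candidate_sites` set naming the
    sites it can be fetched from.  The worker keeps serving matching
    tasks over its already-open TCP connection and exits after sitting
    idle for `idle_timeout` seconds.
    """
    while True:
        try:
            task = task_queue.get(timeout=idle_timeout)
        except queue.Empty:
            return                      # no matching task for 15s: quit
        if site not in task.candidate_sites:
            task_queue.put(task)        # not ours; leave it for another worker
            task_queue.task_done()
            continue
        download(site, task)            # reuses this worker's TCP connection
        task_queue.task_done()
```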
- If there is no available task in the queue, a download worker can also preempt a task from another worker when there is an absolute need, i.e. the other worker is expected to take longer AND this worker can download the full file faster. (Not a required feature for the SoC project);
- For each download task (a filename mapped to one or more URLs), if there is no matching worker or all workers are busy, start a new worker based on priority; when two sites are assigned the same priority, pick one randomly. URLs are marked with a "scan number" when a worker is started, to prevent the same site from being retried after a permanent error;
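The priority/scan-number selection above could be sketched as follows; the field names (`priority`, `scan`) are illustrative assumptions. A URL whose scan number already matches the current scan has been tried (and presumably failed permanently), so it is skipped.

```python
import random

def pick_url(urls, current_scan):
    """Pick the URL to start a new worker for.

    `urls` is a list of dicts of the form
    {"url": ..., "priority": ..., "scan": ...}.  Lowest priority number
    wins; ties are broken randomly; URLs already marked with
    `current_scan` are never retried.  Returns None when every site has
    been tried in this scan.
    """
    fresh = [u for u in urls if u["scan"] < current_scan]
    if not fresh:
        return None                     # all sites already tried this scan
    best = min(u["priority"] for u in fresh)
    choice = random.choice([u for u in fresh if u["priority"] == best])
    choice["scan"] = current_scan       # mark as tried for this scan
    return choice
```

Bumping `current_scan` would let the scheduler retry all sites afresh, e.g. on a later run, without forgetting which ones failed within the current pass.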
- A command line agent that emulates fetch(1) and phttpget(1);
- A command line tool to dump the current state (optional for the SoC project).