Have you ever created an offline archive of a website you like? Which “offline downloader” did you use? Hopefully this post will give you a rough idea of how to traverse a website (and do whatever you want with its content). My approach to traversing a website consists of the following steps:

  1. Define the startup page.
  2. Collect all hyperlinks on the current page that have not been visited yet and put them into a queue.
  3. Do something with the content of the current page.
  4. Dequeue a hyperlink, make it the current page, and return to step 2.
  5. Repeat until there are no more links in the queue.
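
The steps above amount to a breadth-first traversal. Here is a minimal Python sketch of that loop. It assumes the caller supplies two hypothetical callbacks: `get_links(page)`, which returns the hyperlinks found on a page, and `process(page)`, which does something with its content. The in-memory `site` dictionary is an illustrative stand-in for a real website, not part of the original prototype.

```python
from collections import deque


def traverse(start_page, get_links, process):
    """Breadth-first traversal following the five steps above.

    get_links(page) and process(page) are hypothetical callbacks
    supplied by the caller. Returns the pages in the order processed.
    """
    queue = deque([start_page])  # step 1: the startup page
    visited = {start_page}       # every page ever queued
    order = []
    while queue:                 # step 5: repeat until the queue is empty
        page = queue.popleft()   # step 4: dequeue the next hyperlink
        process(page)            # step 3: do something with the content
        order.append(page)
        for link in get_links(page):  # step 2: queue links not yet visited
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Illustrative site map: each page lists the pages it links to.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": ["/blog"],
}
print(traverse("/", lambda page: site.get(page, []), lambda page: None))
# ['/', '/about', '/blog', '/blog/post-1']
```

Marking a page as visited when it is queued, rather than when it is processed, keeps a page from entering the queue twice when several pages link to it.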

I have built my own prototype that suits my needs. It might need further tweaks and improvements before I release it to the public. My main goals are to add a way to define the characteristics of the pages that will be traversed in step 2, and a way to define what is done with the content of each page. Stay tuned for further updates.
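
One possible shape for those two pieces of functionality is a pair of hooks, sketched here in Python using only the standard library: `should_visit(url)` decides which links are traversed in step 2, and `handle_page(url, html)` decides what happens to the content in step 3. The names `crawl`, `should_visit`, `handle_page`, `same_site` and `fetch_html` are hypothetical and not taken from the prototype; decoding every page as UTF-8 is also a simplifying assumption.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Gathers the href attribute of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download a page; decoding as UTF-8 is an assumption."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def same_site(start_url):
    """Example predicate: only follow links on the start page's host."""
    host = urlparse(start_url).netloc
    return lambda url: urlparse(url).netloc == host


def crawl(start_url, should_visit, handle_page, fetch=fetch_html):
    """Breadth-first crawl with two pluggable hooks.

    should_visit(url) -> bool chooses which links are traversed;
    handle_page(url, html) chooses what is done with the content.
    Returns the set of URLs that were queued.
    """
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        handle_page(url, html)
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments so that
            # "page" and "page#top" count as the same page.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen and should_visit(link):
                seen.add(link)
                queue.append(link)
    return seen
```

For an offline archive, a call such as `crawl(url, same_site(url), save_page)` would keep the crawl on one site, where `save_page` is a function you write yourself to store each page on disk. Passing `fetch` as a parameter also makes it possible to test the traversal against an in-memory set of pages instead of the network.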

About Hardono

Howdy! I'm Hardono. I work as a software developer, mostly on Windows, dealing with .NET and conversing in C#. But I know a bit of Linux, mainly because I need to keep this blog operational. I've been working in the Logistics/Transport industry for more than 11 years.