How to Traverse a Website Programmatically

03-Apr-2007 procrastinate 0 349 views

Have you ever created an offline archive of a website that you like? Which “offline downloader” did you use? Hopefully this post will give you a rough idea of how to traverse a website (and do whatever you want with the content). My idea of traversing a website is represented in the following steps:

1. Define the startup page.
2. Collect all hyperlinks on the current page that have not been visited yet and put them into a queue.
3. Do something with the content of the current page.
4. Dequeue a hyperlink, load that page, and return to step 2.
5. Repeat until there are no more links in the queue.
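
The steps above can be sketched as a breadth-first loop over a queue. This is only a minimal illustration in Python, not the prototype mentioned in this post: the names `LinkCollector`, `fetch_page`, `traverse` and the `handle` callback are all made up for the example, and it assumes pages are plain HTML reachable with the standard library. It has no error handling and does not restrict itself to one site, so a real tool would need both.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def traverse(start_url, handle, fetch=fetch_page):
    """Visit start_url and every page reachable from it, once each."""
    queue = deque([start_url])          # step 1: the startup page
    seen = {start_url}                  # every URL that was ever queued
    while queue:                        # step 5: stop when the queue is empty
        url = queue.popleft()           # step 4: dequeue the next link
        html = fetch(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:    # step 2: queue links not seen before
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
        handle(url, html)               # step 3: do something with the content
    return seen
```

Passing the fetcher in as a parameter is a design choice for the sketch: it lets you try the loop against an in-memory dictionary of pages before pointing it at a real site.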

I have built my own prototype that suits my needs. It might need further tweaks and improvements before I release it to the public. My main goal is to add functionality for defining the characteristics of the pages that will be traversed in step 2, and for defining what you can do with the content of each page. Stay tuned for further updates.
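
To give a rough idea of what those two extension points could look like, here is a small Python sketch. It is an assumption for illustration, not the design of the prototype: `same_site` builds a predicate that decides whether a link should be queued, and `record_sizes` builds a handler that is called with each page's content. Both names are hypothetical.

```python
from urllib.parse import urlparse


def same_site(start_url):
    """Build a predicate that accepts only http(s) links on start_url's host."""
    host = urlparse(start_url).netloc.lower()

    def accept(link):
        parts = urlparse(link)
        return parts.scheme in ("http", "https") and parts.netloc.lower() == host

    return accept


def record_sizes(sizes):
    """Build a handler that stores the length of each page's content."""

    def handle(url, html):
        sizes[url] = len(html)

    return handle
```

A traversal loop could call the predicate before queueing a link and call the handler once per visited page, so the same loop can be reused for archiving, indexing, or link checking.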


About Hardono

Howdy! I'm Hardono. I work as a Software Developer, mostly on Windows, dealing with .NET and conversing in C#. I also know a bit of Linux, mainly because I need to keep this blog operational. I've been working in the Logistics/Transport industry for more than 11 years.
