Forum URL: http://www.dombom.com/cgi-bin/dcforum/dcboard.cgi
Forum Name: The New MadBomber Marketing and SEO Forum
Topic ID: 44
Message ID: 2
#2, RE: Web Crawler that Lists URL and Pictures....
Posted by sgtaw on Dec-15-06 at 12:55 PM
In response to message #1
Merry Christmas Kurt!

Thanks for the quick reply....

I'm not sure that html2rss will work... I tried to change your tags for my purposes and got an error.

Here is what I am trying to do.

1. Let's take this site for example: http://www.tennis-warehouse.com
I want to "crawl" this site grabbing all the product pages.

2. In grabbing those pages, I want to be able to grab various bits of information (this can change from site to site). The key items are: URL of the page, title, meta description, and (the problem child) the picture URL.

For instance, http://www.tennis-warehouse.com/descpage.html?PCODE=MTLX10.

In addition to the items I mentioned, I want to grab the picture of the tennis racket. Preferably, I would want the URL where the picture is located.

3. I then want to be able to have all that information saved as a CSV so that I can upload it, for instance to BIB.
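
To make the three steps above concrete, here is a rough sketch in Python using only the standard library. It is not how html2rss or instantrss work, just an illustration of the idea. The names PageFields, extract_fields and rows_to_csv are made up for this example, the sample HTML and the example.com address are invented, and it assumes the first img tag on the page is the product picture, which a real site would probably need a stricter rule for. Fetching the pages and following links is left out; the page HTML is passed in as a string.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageFields(HTMLParser):
    """Collects the title, meta description, and image URLs from one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.title = ""
        self.description = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("src"):
            # Resolve relative paths against the page URL to get a full picture URL.
            self.images.append(urljoin(self.page_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_fields(page_url, html):
    """Return one CSV row (as a dict) for a single page."""
    parser = PageFields(page_url)
    parser.feed(html)
    return {
        "url": page_url,
        "title": parser.title.strip(),
        "description": parser.description,
        # Assumption: the first <img> on the page is the product picture.
        "image_url": parser.images[0] if parser.images else "",
    }


def rows_to_csv(rows):
    """Serialize the collected rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["url", "title", "description", "image_url"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Invented sample page, standing in for a downloaded product page.
sample_html = (
    "<html><head><title> Sample Racket </title>"
    '<meta name="description" content="A sample racket page">'
    '</head><body><img src="/images/sample.jpg"></body></html>'
)
row = extract_fields("http://www.example.com/descpage.html?PCODE=SAMPLE", sample_html)
print(rows_to_csv([row]))
```

A crawler would call extract_fields once per product page and hand the whole list of rows to rows_to_csv before uploading the file.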

I played a tiny bit with instantrss. I found a unique tag in the webpages and used it to replace the instantrss tags, but that got me an error.

I guess a workaround would be to download the site I am interested in, then do a search and replace using the tags that you have in instantrss, then upload the site to my server so that I can run instantrss.

Thanks Kurt!

Ed