Posted by Ben C on 07/20/07 07:08
On 2007-07-19, M <nowhereman@twilightzone.net> wrote:
> "dorayme" <doraymeRidThis@optusnet.com.au> wrote in message
> news:doraymeRidThis-CCE468.07313320072007@news-vip.optusnet.com.au...
>> In article <bCMni.131315$NV3.476@pd7urf2no>,
>> "M" <nowhereman@twilightzone.net> wrote:
>
>> Give an example of one url you would like to do this to.
>
> Not sure why this is relevant but, hey, if it leads to something. . . As an
> example:
>
> http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
>
> Essentially, I just want to save barebones articles with any relevant
> images. I don't want Google ads, sidebars, irrelevant banner images, forms,
> search boxes, background images, scripts, etc.
>
> Sometimes the website is gracious enough to offer a print version which gets
> rid of most of this stuff.
>
> I have a Notetab script which does most of what I want but wanted to see if
> something else out there is better at it.

If you want to pull a lot of content from one particular site, a script
using curl and BeautifulSoup (a Python module) may be the way to go,
especially if the markup has class or id attributes that you can use to
latch onto the bits you want.

I use this method for TV listings and traffic news.
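
A rough sketch of the "latch onto an id" idea, in case it helps. To keep it
runnable without installing anything, it uses Python's standard-library
html.parser rather than BeautifulSoup, and it works on a page you have
already saved (for example with curl -o page.html URL). The names
ArticleExtractor and extract_article, the sample markup, and the id
"content" are all made up for illustration; a real site will use its own
ids and classes, so check the page source first.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose contents should never end up in the saved article.
SKIP_TAGS = {"script", "style"}


class ArticleExtractor(HTMLParser):
    """Collect text and image URLs found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while we are inside the target element
        self.skip_depth = 0  # > 0 while we are inside <script> or <style>
        self.text = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Still outside: only start collecting at the target element.
            if attrs.get("id") == self.target_id and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.depth -= 1
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth and data.strip():
            self.text.append(data.strip())


def extract_article(html, target_id):
    """Return (text, image_urls) for the element whose id is target_id."""
    parser = ArticleExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text), parser.images


# Made-up page: a sidebar and footer to discard, and a "content" block to keep.
SAMPLE = """
<html><body>
<div id="sidebar"><img src="banner.gif"><form><input name="q"></form></div>
<div id="content">
  <h1>Photo mosaic</h1>
  <p>Open the photo in your editor.</p>
  <img src="step1.jpg">
  <script>showAds();</script>
</div>
<div id="footer">Ads here</div>
</body></html>
"""

if __name__ == "__main__":
    text, images = extract_article(SAMPLE, "content")
    print(text)
    print(images)
```

This simple depth counting assumes the tags are properly closed. Real pages
often are not, which is a good reason to use BeautifulSoup instead: it
copes with sloppy markup, and the same lookup there is roughly
soup.find(id="content").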