CMS for collecting basic info from other sites

They have: 2 posts

Joined: Feb 2015

Hi All,
Could you please help me find the right CMS and plugin for building a site that collects basic info (title and post/article) from other sites?
A simple example would be a site that collects movie descriptions from other sites like IMDB.com. A user would visit the site, search for, say, 'American Sniper', and get a list of the few sites that contain that movie's description (imdb.com, movieguide.org, ...), with consolidated basic info about the movie, such as rating and type of movie, and a link to the respective website.

Any information that would save me from going through the most popular CMSes and hundreds of plug-ins would be much appreciated.

BTW, are there any working scripts (free or paid) that would help me do a simple crawl through the websites and collect data from them? Something simple: just take the title and rating found in this or that HTML tag and save them to the DB.

Thanks
Gonzales
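
For the "take a title and rating from an HTML tag and save to the DB" part, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption for illustration: the sample markup, the `rating` class name, the `movies` table and the example.com URL are all made up, and a real site will use different tags. It also skips fetching pages entirely, so check each site's terms of use before pointing anything like this at live pages.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page markup; a real site will use different tags and classes.
SAMPLE_HTML = """
<html><head><title>American Sniper (2014)</title></head>
<body><span class="rating">7.3</span></body></html>
"""


class TitleRatingParser(HTMLParser):
    """Collects the text of <title> and of the first <span class="rating">."""

    def __init__(self):
        super().__init__()
        self._target = None   # name of the field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._target = "title"
        elif tag == "span" and dict(attrs).get("class") == "rating":
            self._target = "rating"

    def handle_data(self, data):
        # Keep only the first value seen for each field.
        if self._target and self._target not in self.fields:
            self.fields[self._target] = data.strip()

    def handle_endtag(self, tag):
        self._target = None


def extract(html):
    """Return (title, rating) as strings, or None where a field is missing."""
    parser = TitleRatingParser()
    parser.feed(html)
    return parser.fields.get("title"), parser.fields.get("rating")


def save(conn, url, title, rating):
    """Store one row per source URL, overwriting an older row for that URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies "
        "(url TEXT PRIMARY KEY, title TEXT, rating REAL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO movies VALUES (?, ?, ?)",
        (url, title, float(rating) if rating else None),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
title, rating = extract(SAMPLE_HTML)
save(conn, "https://example.com/american-sniper", title, rating)
print(conn.execute("SELECT url, title, rating FROM movies").fetchall())
```

Storing the source URL next to the title and rating is what lets the search page link back to the respective site instead of copying its content.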

Greg K

He has: 2,145 posts

Joined: Nov 2003

Well, there are two thoughts that come to mind reading this...

1. The legality of scraping sites like IMDB. It is probably against their terms of use for the site.

2. Do you realize how huge that place is, that it is updated daily, and how long it would take to crawl the entire site? It is probably not going to be feasible. (Hint: go to Google and enter site:imdb.com as the search term. It comes back saying they have over 46 million pages indexed.)

I really think you may want to rethink what you are wanting to do and narrow it down to something more doable than a broad scraping of sites.

They have: 2 posts

Joined: Feb 2015

Greg,
Thanks for the reply. IMDB.com was just an example; however, I'd disagree with you on this (even if this is just an example ;) ).
I will double-check the legality, but I doubt posting a link to their site will be an issue. Please note that I was referring only to posting the link with the title etc., not copying the movie descriptions and other site content.
As for the data volume, since I'm planning to pull only titles, this should be no issue. Even if they have 100k movies or more, that is not a problem for today's DB engines and/or server hardware.

Regards
G

They have: 1 post

Joined: Jun 2016

How about RSS feeds? I want to display RSS feeds, possibly merge, sort and collate the items from several lists, and choose some to be saved. Those would then be added to my database in a way that lets me search them by keyword, list them and sort them. Thanks. I've done it with MediaWiki but found that it was resource heavy, and was hoping there was a way with a smaller footprint, or perhaps one that is just more resource friendly.
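
A small-footprint version of that merge/sort/save/search flow can be sketched with just the Python standard library and SQLite. This is only an illustration under assumptions: the two feeds are inline sample strings (a real setup would download them), it handles plain RSS 2.0 `<item>` elements only, and the `items` table and function names are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two made-up RSS 2.0 feeds standing in for downloaded feed documents.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Python tips</title><link>https://example.com/a1</link>
<pubDate>Mon, 06 Jun 2016 10:00:00 +0000</pubDate></item>
<item><title>Gardening news</title><link>https://example.com/a2</link>
<pubDate>Wed, 08 Jun 2016 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>More Python news</title><link>https://example.com/b1</link>
<pubDate>Tue, 07 Jun 2016 12:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_items(xml_text):
    """Return one dict per <item>, with the pubDate parsed into a datetime."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in root.iter("item")
    ]


def merge_feeds(*feeds):
    """Combine the items of several feeds, newest first."""
    merged = [it for feed in feeds for it in parse_items(feed)]
    return sorted(merged, key=lambda it: it["published"], reverse=True)


def save_items(conn, items):
    """Store the chosen items; the link is the key, so repeats are ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(link TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO items VALUES (?, ?, ?)",
        [(it["link"], it["title"], it["published"].isoformat()) for it in items],
    )
    conn.commit()


def search(conn, keyword):
    """Keyword search on titles, newest first."""
    return conn.execute(
        "SELECT title, link FROM items WHERE title LIKE ? ORDER BY published DESC",
        (f"%{keyword}%",),
    ).fetchall()


merged = merge_feeds(FEED_A, FEED_B)
chosen = merged[:2]                  # stand-in for "choose some to be saved"
conn = sqlite3.connect(":memory:")   # use a file path for a persistent DB
save_items(conn, chosen)
print(search(conn, "python"))
```

Sorting the stored ISO timestamps as text only works here because every sample date uses the same UTC offset; feeds with mixed time zones would need the dates normalized to UTC before saving.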

They have: 66 posts

Joined: Jan 2016

I agree with rich59123. You can use RSS feeds. Install and set up a WordPress site. Install an RSS feed plugin. Get the RSS feed URLs of the sites you are interested in, then place them in your plugin configuration. Set the time at which the posts should be published. That's it!

We don't try. We Do It. -- 3wcorner.com
