Getting Started

Scraping video pages

Most use cases will simply require the auto_scrape() function.

>>> from vidscraper import auto_scrape
>>> video = auto_scrape("http://www.youtube.com/watch?v=J_DV9b0x7v4")
>>> video.title
u'CaramellDansen (Full Version + Lyrics)'

That’s it! Couldn’t be easier. auto_scrape() pulls down metadata from every source it can identify for the URL you pass in, then returns a Video instance loaded with that data.

If vidscraper doesn’t know how to fetch data for a URL (for example, if you try to scrape google.com, which isn’t a video page), auto_scrape() raises UnhandledVideo.
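To make that failure mode concrete, here is a minimal sketch of how URL dispatch might work. It is not vidscraper’s implementation: the suite names, the regular expressions, and the local UnhandledVideo class are assumptions made so the example runs on its own. The idea is only that each suite recognizes certain URLs, and a URL that no suite recognizes raises an exception.

```python
import re


class UnhandledVideo(Exception):
    """Local stand-in for vidscraper's UnhandledVideo, so this sketch is self-contained."""


# Hypothetical URL patterns per suite; vidscraper's real suites do their own matching.
SUITE_PATTERNS = {
    "youtube": re.compile(r"^https?://(www\.)?youtube\.com/watch\?v=[\w-]+"),
    "blip": re.compile(r"^https?://(www\.)?blip\.tv/"),
}


def find_suite(url):
    """Return the name of the first suite that recognizes url, else raise UnhandledVideo."""
    for name, pattern in SUITE_PATTERNS.items():
        if pattern.match(url):
            return name
    raise UnhandledVideo(url)


try:
    find_suite("http://google.com/")
except UnhandledVideo as exc:
    # google.com matches no suite pattern, so it is reported instead of scraped.
    print("not a video page:", exc)
```

In application code, the same shape applies: wrap the auto_scrape() call in a try/except UnhandledVideo block and skip or report URLs that aren’t video pages.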

Limiting metadata

A video’s metadata can come from several sources: for example, a scrape of the video page itself, an oEmbed endpoint, and a service-specific API. When loading video data, vidscraper queries only as many of these sources as it needs to provide the data you ask for.

So if you only need certain pieces of metadata (say, the title and description of a video), you can pass those fields to auto_scrape() and potentially save HTTP requests:

>>> url = "http://www.youtube.com/watch?v=J_DV9b0x7v4"
>>> video = auto_scrape(url, fields=['title', 'description'])
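Why can asking for fewer fields save requests? One way to picture it is the sketch below, which assumes each source supplies a fixed set of fields and that sources are consulted in order until the requested fields are covered. The source names and field sets are hypothetical, and this is not a description of vidscraper’s actual loading logic.

```python
# Hypothetical sources, in the order they would be consulted, and the
# fields each one can supply. Real vidscraper suites define their own.
SOURCES = [
    ("oembed", {"title", "thumbnail_url", "embed_code", "user"}),
    ("scrape", {"title", "description", "thumbnail_url"}),
    ("api", {"title", "description", "tags", "file_url", "user"}),
]


def plan_requests(fields):
    """Return the sources that would need an HTTP request to cover `fields`.

    Sources are taken in order; a source is skipped if it adds nothing new,
    and the walk stops as soon as every requested field is covered.
    Fields that no source can supply are simply left uncovered.
    """
    needed = set(fields)
    plan = []
    for name, provides in SOURCES:
        if not needed:
            break
        if provides & needed:
            plan.append(name)
            needed -= provides
    return plan


print(plan_requests(["title"]))                         # one request
print(plan_requests(["title", "description"]))          # two requests
print(plan_requests(["title", "description", "tags"]))  # all three
```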

Getting videos for a feed

If you want to get every video in a feed, you can use auto_feed():

>>> from vidscraper import auto_feed
>>> feed = auto_feed("http://blip.tv/djangocon/rss")

auto_feed() reads the feed at the given URL and returns a generator that yields a Video instance for each entry in the feed. These instances come preloaded with the metadata the feed provides, which in many cases fills out every field you need. If you need more, you can tell a video to load additional data manually:

>>> video = feed.next()
>>> video.load()

(Don’t worry: if vidscraper can’t find a way to get more data, load() simply does nothing!)
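The preload-then-load pattern can be modeled with the hypothetical SketchVideo class below. It is an illustration under assumptions, not vidscraper’s Video class: the field list and the idea of "loaders" as plain callables are invented for the example. It shows the two behaviors described above: load() only fills in fields that are still missing, and when there is no way to get more data it leaves the video untouched.

```python
class SketchVideo(object):
    """Hypothetical stand-in for a Video preloaded from a feed entry."""

    FIELDS = ("title", "description", "thumbnail_url")

    def __init__(self, loaders=(), **preloaded):
        # Any field the feed entry didn't supply starts out as None.
        for field in self.FIELDS:
            setattr(self, field, preloaded.get(field))
        # Each loader is a callable taking a list of field names and
        # returning a dict of whatever values it can supply.
        self._loaders = list(loaders)

    def missing_fields(self):
        return [f for f in self.FIELDS if getattr(self, f) is None]

    def load(self):
        """Fill in missing fields from each loader in turn; a no-op if there are none."""
        for loader in self._loaders:
            missing = self.missing_fields()
            if not missing:
                break
            data = loader(missing)
            for field in missing:
                if data.get(field) is not None:
                    setattr(self, field, data[field])


def fake_api(fields):
    # Hypothetical loader standing in for an extra HTTP request.
    return {"description": "Example description.", "thumbnail_url": "http://example.com/thumb.jpg"}


video = SketchVideo(loaders=[fake_api], title="Keynote")
video.load()  # fills description and thumbnail_url, keeps the preloaded title

bare = SketchVideo(title="Keynote")
bare.load()   # no loaders available, so nothing changes
```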

The feed instance is a lazy generator: it won’t make any HTTP requests until you call next() for the first time, and it makes a second request only once you’ve reached the bottom of the first page.
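This lazy, page-at-a-time behavior is the ordinary Python generator pattern. The sketch below assumes a hypothetical fetch_page(page_number) callable in place of the real HTTP request and records each call, so you can see that nothing is fetched until the first next() and that page 2 is fetched only after page 1 runs out. It models the behavior described here, not vidscraper’s source.

```python
def paginated_feed(fetch_page):
    """Yield entries lazily, calling fetch_page(n) only when page n is needed.

    fetch_page is a hypothetical callable standing in for an HTTP request;
    it returns a list of entries, or an empty list once past the last page.
    """
    page_number = 1
    while True:
        entries = fetch_page(page_number)
        if not entries:
            return
        for entry in entries:
            yield entry
        page_number += 1


# Fake pages of entries, plus a log of which pages were "requested".
PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
requested = []


def fake_fetch(page_number):
    requested.append(page_number)
    return PAGES.get(page_number, [])


feed = paginated_feed(fake_fetch)  # requested is still []: the body hasn't run yet
next(feed)                         # fetches page 1, returns "a"
next(feed)                         # returns "b" from page 1, no new request
next(feed)                         # page 1 exhausted, so page 2 is fetched; returns "c"
```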

Not crawling a whole feed

By default, auto_feed() tries to crawl the entire feed. Depending on the feed, that can take a while. If you’re pressed for time (or bandwidth), you can limit the number of videos you pull down with the max_results argument:

>>> from vidscraper import auto_feed
>>> feed = auto_feed("http://blip.tv/djangocon/rss")
>>> len(list(feed))
117
>>> feed = auto_feed("http://blip.tv/djangocon/rss", max_results=20)
>>> len(list(feed))
20
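max_results is how you do this with auto_feed(). If you already hold an iterator and want to stop early, the standard library’s itertools.islice caps any iterator, and because a lazy feed only fetches pages as they are consumed, stopping early also avoids fetching the remaining pages. The sketch below demonstrates that with a hypothetical fake_feed that assumes 25 entries per page; it does not call vidscraper and says nothing about how max_results is implemented internally.

```python
from itertools import islice


def fake_feed(request_log, total=117, page_size=25):
    """Hypothetical lazy feed that records each simulated page request in request_log.

    The page size of 25 is an assumption for illustration only.
    """
    for start in range(0, total, page_size):
        request_log.append(start // page_size + 1)  # one "HTTP request" per page
        for entry in range(start, min(start + page_size, total)):
            yield entry


all_requests = []
everything = list(fake_feed(all_requests))  # crawls all 5 pages

capped_requests = []
first_twenty = list(islice(fake_feed(capped_requests), 20))  # stops inside page 1
```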

Searching video services

It’s also easy to run a search on a variety of services with auto_search():

>>> from vidscraper import auto_search
>>> searches = auto_search('parrot -dead', max_results=20)
>>> searches
[<vidscraper.suites.blip.Search object at 0x10b490f90>,
 <vidscraper.suites.youtube.Search object at 0x10b49f090>]

You’ll get back a list of search iterables, one for each suite that supports the search parameters you gave. They load new pages the same way the feed iterator does.

>>> video = searches[0].next()
>>> video.title
u"Episode 57: iMovie HD '06, Part II"
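Since you get one iterable per suite, you may want a single stream that alternates between services rather than exhausting one before starting the next. A round-robin generator does that. The version below is plain Python, similar in spirit to the roundrobin recipe in the itertools documentation, and works on any iterables; with the searches list above the call would be roundrobin(*searches), assuming the search objects behave as ordinary Python iterators. The demo uses strings in place of search results.

```python
from collections import deque
from itertools import islice


def roundrobin(*iterables):
    """Yield one item from each iterable in turn, dropping each as it runs out."""
    queue = deque(iter(it) for it in iterables)
    while queue:
        it = queue.popleft()
        try:
            item = next(it)
        except StopIteration:
            continue  # this source is exhausted; don't put it back in the queue
        queue.append(it)
        yield item


# Strings stand in for two search iterables of different lengths.
blip_results = ["blip-1", "blip-2", "blip-3"]
youtube_results = ["yt-1"]
mixed = list(roundrobin(blip_results, youtube_results))

# islice caps the combined stream at a fixed number of results.
first_two = list(islice(roundrobin(blip_results, youtube_results), 2))
```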