.. Copyright 2009 - Participatory Culture Foundation This file is part of vidscraper. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS`` AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Getting Started =============== Scraping video pages ++++++++++++++++++++ Most use cases will simply require the auto_scrape function. Usage is incredibly easy:: >>> from vidscraper import auto_scrape >>> video = auto_scrape("http://www.youtube.com/watch?v=J_DV9b0x7v4") >>> video.title u'CaramellDansen (Full Version + Lyrics)' That's it! Couldn't be easier. auto_scrape will determine the right :doc:`scraping suite ` to use for the url you pass in and will use that suite to return a :class:`.ScrapedVideo` instance that represents the data associated with the video at that url. If no suites are found which support the url, :exc:`.CantIdentifyUrl` will be raised. If you only need certain fields (say you only need the "file_url" and the "title" fields), you can pass those fields in as a second argument:: >>> video = auto_scrape(url, fields=['file_url', 'title']) Video fields ------------ If a :class:`.ScrapedVideo` is initialized without any fields, then :mod:`vidscraper` will assume you want all of the fields for the video. When the :class:`.ScrapedVideo` is being loaded, :mod:`vidscraper` will maximize the number of requested fields that it fills; occasionally, this may mean that it will make more than one HTTP request. This means that limiting the fields to what you are actually using can save quite a bit of work. Getting videos for a feed +++++++++++++++++++++++++ If you want to get every video for a feed, you can use :func:`vidscraper.auto_feed`:: >>> from vidscraper import auto_feed >>> results = auto_feed("http://blip.tv/djangocon/rss") This will read the feed at the given url and return a generator which yields :class:`.ScrapedVideo` instances for each entry in the feed. The instances will be preloaded with metadata from the feed. In many cases this will fill out all the fields that you need. If you need more, however, you can tell the video to load more data manually:: >>> video = results.next() >>> video.load() (Don't worry - if :mod:`vidscraper` can't figure out a way to get more data, it will simply do nothing!) .. note:: Because this function returns a generator, the feed will actually be fetched the first time the generator's :meth:`next` method is called. Crawling an entire feed ----------------------- :func:`auto_feed` also supports feed crawling for some suites. You use it like this:: >>> from vidscraper import auto_feed >>> results = auto_feed("http://blip.tv/djangocon/rss", crawl=True) Now, when the generator runs out of results on the first page, it will automatically fetch the next page, and then the next, and so on. This is not for the faint of heart. Depending on the feed you're crawling, you could be there for a while. Searching video services ++++++++++++++++++++++++ It's also easy to run a search on a variety of services that support it. Simply do the following:: >>> from vidscraper import auto_search >>> results = auto_search(['parrot'], exclude_terms=['dead']).values() The search will be run on all suites that support searching, and the results will be returned as a dictionary mapping the suite used to the results for that feed.