|
The Robots Exclusion facilities of the World Wide Web limit which content spiders
(also known as crawlers) may index. Most such “web bots” actually follow the
rules. Those that misbehave may be excluded entirely, so that their associated
search engines do not index the site. The data below is what is returned, as of now, when a
reasonably well-behaved “web bot” retrieves the robots.txt instructions...
/* http://McMtGOP.org/robots.txt at Mon, 06 Sep 2010 10:58:42 GMT */
User-agent: AboutUsBot
Disallow: /
User-agent: ah-ha.com
Disallow: /
User-agent: Alphablend
Disallow: /
User-agent: Alligator
Disallow: /
User-agent: AltoVistoWebCrawler
Disallow: /
User-agent: Amaya
Disallow: /
User-agent: amphetameme
Disallow: /
User-agent: asterias
Disallow: /
User-agent: autoemailspider
Disallow: /
User-agent: b2w
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: Balihoo
Disallow: /
User-agent: BitBeamer
Disallow: /
User-agent: Brutus
Disallow: /
User-agent: Bullshit
Disallow: /
User-agent: bumblebee
Disallow: /
User-agent: CacheabilityEngine
Disallow: /
User-agent: Camino
Disallow:
User-agent: CAST
Disallow: /
User-agent: Charlotte
Disallow: /
User-agent: Check&Get
Disallow: /
User-agent: CheckLinks
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: COMBINE
Disallow: /
User-agent: contype
Disallow: /
User-agent: CPT_CUM_PROXY_CHECKER
Disallow: /
User-agent: curl
Disallow: /
User-agent: Custo
Disallow: /
User-agent: Cyberdog
Disallow: /
User-agent: DeepIndexer
Disallow: /
User-agent: devSoft
Disallow: /
User-agent: DiaGem
Disallow: /
User-agent: DISCo
Disallow: /
User-agent: DnloadMage
Disallow: /
User-agent: DownloadSession
Disallow: /
User-agent: Download Demon
User-agent: Download Express
User-agent: Download Ninja
User-agent: Download Wonder
User-agent: Download
Disallow: /
User-agent: DreamPassport
Disallow: /
User-agent: DSurf
Disallow: /
User-agent: "DTS Agent"
Disallow: /
User-agent: EasyDL
Disallow: /
User-agent: EasyWebPromotion
Disallow: /
User-agent: EBrowse
Disallow: /
User-agent: eCatch
Disallow: /
User-agent: ElectricSurfMaster
Disallow: /
User-agent: Email Extractor
User-agent: Email
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ESurf
Disallow: /
User-agent: Exalead
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: ezic.com
Disallow: /
User-agent: fantomBrowser
Disallow: /
User-agent: fast
Disallow: /
User-agent: FDSE
Disallow: /
User-agent: FileHeap! file downloader
User-agent: FileHeap!
User-agent: FileHeap
Disallow: /
User-agent: FileHound
Disallow: /
User-agent: Fluffy
Disallow: /
User-agent: Franklin Locator
Disallow: /
User-agent: FreshDownload
Disallow: /
User-agent: FrontPage
Disallow: /
User-agent: FSurf
Disallow: /
User-agent: Funnel
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: www.galaxy.com
Disallow: /
User-agent: Gamespy_Arcade
Disallow: /
User-agent: GetBot
Disallow: /
User-agent: Getinfo
Disallow: /
User-agent: GetRight
Disallow: /
User-agent: Gigabot
Disallow: /
User-agent: Gigabot/3.0
Disallow: /
User-agent: Gigabot/2.0att
Disallow: /
User-agent: Girafabot
Disallow: /
User-agent: gnome-vfs
Disallow: /
User-agent: Go-Ahead-Got-It
Disallow: /
User-agent: Gozilla
Disallow: /
User-agent: Go!Zilla
Disallow: /
User-agent: googlebot
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
User-agent: McBot
Disallow: /
User-agent: Mediapartners-Google
Disallow: /
User-agent: grub-client
Disallow: /
User-agent: HLoader
Disallow: /
User-agent: home.thenewweb.com
Disallow: /
User-agent: HTTrack
Disallow: /
# Archive doesn't let MY browser access archive -- spider doesn't get access
User-agent: ia_archiver
Disallow: /
User-agent: iCollect
Disallow: /
User-agent: iGetter
Disallow: /
User-agent: ImageWalker
Disallow: /
User-agent: Industry Program
Disallow: /
User-agent: Indy Library
User-agent: Indy
Disallow: /
User-agent: InternetLinkAgent/
Disallow: /
User-agent: IntScanner
Disallow: /
User-agent: InstallShield DigitalWizard
User-agent: InstallShield
Disallow: /
User-agent: ipd
Disallow: /
User-agent: Innerprise
Disallow: /
User-agent: IPiumBot
Disallow: /
User-agent: Iria
Disallow: /
User-agent: IUPUI Research Bot
Disallow: /
User-agent: Java1.3.0
User-agent: Java1.3.1
User-agent: Java1
User-agent: Java2
User-agent: Java
Disallow: /
User-agent: JoBo
Disallow: /
User-agent: JOC Web Spider
Disallow: /
User-agent: johnhasbeenhere
Disallow: /
User-agent: Kapere
Disallow: /
User-agent: Lachesis
Disallow: /
User-agent: Larbin
Disallow: /
User-agent: LeechGet
Disallow: /
User-agent: libwww-perl
Disallow: /
User-agent: LightningDownload
Disallow: /
User-agent: LinkAlarm
Disallow: /
User-agent: LinkChecker
Disallow: /
User-agent: LinkLint-checkonly
Disallow: /
User-agent: Linkman
Disallow: /
User-agent: LLUPDATECTRL
Disallow: /
User-agent: Mac Finder
Disallow: /
User-agent: Mail Sweeper
Disallow: /
User-agent: Mass Downloader
User-agent: Mass
Disallow: /
User-agent: MetaProducts Download Express
User-agent: MetaProducts
Disallow: /
User-Agent: MFC_Tear_Sample
Disallow: /
User-agent: MFHttpScan
Disallow: /
User-agent: MicrosoftPrototypeCrawler
User-agent: Microsoft URL Control
User-agent: Microsoft
Disallow: /
User-agent: Missauga Locate
User-agent: Missauga Locator
User-agent: Missauga
Disallow: /
User-agent: Missouri College Browse
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: MJ12bot
Crawl-Delay: 86400
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
User-agent: moget
Disallow: /
User-agent: MovableType
Disallow: /
User-agent: Mozilla/3.0
Disallow: /
User-agent: Mozilla/3.01
Disallow: /
User-agent: Mozzilla
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: MSNBOT_Mobile
Disallow: /
User-agent: msnbot
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
User-agent: msnbot-media
Disallow: /
User-agent: MSRBot
Crawl-Delay: 86400
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
Disallow: /Photos
Disallow: /FlexPage.aspx?2008Caucus
User-agent: MyGetRight
Disallow: /
User-agent: NaverBot
User-agent: NaverRobot
User-agent: Naver
Disallow: /
User-agent: "Net Probe"
Disallow: /
User-agent: NetPumper
Disallow: /
User-agent: NEWT ActiveX
User-agent: NEWT
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Nitro Downloader
User-agent: Nitro
Disallow: /
User-agent: "none of your business"
Disallow: /
User-agent: Nudelsalat
Disallow: /
User-agent: Nutch
Disallow: /
User-agent: oBot
Disallow: /
User-agent: Offline Explorer
User-agent: Offline
Disallow: /
User-Agent: page_prefetcher
Disallow: /
User-agent: PagmIEDownload
Disallow: /
User-agent: panscient.com
Disallow: /
User-agent: pavuk
Disallow: /
User-agent: PlantyNet_WebRobot
Disallow: /
User-agent: Plucker
Disallow: /
User-agent: Pockey
Disallow: /
User-agent: Popdexter
Disallow: /
User-agent: Program Shareware
User-agent: Program
Disallow: /
User-agent: Progressive Download
User-agent: Progressive
Disallow: /
User-agent: ProxyTester
Disallow: /
User-agent: puf
Disallow: /
User-agent: PuxaRapido
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: Python-webchecker
Disallow: /
User-agent: RobotMidareru
Disallow: /
User-agent: RealDownload
Disallow: /
User-agent: RPT-HTTPClient
Disallow: /
User-agent: RepoMonkey Bait & Tackle
User-agent: RepoMonkey
Disallow: /
User-agent: Scat
Disallow: /
User-agent: ScoutAbout
Disallow: /
User-agent: Sonic
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: semanticdiscovery
Disallow: /
User-agent: Siphon
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SiteWinder
Disallow: /
User-agent: Slurp
Disallow: /
Crawl-delay: 86400.0
User-agent: SlySearch
Disallow: /
User-agent: SmartDownload
Disallow: /
User-agent: SOFTWING_TEAR_AGENT
Disallow: /
User-agent: "Space Bison"
Disallow: /
User-agent: SpeedDownload
Disallow: /
User-agent: sprocket
Disallow: /
User-agent: SQ Webscanner
User-agent: SQ
Disallow: /
User-agent: "SSM Agent"
Disallow: /
User-agent: Stamina
Disallow: /
User-agent: Star Downloader
User-agent: Star
Disallow: /
User-agent: Steeler
Disallow: /
User-agent: SuperHTTP
Disallow: /
User-agent: SurveyBot
Disallow: /
User-agent: SynoBot
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: Teoma
Disallow: /
User-agent: "tom bot"
Disallow: /
User-agent: "TSW Bot"
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: TweakMASTER
Disallow: /
User-agent: UdmSearch
Disallow: /
User-agent: Undisclosed
Disallow: /
User-agent: URLGetFile
Disallow: /
User-agent: "URL Spider SQL"
Disallow: /
User-agent: UtilMind HTTPGet
User-agent: UtilMind
Disallow: /
User-agent: VCIKJZDDLS
Disallow: /
User-agent: vobsub
Disallow: /
User-agent: W3C-checklink
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
User-agent: W3C_Validator
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
User-agent: WebAlta
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebCapture
Disallow: /
User-agent: Webclipping.com
Disallow: /
User-agent: webcollage
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: "Web Downloader"
Disallow: /
User-agent: WebLeacher
Disallow: /
User-agent: "Web Link Validator"
Disallow: /
User-agent: "Web Magnet"
Disallow: /
User-agent: WEBMOLE
Disallow: /
User-agent: WebReaper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: Website eXtractor
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebZIP
Disallow: /
User-agent: WEP Search 00
User-agent: WEP Search
User-agent: WEP
Disallow: /
User-agent: WGet
User-agent: Wget
Disallow: /
User-agent: WhizBang
Disallow: /
User-agent: Wildsoft Surfer
User-agent: Wildsoft
Disallow: /
User-agent: WinHttp.WinHttpRequest
Disallow: /
User-agent: www4mail
Disallow: /
User-agent: WWWOFFLE
Disallow: /
User-agent: Xaldon WebSpider
User-agent: Xaldon
Disallow: /
User-agent: xEdit
Disallow: /
User-agent: Xenu
Disallow: /
User-agent: Yahoo-MMCrawler
Disallow: /
User-agent: Yahoo! Slurp
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
Crawl-delay: 86400.0
User-agent: Yandex
Disallow: /
User-agent: Zao
Disallow: /
User-agent: ZBot
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: ZyBorg
Disallow: /
User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /
User-agent: whsearch
Disallow: /
User-agent: *
Disallow: /Campaigns
Disallow: /FlexArea
Disallow: /Issues
Disallow: /Nav
Disallow: /Polls
Disallow: /Posts
Disallow: /styles
Crawl-delay: 172800
|
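As a rough illustration of how a well-behaved bot might consult rules like those above, here is a minimal sketch using Python's standard `urllib.robotparser` module. The excerpt is a trimmed copy of a few records from the listing, the bot name `ExampleBot` is hypothetical, and matching details can differ between crawlers, so treat this as one parser's interpretation rather than a definitive reading of the file.

```python
from urllib.robotparser import RobotFileParser

# A few records copied from the listing above (trimmed for brevity).
EXCERPT = """\
User-agent: curl
Disallow: /
User-agent: googlebot
Disallow: /Campaigns
Disallow: /Polls
User-agent: *
Disallow: /Campaigns
Disallow: /Polls
Crawl-delay: 172800
"""


def load_rules(text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(EXCERPT)

# "ExampleBot" is a made-up name; it matches only the catch-all "*" record.
checks = [
    ("curl/7.21.0", "http://McMtGOP.org/"),
    ("Googlebot/2.1", "http://McMtGOP.org/Campaigns/2010"),
    ("Googlebot/2.1", "http://McMtGOP.org/index.html"),
    ("ExampleBot", "http://McMtGOP.org/Polls/1"),
    ("ExampleBot", "http://McMtGOP.org/"),
]
for agent, url in checks:
    verdict = "allowed" if rules.can_fetch(agent, url) else "blocked"
    print(f"{agent:15} {url:40} {verdict}")

# Seconds a polite bot would wait between requests under the "*" record.
print("ExampleBot crawl delay:", rules.crawl_delay("ExampleBot"))
```

A bot that honors the file would skip every URL reported as blocked and pause for the reported crawl delay between requests. Nothing enforces this on the server side: robots.txt only states the site's wishes.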