Googlebot

Started by cieplutki, 17 June 2008, 15:16


Anette

Yeah, that's what I thought.. the game isn't worth the candle, and they don't need it for anything anyway.. OK, and thanks :) A BIG PLUS for you. Regards, and I hope everyone gets good use out of your advice and the file.

Nolt

#16
Quote from: roco on 20 June 2008, 18:31
Hmm, that "other thing" is the file "who.template.php". If we let everyone see it, or if we are logged in, then at the bottom of the forum, in the info center, we have the stats, i.e. users online, and clicking the link or the icon on the left takes us to the list of who is online, web spiders included.

What Nolt posted is a modified version: instead of a single list it shows 3 separate ones: logged-in users, i.e. the "regulars", visitors, i.e. guests, and robots..

But although I don't have a Polish-translated copy of this file, mine is far more tricked out. I modified it some time ago, and I have just now updated it with bot lists. There are 90 of them in total!
Hmm, by default, as you saw, there is:
    //Search Spiders
    array (
        'agent' => 'WISENutbot',
        'spidername' => 'Looksmart spider',
        'spider' => true,
    ),
    array (
        'agent' => 'MSNBot',
        'spidername' => 'MSN spider',
        'spider' => true,
    ),
    array (
        'agent' => 'W3C_Validator',
        'spidername' => 'W3C Validator',
        'spider' => true,
    ),
    array (
        'agent' => 'Googlebot-Image',
        'spidername' => 'Google-Image Spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Googlebot',
        'spidername' => 'Google spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Mediapartners-Google',
        'spidername' => 'Google AdSense spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Scooter',
        'spidername' => 'Altavista spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Yahoo! Slurp',
        'spidername' => 'Yahoo spider',
        'spider' => true,
    ),
    array (
        'agent' => 'FAST-WebCrawler',
        'spider' => true,
    ),
    array (
        'agent' => 'Wget',
        'spider' => true,
    ),
    array (
        'agent' => 'Ask Jeeves',
        'spider' => true,
    ),
    array (
        'agent' => 'Speedy Spider',
        'spider' => true,
    ),
    array (
        'agent' => 'SurveyBot',
        'spider' => true,
    ),
    array (
        'agent' => 'IBM_Planetwide',
        'spider' => true,
    ),
    array (
        'agent' => 'GigaBot',
        'spider' => true,
    ),
    array (
        'agent' => 'ia_archiver',
        'spider' => true,
    ),
    array (
        'agent' => 'FAST-WebCrawler',
        'spider' => true,
    ),
    array (
        'agent' => 'Inktomi Slurp',
        'spider' => true,
    ),
    array (
        'agent' => 'appie',
        'spidername' => 'Walhello spider',
        'spider' => true,
    ),

so I think those 90... are a bit of overkill ;]
And... it's not 2 words that need translating into Polish, but 3 ;]
www.wizzi.pl
My styles for SMF 2

SMF.PL FAQ
I don't answer any PMs, so don't send them unless you have been asked to

roco

hehe, just to be precise:

Quote from: roco on 20 June 2008, 18:31
As for the Polish translation.. hmm, it's just 3 words, which occur 2 times.. You can see it in the screenshot in the attachment.

I didn't even say the proverbial "couple of words"..

Nolt, these files probably come from the same source.. Only I looked after mine: I collected other robot names as well, and together with what was already in my file there are now ~90 entries for robots alone..

This is what's in yours, i.e. the default one plus your Polish translation:

$known_agents = array (
    //Search Spiders
    array (
        'agent' => 'WISENutbot',
        'spidername' => 'Looksmart spider',
        'spider' => true,
    ),
    array (
        'agent' => 'MSNBot',
        'spidername' => 'MSN spider',
        'spider' => true,
    ),
    array (
        'agent' => 'W3C_Validator',
        'spidername' => 'W3C Validator',
        'spider' => true,
    ),
    array (
        'agent' => 'Googlebot-Image',
        'spidername' => 'Google-Image Spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Googlebot',
        'spidername' => 'Google spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Mediapartners-Google',
        'spidername' => 'Google AdSense spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Scooter',
        'spidername' => 'Altavista spider',
        'spider' => true,
    ),
    array (
        'agent' => 'Yahoo! Slurp',
        'spidername' => 'Yahoo spider',
        'spider' => true,
    ),
    array (
        'agent' => 'FAST-WebCrawler',
        'spider' => true,
    ),
    array (
        'agent' => 'Wget',
        'spider' => true,
    ),
    array (
        'agent' => 'Ask Jeeves',
        'spider' => true,
    ),
    array (
        'agent' => 'Speedy Spider',
        'spider' => true,
    ),
    array (
        'agent' => 'SurveyBot',
        'spider' => true,
    ),
    array (
        'agent' => 'IBM_Planetwide',
        'spider' => true,
    ),
    array (
        'agent' => 'GigaBot',
        'spider' => true,
    ),
    array (
        'agent' => 'ia_archiver',
        'spider' => true,
    ),
    array (
        'agent' => 'FAST-WebCrawler',
        'spider' => true,
    ),
    array (
        'agent' => 'Inktomi Slurp',
        'spider' => true,
    ),
    array (
        'agent' => 'appie',
        'spidername' => 'Walhello spider',
        'spider' => true,
    ),
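
A side note on the default list above: Googlebot-Image is listed before Googlebot. If the lookup walks the table top to bottom and stops at the first entry whose 'agent' text occurs in the User-Agent header (that is an assumption, the matching code itself is not shown in this thread), then the order matters, because every Googlebot-Image User-Agent also contains "Googlebot". A minimal sketch of that idea, where find_spider_name() is a made-up helper and not part of SMF or the mod:

```php
<?php
// Hypothetical helper: returns the display name of the first table entry whose
// 'agent' string occurs in the User-Agent header, or null if nothing matches.
function find_spider_name(array $known_agents, string $user_agent): ?string
{
    foreach ($known_agents as $entry) {
        // stripos() is a case-insensitive substring search (assumed matching rule).
        if (stripos($user_agent, $entry['agent']) !== false) {
            // Entries without a 'spidername' fall back to the 'agent' text.
            return $entry['spidername'] ?? $entry['agent'];
        }
    }
    return null;
}

// Three sample entries copied from the default list.
$known_agents = array(
    array('agent' => 'Googlebot-Image', 'spidername' => 'Google-Image Spider', 'spider' => true),
    array('agent' => 'Googlebot', 'spidername' => 'Google spider', 'spider' => true),
    array('agent' => 'Wget', 'spider' => true),
);

echo find_spider_name($known_agents, 'Googlebot-Image/1.0'), "\n";
echo find_spider_name($known_agents, 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'), "\n";
```

With the two Google entries swapped, the first call would report 'Google spider' instead, so under this assumption the more specific names should stay above the more general ones.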


And this is what's in mine:

$known_agents = array (

//////// Search Spiders ////////////

//The big cheese spiders detected by SMF by default Originating Site /Website Description Recently Seen ?
array(
'agent' => 'Googlebot',
'spidername' => 'Google spider',
'spider' => true,
),
// Google http://www.google.com The main spider used by Google Yes 05/08

array(
'agent' => 'Msn',
'spidername' => 'MsnBot',
'spider' => true,
),
// Msn http://www.msn.com The main spider used by MSN Yes 05/08

array(
'agent' => 'Yahoo!',
'spidername' => 'Slurp',
'spider' => true,
),
// Yahoo http://www.yahoo.com Worlds most aggressive spider Yes 05/08

// MAJOR spiders Originating Site Website Description Recently Seen ?
array(
'agent' => 'Ask',
'spidername' => 'Teoma',
'spider' => true,
),
// Ask.com http://www.ask.com Spider for Ask Search Engine Yes 05/08

array(
'agent' => 'Baidu',
'spidername' => 'Baiduspider',
'spider' => true,
),
// Baidu http://www.baidu.com Spider for Chinese search engine Yes 05/08

array(
'agent' => 'GigaBot',
'spidername' => 'Gigabot',
'spider' => true,
),
// Gigablast http://www.gigablast.com Another heavily travelled spider Yes 05/08

array(
'agent' => 'Google-AdSense',
'spidername' => 'Mediapartners-Google',
'spider' => true,
),
// Google http://www.google.com Spider related to Adsense/Adwords Yes 05/08

array(
'agent' => 'Google-Adwords',
'spidername' => 'AdsBot-Google',
'spider' => true,
),
// Google http://www.google.com Spider related to Adwords Yes 05/08

array(
'agent' => 'Google-SA',
'spidername' => 'gsa-crawler',
'spider' => true,
),
// Google http://www.google.com Google Search Appliance Spider Yes 05/08

array(
'agent' => 'Google-Image',
'spidername' => 'Googlebot-Image',
'spider' => true,
),
// Google http://www.google.com Spider for google image search Yes 05/08

array(
'agent' => 'InternetArchive',
'spidername' => 'ia_archiver-web.archive.org',
'spider' => true,
),
// Archive http://www.archive.org Wayback Machine spider Yes 05/08

array(
'agent' => 'Alexa',
'spidername' => 'ia_archiver',
'spider' => true,
),
// Alexa http://www.alexa.com *Must be detected after Internet Archive Yes 05/08

array(
'agent' => 'Omgili',
'spidername' => 'omgilibot',
'spider' => true,
),
// Omgili http://www.omgili.com Extremely aggressive Messageboard/forum Spider Yes 05/08

array(
'agent' => 'Speedy Spider',
'spidername' => 'Speedy Spider',
'spider' => true,
),
// EntireWeb http://www.entireweb.com Entire web spider Yes 05/08

array(
'agent' => 'Yahoo',
'spidername' => 'yahoo',
'spider' => true,
),
// Yahoo http://www.yahoo.com For Yahoo Publisher Network  (a variety in use) Yes 05/08

array(
'agent' => 'Yahoo JP',
'spidername' => 'Y!J',
'spider' => true,
),
// Yahoo http://www.yahoo.co.jp Spider for Yahoo Japan No

// Checkers/Testers/Robots Originating Site Website Description Recently Seen ?
array(
'agent' => 'DeadLinksChecker',
'spidername' => 'link validator',
'spider' => true,
),
// Dead-Links http://www.dead-links.com/ Checks your site for dead/bad links Yes 05/08

array(
'agent' => 'W3C Validator',
'spidername' => 'W3C_Validator',
'spider' => true,
),
// W3C http://validator.w3.org Checks standards validity of any html/xhtml page Yes 05/08

array(
'agent' => 'W3C CSSValidator',
'spidername' => 'W3C_CSS_Validator',
'spider' => true,
),
// W3C http://jigsaw.w3.org/css-validator/ Checks standards validity of css stylesheets Yes 05/08

array(
'agent' => 'W3C FeedValidator',
'spidername' => 'FeedValidator',
'spider' => true,
),
// W3C http://validator.w3.org/feed/ Checks standards validity of atom/rss feeds Yes 05/08

array(
'agent' => 'W3C LinkChecker',
'spidername' => 'W3C-checklink',
'spider' => true,
),
// W3C http://validator.w3.org/checklink Checks links on any html/xhtml page are valid Yes 05/08

array(
'agent' => 'W3C mobileOK',
'spidername' => 'W3C-mobileOK',
'spider' => true,
),
// W3C http://www.w3.org/2006/07/mobileok-ddc Checks page for how good it is for mobiles Yes 05/08

array(
'agent' => 'W3C P3PValidator',
'spidername' => 'P3P Validator',
'spider' => true,
),
// W3C http://www.w3.org/P3P/validator.html Checks something?? Yes 05/08

// Feed readers Originating Site Website Description Recently Seen ?
array(
'agent' => 'Bloglines',
'spidername' => 'Bloglines',
'spider' => true,
),
// Bloglines http://www.bloglines.com Spider for blog/rich web content (owned by Ask) Yes 05/08

array(
'agent' => 'Feedburner',
'spidername' => 'Feedburner',
'spider' => true,
),
// Feedburner http://www.feedburner.com Another RSS feed reader Yes 05/08

// Website Thumbnail/Snapshot/Thumbshot takers Originating Site /Website Description Recently Seen ?
array(
'agent' => 'SnapBot',
'spidername' => 'Snapbot',
'spider' => true,
),
// Snap http://www.snap.com Snapshots provider Yes 05/08

array(
'agent' => 'Picsearch',
'spidername' => 'psbot',
'spider' => true,
),
// Picsearch http://www.picsearch.com Picture/Image Search Engine Yes 05/08

array(
'agent' => 'Websnapr',
'spidername' => 'Websnapr',
'spider' => true,
),
// Websnapr http://www.websnapr.com Snapshot/site screenshot taker Yes 05/08

// More MINOR Spiders/Robots Originating Site Website Description Recently Seen ?
array(
'agent' => 'AllTheWeb',
'spidername' => 'FAST-WebCrawler',
'spider' => true,
),
// All The Web http://www.alltheweb.com Spider for alltheweb (now owned by Yahoo) No

array(
'agent' => 'Altavista',
'spidername' => 'Scooter',
'spider' => true,
),
// Altavista http://www.altavista.com Another Major Search Engine spider No

array(
'agent' => 'Asterias',
'spidername' => 'asterias',
'spider' => true,
),
// AOL http://www.aol.com Media Spider Yes 05/08

array(
'agent' => '192bot',
'spidername' => '192.comAgent',
'spider' => true,
),
// 192 http://www.192.com Spider to index for 192.com No

array(
'agent' => 'AbachoBot',
'spidername' => 'ABACHOBot',
'spider' => true,
),
// Abacho http://www.abacho.com Spider for multi language search engine/translator Yes 05/08

array(
'agent' => 'Abdcatos',
'spidername' => 'abcdatos',
'spider' => true,
),
// Abdcatos http://www.abcdatos.com/botlink/ Spider for Italian Search Engine Yes 05/08

array(
'agent' => 'Acoon',
'spidername' => 'Acoon',
'spider' => true,
),
// Acoon http://www.acoon.de Spider for small search engine Yes 05/08

array(
'agent' => 'Accoona',
'spidername' => 'Accoona',
'spider' => true,
),
// Accoona http://www.accoona.com Spider for Accoona Yes 05/08

array(
'agent' => 'BecomeBot',
'spidername' => 'BecomeBot',
'spider' => true,
),
// BecomeBot http://www.become.com Shopping/Products type search engine Yes 05/08

array(
'agent' => 'Daumoa',
'spidername' => 'Daumoa',
'spider' => true,
),
// Daum http://ws.daum.net/aboutkr.html South Korean Search Engine Spider Yes 05/08

array(
'agent' => 'Exabot',
'spidername' => 'Exabot',
'spider' => true,
),
// Exalead http://www.exalead.com Spider for small search engine Yes 05/08

array(
'agent' => 'Furl',
'spidername' => 'Furlbot',
'spider' => true,
),
// Furl http://www.furl.net Spider for Furl social bookmarking site Yes 05/08

array(
'agent' => 'FyperSpider',
'spidername' => 'FyberSpider',
'spider' => true,
),
// FyberSearch http://www.fybersearch.com Spider for Small Search Engine Yes 05/08

array(
'agent' => 'Geona',
'spidername' => 'GeonaBot',
'spider' => true,
),
// Geona http://www.geona.com Spider for another small search engine Yes 05/08

array(
'agent' => 'GirafaBot',
'spidername' => 'Girafabot',
'spider' => true,
),
// Girafabot http://www.girafa.com/ Thumbshot provider Yes 05/08

array(
'agent' => 'GoSeeBot',
'spidername' => 'GoSeeBot',
'spider' => true,
),
// GoSee http://www.gosee.com/bot.html Spider for small search engine Yes 05/08

array(
'agent' => 'Ichiro',
'spidername' => 'ichiro',
'spider' => true,
),
// Ichiro http://help.goo.ne.jp/door/crawler.html Spider for Japanese search engine Yes 05/08

array(
'agent' => 'LapozzBot',
'spidername' => 'LapozzBot',
'spider' => true,
),
// Lapozz http://www.lapozz.hu Spider for Hungarian search engine Yes 05/08

array(
'agent' => 'Looksmart',
'spidername' => 'WISENutbot',
'spider' => true,
),
// WISENutbot http://www.looksmart.com Spider related to advertising Yes 05/08

array(
'agent' => 'Lycos',
'spidername' => 'Lycos_Spider',
'spider' => true,
),
// Lycos http://www.lycos.com Spider for search engine No

array(
'agent' => 'Majestic12',
'spidername' => 'MJ12bot/v2',
'spider' => true,
),
// Majestic12 http://www.majestic12.co.uk/ Distributed Search Engine Project Yes 05/08

array(
'agent' => 'MLBot',
'spidername' => 'MLBot',
'spider' => true,
),
// MetaDataLabs http://www.metadatalabs.com/ Media indexing spider Yes 05/08

array(
'agent' => 'MSRBOT',
'spidername' => 'msrbot',
'spider' => true,
),
// Microsoft Research http://research.microsoft.com/research/sv/msrbot/  Microsoft Research bot No

array(
'agent' => 'Naver',
'spidername' => 'NaverBot',
'spider' => true,
),
// Naver http://www.naver.com South Korean Search Engine Spider Yes 05/08

array(
'agent' => 'Naver',
'spidername' => 'Yeti',
'spider' => true,
),
// Naver http://www.naver.com Another NaverBot for the South Korean Search Engine Yes 05/08

array(
'agent' => 'NoxTrumBot',
'spidername' => 'noxtrumbot',
'spider' => true,
),
// Noxtrum http://www.noxtrum.com Spider for Spanish search engine Yes 05/08

array(
'agent' => 'OmniExplorer',
'spidername' => 'OmniExplorer_Bot',
'spider' => true,
),
// Omniexplorer http://www.omni-explorer.com/ Spider No

array(
'agent' => 'OnetSzukaj',
'spidername' => 'OnetSzukaj',
'spider' => true,
),
// Onet http://szukaj.onet.pl Polish Search Engine Spider Yes 05/08

array(
'agent' => 'ScrubTheWeb',
'spidername' => 'Scrubby',
'spider' => true,
),
// ScrubTheWeb http://www.scrubtheweb.com Spider for Scrub the web Yes 05/08

array(
'agent' => 'SearchSight',
'spidername' => 'SearchSight',
'spider' => true,
),
// Searchsight http://www.searchsite.com Another search engine Yes 05/08

array(
'agent' => 'Seeqpod',
'spidername' => 'Seeqpod',
'spider' => true,
),
// Seeqpod http://www.seeqpod.com Spider for search engine (the google for mp3 files) Yes 05/08

array(
'agent' => 'Shablast',
'spidername' => 'ShablastBot',
'spider' => true,
),
// ShaBlast http://www.shablast.com Spider for a small search engine Yes 05/08

array(
'agent' => 'SitiDiBot',
'spidername' => 'SitiDiBot',
'spider' => true,
),
// Sitidi http://www.sitidi.net Spider for italian Sitidi search engine Yes 05/08

array(
'agent' => 'Slider',
'spidername' => 'silk/1.0',
'spider' => true,
),
// Slider http://www.slider.com Spider for Slider, but it only spiders DMOZ entries Yes 05/08

array(
'agent' => 'Sogou',
'spidername' => 'Sogou',
'spider' => true,
),
//Sogou http://www.sogou.com Spider for Chinese search engine Yes 05/08

array(
'agent' => 'StackRambler',
'spidername' => 'StackRambler',
'spider' => true,
),
// StackRambler http://www.rambler.ru/doc/robots.shtml Spider for Russian portal/search engine  Yes 05/08

array(
'agent' => 'SurveyBot',
'spidername' => 'SurveyBot',
'spider' => true,
),
// Domaintools http://www.domaintools.com Probe for website statistics (WhoIs  Source) Yes 05/08

array(
'agent' => 'Walhello',
'spidername' => 'appie',
'spider' => true,
),
// Wahello http://www.wahello.com/ Spider for wahello No

array(
'agent' => 'WebAlta',
'spidername' => 'WebAlta',
'spider' => true,
),
// WebAlta http://www.webalta.net Russian Search Engine Yes 05/08

array(
'agent' => 'YacyBot',
'spidername' => 'yacybot',
'spider' => true,
),
// Yacy http://www.yacy.com Crawler for distributed search engine Yes 05/08

array(
'agent' => 'YodaoBot',
'spidername' => 'YodaoBot',
'spider' => true,
),
// Yodao http://www.yodao.com Spider for Chinese Search Engine Yes 05/08

// Google-Wanna-Be's - Spiders/Robots for Startups Originating Site /Website Description Recently Seen ?
array(
'agent' => 'Charlotte',
'spidername' => 'Charlotte',
'spider' => true,
),
// SearchMe (Beta) http://www.searchme.com/support/ Spider for new search engine (in beta)   Yes 05/08

array(
'agent' => 'DiscoBot',
'spidername' => 'DiscoBot',
'spider' => true,
),
// DiscoveryEngine http://discoveryengine.com/discobot.html Spider for new search engine startup Yes 05/08

array(
'agent' => 'EnaBot',
'spidername' => 'EnaBot',
'spider' => true,
),
// EnaBall http://www.enaball.com/crawler.html Experimental new spider Yes 05/08

array(
'agent' => 'Gaisbot',
'spidername' => 'Gaisbot',
'spider' => true,
),
// Gaisbot http://gais.cs.ccu.edu.tw/robot.php Spider for search engine startup Yes 05/08

array(
'agent' => 'Kalooga',
'spidername' => 'kalooga',
'spider' => true,
),
// Kalooga http://www.kalooga.com Spider for new media search engine (in beta) Yes 05/08

array(
'agent' => 'ScoutJet',
'spidername' => 'ScoutJet',
'spider' => true,
),
// ScoutJet http://www.scoutjet.com/ Spider for new search engine (by the DMOZ founders) Yes 05/08

array(
'agent' => 'TinEye',
'spidername' => 'TinEye',
'spider' => true,
),
// TinEye http://tineye.com/crawler.html Spider for search engine startup Yes 05/08

array(
'agent' => 'Twiceler',
'spidername' => 'twiceler',
'spider' => true,
),
// Culli http://www.cuill.com/twiceler/robot.html Experimental Spider, (aggressive) Yes 05/08

// Software Originating Site Website Description Recently Seen ?
array(
'agent' => 'GSiteCrawler',
'spidername' => 'GSiteCrawler',
'spider' => true,
),
// GSiteCrawler http://www.gsitecrawler.com/ Windows Based Sitemap Generator Software Yes 05/08

array(
'agent' => 'HTTrack',
'spidername' => 'HTTrack',
'spider' => true,
),
// HTrack http://www.httrack.com HTTrack Website Copier - Offline Browser Yes 05/08

array(
'agent' => 'Wget',
'spidername' => 'Wget',
'spider' => true,
),
// WGet Software http://www.gnu.org/software/wget/ GNU software to retrieve files Yes 05/08
// Reason for detecting these: They can be very intensive. So seeing them in use, enables you to block if necessary.

//Rest of spiders:
array (
'agent' => 'Openbot',
'spidername' => 'Openfind spider',
'spider' => true,
),

array (
'agent' => 'Ask Jeeves',
'spidername' => 'Ask Jeeves',
'spider' => true,
),

array (
'agent' => 'IBM_Planetwide',
'spidername' => 'IBM_Planetwide',
'spider' => true,
),

array (
'agent' => 'Inktomi Slurp',
'spidername' => 'Inktomi Slurp',
'spider' => true,
),

array (
'agent' => 'Feedfetcher-Google',
'spidername' => 'Feedfetcher-Google',
'spider' => true,
),

array (
'agent' => 'http://www.relevantnoise.com',
'spidername' => 'relevantNOISE',
'spider' => true,
),

array (
'agent' => 'NewsGatorOnline/2.0',
'spidername' => 'NewsGatorOnline',
'spider' => true,
),

array (
'agent' => 'ping.blo.gs/2.0',
'spidername' => 'ping.blo.gs',
'spider' => true,
),

array (
'agent' => 'Jakarta Commons-HttpClient/3.0.1',
'spidername' => 'Amazon',
'spider' => true,
),

array (
'agent' => 'Jakarta Commons-HttpClient/3.0-rc2',
'spidername' => 'Amazon',
'spider' => true,
),

array (
'agent' => 'www.fi crawler',
'spidername' => 'www.fi spider',
'spider' => true,
),



/////////// phones ////////////


..it probably makes no difference.., but maybe you could count them.., because to tell the truth I only counted roughly..
On the other hand, I added indicators for Skype, Tlen and GG to this file, so if anyone has that mod, then when they view this list while forum regulars are logged in, it will show up the way it does in my screenshot..
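
Counting by eye gets tedious with a table this long. A rough sketch of letting PHP do it, assuming the table has been loaded into a variable like the $known_agents array above. duplicate_agents() is a hypothetical helper, and the four-entry table is only sample data built from names in the default list (where FAST-WebCrawler really does appear twice):

```php
<?php
// Hypothetical helper: lists every 'agent' value that occurs more than once.
function duplicate_agents(array $known_agents): array
{
    // Collect the 'agent' strings and count how often each one appears.
    $counts = array_count_values(array_column($known_agents, 'agent'));
    // Keep only the names seen more than once.
    return array_keys(array_filter($counts, function ($n) {
        return $n > 1;
    }));
}

// Sample data only, not the full table.
$known_agents = array(
    array('agent' => 'FAST-WebCrawler', 'spider' => true),
    array('agent' => 'Wget', 'spider' => true),
    array('agent' => 'ia_archiver', 'spider' => true),
    array('agent' => 'FAST-WebCrawler', 'spider' => true),
);

echo count($known_agents), " entries\n";
echo 'duplicates: ', implode(', ', duplicate_agents($known_agents)), "\n";
```

Run against the long list, a check like this would also flag 'Naver', which is used as the 'agent' value for two different entries there.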


All right, enough fun, now a bit of work.. Warning: this is not for idiots, nor for anyone who doesn't have the mod discussed in this topic installed!

Edit the file Subs.php in the directory -> Sources/

If you have this mod, then somewhere near the end of the file you should have the mod's entries, specifically the part that is nothing but the arrays with the spiders..

So this is the spot to look for:

//Function to check if the user-agent provided belongs to a spider. Based on getAgent function made by Owdy.
function ob_googlebot_getAgent(&$user_agent, &$spider_name, &$result)
{
$known_spiders = array (
//Search Spiders
array (
'agent' => 'WISENutbot',
'spidername' => 'Looksmart spider',
),
array (
'agent' => 'MSNBot',

here I'm cutting out the middle with the arrays..

and the fragment ends like this:

...   'spidername' => 'Amazon',
        ),
          array (
            'agent' => 'Jakarta Commons-HttpClient/3.0-rc2',
            'spidername' => 'Amazon',
        ),
);

foreach($known_spiders AS $poss)



I've shown the beginning and the end of this entry.. in the middle <boring> it's the same thing on and on..
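
One thing to watch when moving entries between the two files: the who.template.php arrays earlier in this post carry a 'spider' => true key, while the Subs.php arrays shown here only have 'agent' and 'spidername'. A small sketch for stripping the extra key, assuming that really is the only difference between the two formats. to_subs_format() is a made-up name, not something from the mod:

```php
<?php
// Hypothetical helper: keeps only the keys that the Subs.php listing shows.
function to_subs_format(array $known_agents): array
{
    $wanted = array('agent' => true, 'spidername' => true);
    return array_map(function (array $entry) use ($wanted) {
        // array_intersect_key() drops every key not named in $wanted.
        return array_intersect_key($entry, $wanted);
    }, $known_agents);
}

// Two sample entries in the who.template.php format.
$from_template = array(
    array('agent' => 'Msn', 'spidername' => 'MsnBot', 'spider' => true),
    array('agent' => 'Baidu', 'spidername' => 'Baiduspider', 'spider' => true),
);

// Print the converted table as PHP array syntax.
var_export(to_subs_format($from_template));
```

var_export() prints the result as PHP array syntax, which could then be pasted in place of the old arrays.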

Cut out all the arrays there and paste in mine, so that all together it looks like this:

//Function to check if the user-agent provided belongs to a spider. Based on getAgent function made by Owdy.
function ob_googlebot_getAgent(&$user_agent, &$spider_name, &$result)
{
$known_spiders = array (
//Search Spiders
//The big cheese spiders detected by SMF by default Originating Site /Website Description Recently Seen ?
array(
'agent' => 'Google',
'spidername' => 'Googlebot',
),
// Google http://www.google.com The main spider used by Google Yes 05/08

array(
'agent' => 'Msn',
'spidername' => 'MsnBot',
),
// Msn http://www.msn.com The main spider used by MSN Yes 05/08

array(
'agent' => 'Yahoo!',
'spidername' => 'Slurp',
),
// Yahoo http://www.yahoo.com Worlds most aggressive spider Yes 05/08

// MAJOR spiders Originating Site Website Description Recently Seen ?
array(
'agent' => 'Ask',
'spidername' => 'Teoma',
),
// Ask.com http://www.ask.com Spider for Ask Search Engine Yes 05/08

array(
'agent' => 'Baidu',
'spidername' => 'Baiduspider',
),
// Baidu http://www.baidu.com Spider for Chinese search engine Yes 05/08

array(
'agent' => 'GigaBot',
'spidername' => 'Gigabot',
),
// Gigablast http://www.gigablast.com Another heavily travelled spider Yes 05/08

array(
'agent' => 'Google-AdSense',
'spidername' => 'Mediapartners-Google',
),
// Google http://www.google.com Spider related to Adsense/Adwords Yes 05/08

array(
'agent' => 'Google-Adwords',
'spidername' => 'AdsBot-Google',
),
// Google http://www.google.com Spider related to Adwords Yes 05/08

array(
'agent' => 'Google-SA',
'spidername' => 'gsa-crawler',
),
// Google http://www.google.com Google Search Appliance Spider Yes 05/08

array(
'agent' => 'Google-Image',
'spidername' => 'Googlebot-Image',
),
// Google http://www.google.com Spider for google image search Yes 05/08

array(
'agent' => 'InternetArchive',
'spidername' => 'ia_archiver-web.archive.org',
),
// Archive http://www.archive.org Wayback Machine spider Yes 05/08

array(
'agent' => 'Alexa',
'spidername' => 'ia_archiver',
),
// Alexa http://www.alexa.com *Must be detected after Internet Archive Yes 05/08

array(
'agent' => 'Omgili',
'spidername' => 'omgilibot',
),
// Omgili http://www.omgili.com Extremely aggressive Messageboard/forum Spider Yes 05/08

array(
'agent' => 'Speedy Spider',
'spidername' => 'Speedy Spider',
),
// EntireWeb http://www.entireweb.com Entire web spider Yes 05/08

array(
'agent' => 'Yahoo',
'spidername' => 'yahoo',
),
// Yahoo http://www.yahoo.com For Yahoo Publisher Network  (a variety in use) Yes 05/08

array(
'agent' => 'Yahoo JP',
'spidername' => 'Y!J',
),
// Yahoo http://www.yahoo.co.jp Spider for Yahoo Japan No

// Checkers/Testers/Robots Originating Site Website Description Recently Seen ?
array(
'agent' => 'DeadLinksChecker',
'spidername' => 'link validator',
),
// Dead-Links http://www.dead-links.com/ Checks your site for dead/bad links Yes 05/08

array(
'agent' => 'W3C Validator',
'spidername' => 'W3C_Validator',
),
// W3C http://validator.w3.org Checks standards validity of any html/xhtml page Yes 05/08

array(
'agent' => 'W3C CSSValidator',
'spidername' => 'W3C_CSS_Validator',
),
// W3C http://jigsaw.w3.org/css-validator/ Checks standards validity of css stylesheets Yes 05/08

array(
'agent' => 'W3C FeedValidator',
'spidername' => 'FeedValidator',
),
// W3C http://validator.w3.org/feed/ Checks standards validity of atom/rss feeds Yes 05/08

array(
'agent' => 'W3C LinkChecker',
'spidername' => 'W3C-checklink',
),
// W3C http://validator.w3.org/checklink Checks links on any html/xhtml page are valid Yes 05/08

array(
'agent' => 'W3C mobileOK',
'spidername' => 'W3C-mobileOK',
),
// W3C http://www.w3.org/2006/07/mobileok-ddc Checks page for how good it is for mobiles Yes 05/08

array(
'agent' => 'W3C P3PValidator',
'spidername' => 'P3P Validator',
),
// W3C http://www.w3.org/P3P/validator.html Checks something?? Yes 05/08

// Feed readers Originating Site Website Description Recently Seen ?
array(
'agent' => 'Bloglines',
'spidername' => 'Bloglines',
),
// Bloglines http://www.bloglines.com Spider for blog/rich web content (owned by Ask) Yes 05/08

array(
'agent' => 'Feedburner',
'spidername' => 'Feedburner',
),
// Feedburner http://www.feedburner.com Another RSS feed reader Yes 05/08

// Website Thumbnail/Snapshot/Thumbshot takers Originating Site /Website Description Recently Seen ?
array(
'agent' => 'SnapBot',
'spidername' => 'Snapbot',
),
// Snap http://www.snap.com Snapshots provider Yes 05/08

array(
'agent' => 'Picsearch',
'spidername' => 'psbot',
),
// Picsearch http://www.picsearch.com Picture/Image Search Engine Yes 05/08

array(
'agent' => 'Websnapr',
'spidername' => 'Websnapr',
),
// Websnapr http://www.websnapr.com Snapshot/site screenshot taker Yes 05/08

// More MINOR Spiders/Robots Originating Site Website Description Recently Seen ?
array(
'agent' => 'AllTheWeb',
'spidername' => 'FAST-WebCrawler',
),
// All The Web http://www.alltheweb.com Spider for alltheweb (now owned by Yahoo) No

array(
'agent' => 'Altavista',
'spidername' => 'Scooter',
),
// Altavista http://www.altavista.com Another Major Search Engine spider No

array(
'agent' => 'Asterias',
'spidername' => 'asterias',
),
// AOL http://www.aol.com Media Spider Yes 05/08

array(
'agent' => '192bot',
'spidername' => '192.comAgent',
),
// 192 http://www.192.com Spider to index for 192.com No

array(
'agent' => 'AbachoBot',
'spidername' => 'ABACHOBot',
),
// Abacho http://www.abacho.com Spider for multi language search engine/translator Yes 05/08

array(
'agent' => 'Abdcatos',
'spidername' => 'abcdatos',
),
// Abdcatos http://www.abcdatos.com/botlink/ Spider for Italian Search Engine Yes 05/08

array(
'agent' => 'Acoon',
'spidername' => 'Acoon',
),
// Acoon http://www.acoon.de Spider for small search engine Yes 05/08

array(
'agent' => 'Accoona',
'spidername' => 'Accoona',
),
// Accoona http://www.accoona.com Spider for Accoona Yes 05/08

array(
'agent' => 'BecomeBot',
'spidername' => 'BecomeBot',
),
// BecomeBot http://www.become.com Shopping/Products type search engine Yes 05/08

array(
'agent' => 'Daumoa',
'spidername' => 'Daumoa',
),
// Daum http://ws.daum.net/aboutkr.html South Korean Search Engine Spider Yes 05/08

array(
'agent' => 'Exabot',
'spidername' => 'Exabot',
),
// Exalead http://www.exalead.com Spider for small search engine Yes 05/08

array(
'agent' => 'Furl',
'spidername' => 'Furlbot',
),
// Furl http://www.furl.net Spider for Furl social bookmarking site Yes 05/08

array(
'agent' => 'FyperSpider',
'spidername' => 'FyberSpider',
),
// FyberSearch http://www.fybersearch.com Spider for Small Search Engine Yes 05/08

array(
'agent' => 'Geona',
'spidername' => 'GeonaBot',
),
// Geona http://www.geona.com Spider for another small search engine Yes 05/08

array(
'agent' => 'GirafaBot',
'spidername' => 'Girafabot',
),
// Girafabot http://www.girafa.com/ Thumbshot provider Yes 05/08

array(
'agent' => 'GoSeeBot',
'spidername' => 'GoSeeBot',
),
// GoSee http://www.gosee.com/bot.html Spider for small search engine Yes 05/08

array(
'agent' => 'Ichiro',
'spidername' => 'ichiro',
),
// Ichiro http://help.goo.ne.jp/door/crawler.html Spider for Japanese search engine Yes 05/08

array(
'agent' => 'LapozzBot',
'spidername' => 'LapozzBot',
),
// Lapozz http://www.lapozz.hu Spider for Hungarian search engine Yes 05/08

array(
'agent' => 'Looksmart',
'spidername' => 'WISENutbot',
),
// WISENutbot http://www.looksmart.com Spider related to advertising Yes 05/08

array(
'agent' => 'Lycos',
'spidername' => 'Lycos_Spider',
),
// Lycos http://www.lycos.com Spider for search engine No

array(
'agent' => 'Majestic12',
'spidername' => 'MJ12bot/v2',
),
// Majestic12 http://www.majestic12.co.uk/ Distributed Search Engine Project Yes 05/08

array(
'agent' => 'MLBot',
'spidername' => 'MLBot',
),
// MetaDataLabs http://www.metadatalabs.com/ Media indexing spider Yes 05/08

array(
'agent' => 'MSRBOT',
'spidername' => 'msrbot',
),
// Microsoft Research http://research.microsoft.com/research/sv/msrbot/  Microsoft Research bot No

array(
'agent' => 'Naver',
'spidername' => 'NaverBot',
),
// Naver http://www.naver.com South Korean Search Engine Spider Yes 05/08

array(
'agent' => 'Naver',
'spidername' => 'Yeti',
),
// Naver http://www.naver.com Another NaverBot for the South Korean Search Engine Yes 05/08

array(
'agent' => 'NoxTrumBot',
'spidername' => 'noxtrumbot',
),
// Noxtrum http://www.noxtrum.com Spider for Spanish search engine Yes 05/08

array(
'agent' => 'OmniExplorer',
'spidername' => 'OmniExplorer_Bot',
),
// Omniexplorer http://www.omni-explorer.com/ Spider No

array(
'agent' => 'OnetSzukaj',
'spidername' => 'OnetSzukaj',
),
// Onet http://szukaj.onet.pl Polish Search Engine Spider Yes 05/08

array(
'agent' => 'ScrubTheWeb',
'spidername' => 'Scrubby',
),
// ScrubTheWeb http://www.scrubtheweb.com Spider for Scrub the web Yes 05/08

array(
'agent' => 'SearchSight',
'spidername' => 'SearchSight',
),
// Searchsight http://www.searchsite.com Another search engine Yes 05/08

array(
'agent' => 'Seeqpod',
'spidername' => 'Seeqpod',
),
// Seeqpod http://www.seeqpod.com Spider for search engine (the google for mp3 files) Yes 05/08

array(
'agent' => 'Shablast',
'spidername' => 'ShablastBot',
),
// ShaBlast http://www.shablast.com Spider for a small search engine Yes 05/08

array(
'agent' => 'SitiDiBot',
'spidername' => 'SitiDiBot',
),
// Sitidi http://www.sitidi.net Spider for italian Sitidi search engine Yes 05/08

array(
'agent' => 'Slider',
'spidername' => 'silk/1.0',
),
// Slider http://www.slider.com Spider for Slider, but it only spiders DMOZ entries Yes 05/08

array(
'agent' => 'Sogou',
'spidername' => 'Sogou',
),
//Sogou http://www.sogou.com Spider for Chinese search engine Yes 05/08

array(
'agent' => 'StackRambler',
'spidername' => 'StackRambler',
),
// StackRambler http://www.rambler.ru/doc/robots.shtml Spider for Russian portal/search engine  Yes 05/08

array(
'agent' => 'SurveyBot',
'spidername' => 'SurveyBot',
),
// Domaintools http://www.domaintools.com Probe for website statistics (WhoIs  Source) Yes 05/08

array(
'agent' => 'Walhello',
'spidername' => 'appie',
),
// Wahello http://www.wahello.com/ Spider for wahello No

array(
'agent' => 'WebAlta',
'spidername' => 'WebAlta',
),
// WebAlta http://www.webalta.net Russian Search Engine Yes 05/08

array(
'agent' => 'YacyBot',
'spidername' => 'yacybot',
),
// Yacy http://www.yacy.com Crawler for distributed search engine Yes 05/08

array(
'agent' => 'YodaoBot',
'spidername' => 'YodaoBot',
),
// Yodao http://www.yodao.com Spider for Chinese Search Engine Yes 05/08

// Google-Wanna-Be's - Spiders/Robots for Startups Originating Site /Website Description Recently Seen ?
array(
'agent' => 'Charlotte',
'spidername' => 'Charlotte',
),
// SearchMe (Beta) http://www.searchme.com/support/ Spider for new search engine (in beta)   Yes 05/08

array(
'agent' => 'DiscoBot',
'spidername' => 'DiscoBot',
),
// DiscoveryEngine http://discoveryengine.com/discobot.html Spider for new search engine startup Yes 05/08

array(
'agent' => 'EnaBot',
'spidername' => 'EnaBot',
),
// EnaBall http://www.enaball.com/crawler.html Experimental new spider Yes 05/08

array(
'agent' => 'Gaisbot',
'spidername' => 'Gaisbot',
),
// Gaisbot http://gais.cs.ccu.edu.tw/robot.php Spider for search engine startup Yes 05/08

array(
'agent' => 'Kalooga',
'spidername' => 'kalooga',
),
// Kalooga http://www.kalooga.com Spider for new media search engine (in beta) Yes 05/08

array(
'agent' => 'ScoutJet',
'spidername' => 'ScoutJet',
),
// ScoutJet http://www.scoutjet.com/ Spider for new search engine (by the DMOZ founders) Yes 05/08

array(
'agent' => 'TinEye',
'spidername' => 'TinEye',
),
// TinEye http://tineye.com/crawler.html Spider for search engine startup Yes 05/08

array(
'agent' => 'Twiceler',
'spidername' => 'twiceler',
),
// Cuill http://www.cuill.com/twiceler/robot.html Experimental spider (aggressive) Yes 05/08

// Software Originating Site Website Description Recently Seen ?
array(
'agent' => 'GSiteCrawler',
'spidername' => 'GSiteCrawler',
),
// GSiteCrawler http://www.gsitecrawler.com/ Windows Based Sitemap Generator Software Yes 05/08

array(
'agent' => 'HTTrack',
'spidername' => 'HTTrack',
),
// HTTrack http://www.httrack.com HTTrack Website Copier - Offline Browser Yes 05/08

array(
'agent' => 'Wget',
'spidername' => 'Wget',
),
// WGet Software http://www.gnu.org/software/wget/ GNU software to retrieve files Yes 05/08

// Reason for detecting these: they can be very resource-intensive, so seeing them in use lets you block them if necessary.

//Rest of spiders:
array (
'agent' => 'Openbot',
'spidername' => 'Openfind spider',
),

array (
'agent' => 'Ask Jeeves',
'spidername' => 'Ask Jeeves',
),

array (
'agent' => 'IBM_Planetwide',
'spidername' => 'IBM_Planetwide',
),

array (
'agent' => 'Inktomi Slurp',
'spidername' => 'Inktomi Slurp',
),

array (
'agent' => 'Feedfetcher-Google',
'spidername' => 'Feedfetcher-Google',
),

array (
'agent' => 'http://www.relevantnoise.com',
'spidername' => 'relevantNOISE',
),

array (
'agent' => 'NewsGatorOnline/2.0',
'spidername' => 'NewsGatorOnline',
),

array (
'agent' => 'ping.blo.gs/2.0',
'spidername' => 'ping.blo.gs',
),

array (
'agent' => 'Jakarta Commons-HttpClient/3.0.1',
'spidername' => 'Amazon',
),

array (
'agent' => 'Jakarta Commons-HttpClient/3.0-rc2',
'spidername' => 'Amazon',
),

array (
'agent' => 'www.fi crawler',
'spidername' => 'www.fi spider',
),

);

foreach($known_spiders AS $poss)


Paste it in carefully, and enjoy!

Watch out for the strings!

Regards :)
roco

roco

Sorry, I missed one thing:

Quote: so I think those 90... are a bit of overkill ;]

In this particular case? Then ask yourself: why do people install Googlebot&Spiders?

In my opinion, they do it to know which spiders visit them. Many admins really want to know who is visiting, and the information that some of the guests are robots is valuable to anyone who wants realistic statistics. It is surely better to have those 90 entries than some 20 or so. For example, you look at the "Who" list and see that a number of guests are visiting you; with fuller information you can see that right now there are actually fewer real guests and more robots, e.g. that dumb one from Onet, which can get so tangled up that it eats a few gigabytes of transfer. It's true! I mostly block them.
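Blocking of that kind could be sketched roughly as below. This is only an illustration and not code from the mod: the helper `is_blocked_agent()` and the `$blocked_agents` list are hypothetical names, the two agent strings are simply taken from the list in this thread, and the case-insensitive substring match is an assumption.

```php
<?php
// Hypothetical block list: agent substrings borrowed from the list in this thread.
$blocked_agents = array('HTTrack', 'Wget');

// Hypothetical helper (not part of SMF or the mod): returns true when the
// user agent contains any of the blocked substrings, ignoring case.
function is_blocked_agent($userAgent, array $blockedAgents)
{
    foreach ($blockedAgents as $agent) {
        if (stripos($userAgent, $agent) !== false) {
            return true;
        }
    }
    return false;
}

// The user agent header may be missing, so fall back to an empty string.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (is_blocked_agent($ua, $blocked_agents)) {
    // Refuse the request before any forum page is rendered.
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```

A user agent is trivial to fake, so a check like this only stops tools that identify themselves honestly.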

So in this case, the more entries, the more precise and the better. On top of that, these entries live in a file rather than in the database, so they cause no additional queries, unlike in SMF 2.0.
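To illustrate why a file-based list needs no queries, here is a minimal sketch of how such an array could be matched against a visitor's user agent. It is not the actual code from the template: `detect_spider()` is a hypothetical helper, the array is a three-entry subset of the list, and the case-insensitive substring match is an assumption.

```php
<?php
// Small subset of the list from this thread; the full array stays in the file.
$known_spiders = array(
    array('agent' => 'Googlebot-Image', 'spidername' => 'Google-Image Spider'),
    array('agent' => 'Googlebot', 'spidername' => 'Google spider'),
    array('agent' => 'Wget', 'spidername' => 'Wget'),
);

// Hypothetical helper (not part of SMF or the mod): returns the spider name
// of the first entry whose 'agent' string occurs in the user agent, or null.
function detect_spider($userAgent, array $knownSpiders)
{
    foreach ($knownSpiders as $poss) {
        // First hit wins, so a more specific agent (Googlebot-Image)
        // has to be listed before a more generic one (Googlebot).
        if (stripos($userAgent, $poss['agent']) !== false) {
            return $poss['spidername'];
        }
    }
    return null; // an ordinary guest, not a known spider
}

echo detect_spider('Googlebot-Image/1.0', $known_spiders), "\n";
```

The whole lookup is a plain loop over an in-memory array, so the cost of 90 entries is a handful of string comparisons per guest and no database traffic at all.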

So, to sum up, this certainly matters to those who really knew why they installed this mod.
Otherwise? It's like buying yourself a beautiful gun and then scorning the bullets, hehe, while living in a truly Wild West, which is what the net world is right now.

Regards :)
roco

Anette

I find this feature very helpful. I come in, see 10 guests, click, and see what's popular :)

cieplutki

Pretty nice... this is how it looks on my forum

Thanks roco, a plus point is coming your way

Tomase

#21
Will this Who.template file also work on SMF 2.0 beta 4?

I've already checked it myself, it works.