bixo-dev · Bixo Web Mining Toolkit

Group Information

  • Members: 113
  • Category: Open Source
  • Founded: May 17, 2009
  • Language: English

Messages

Messages 1286 - 1315 of 1315   Oldest  |  < Older  |  Newer >  |  Newest
1286 Rehan Malek
rehan_malek75
Jan 9, 2013
8:51 am
Thanks a lot, Ken! Is hadoop/mahout required to view the contents of all the folders, like the content, html, parse and status folders? Thanks a lot, Ken!...
1287 Rehan Malek
rehan_malek75
Jan 9, 2013
8:57 am
Thanks a lot, Ken! Could you please tell me whether hadoop/mahout is required to view the contents of all the folders, like the content, html, parse and status folder...
1288 Nilay Upadhyay
nilayupadhyay17
Jan 9, 2013
12:19 pm
Hello Chris! Thank you for your great response. Here is what I have understood so far; please correct me if I am wrong. Here are some of my questions. ...
1289 Ken Krugler
kkrugler
Jan 9, 2013
3:06 pm
... No - those are Hadoop SequenceFiles. They are binary, so there's no easy way to view them directly. You could switch to using a text format - see previous...
1290 Chris Schneider
schmed2000
Jan 9, 2013
6:50 pm
Hi Nilay, I have attempted to answer your latest questions below. However, you must ultimately be responsible for reading and understanding the Bixo...
1291 Rehan Malek
rehan_malek75
Jan 10, 2013
9:21 am
Hi all, I sincerely request a detailed answer to this, because it is one of the most important parts for the whole Bixo developer community when...
1292 Vivek Magotra
vmagotra
Jan 10, 2013
1:26 pm
Hi Rehan, For a beginner, the first thing I would suggest is to get familiar with Cascading (http://www.cascading.org). Currently Bixo uses the 1.2.x release....
1293 Vivek Magotra
vmagotra
Jan 10, 2013
1:26 pm
Hi Rehan, For a beginner, the first thing I would suggest is to get familiar with Cascading (http://www.cascading.org). Currently Bixo uses the 1.2.x release. ...
1294 Nilay Upadhyay
nilayupadhyay17
Jan 10, 2013
1:56 pm
Hello Chris! Thank you so much, from the bottom of my heart, for giving your valuable time. I have just a few more questions. I have a question about the -url argument ...
1295 Chris Schneider
schmed2000
Jan 10, 2013
3:02 pm
Hi Nilay, ... I have tried to answer your latest set of questions below. I have now exhausted the time I have available to help you, at least until you are...
1296 Rehan Malek
rehan_malek75
Jan 11, 2013
8:49 am
Hi Vivek :) Thank you for your quick response. I have just gone through the Cascading documents, but what should be done to get only the URLs for all fetched pages? And I...
1297 Vivek Magotra
vmagotra
Jan 13, 2013
7:44 pm
Hi Rehan, On Jan 11, 2013, at 5:49 PM, Rehan Malek <rehan_malek75@...> wrote: [snip] ... The status pipe (FetchPipe.getStatusTailPipe()) has the status...
1298 Rehan Malek
rehan_malek75
Jan 17, 2013
1:11 pm
Thank you, Vivek! I am still unable to get all the URLs associated with fetched pages. Could you please provide a Cascading workflow for getting all URLs? ...
1299 rehan_malek75
Jan 18, 2013
8:05 am
Hi all, How can I modify DemoCrawlWorkflow to get the URLs of all fetched pages? Please explain it in detail ....
1300 Vivek Magotra
vmagotra
Jan 19, 2013
2:11 am
Hi Rehan, ... To get all the urls of the fetched pages for the current loop here's what I would do : In the createFlow() method, after you get the statusPipe,...
1301 rehan_malek75
Jan 21, 2013
8:33 am
Thanks for the response. I am working on this...
1302 rehan_malek75
Jan 21, 2013
8:33 am
Hi all, I am currently facing a problem with the status sub-folder inside the output directory. I am unable to view the status sub-folder. As such, by default its...
1303 Pat Ferrel
sillyaliases...
Jan 24, 2013
4:01 am
I think Vivek added all of the NotSoSimpleCrawlTool to Bixo's DemoCrawlTool. It produces the same hadoop sequence file in each loop dir. I wrote another...
1304 Ken Krugler
kkrugler
Feb 3, 2013
3:05 am
Hi all, Just a heads-up that Lewis McGibbney has just released 0.2 of the crawler-commons library. The next release of Bixo will use this jar, since it...
1305 markatasu
Feb 21, 2013
5:53 am
Hi Everyone, I'm working with an early-stage well funded stealth mode start-up in the big data analytics space – creating a unified platform that collects,...
1306 jeffjeffrsn
Apr 2, 2013
12:58 pm
Hi, In the DemoCrawlTool I added a new Pipe to the tail of the parsePipe. In it I use the parsed content and the URL. Now I also need the original...
1307 Chris Schneider
schmed2000
Apr 2, 2013
2:12 pm
Hi Jeff, I am not sure what you meant when you wrote "added a new Pipe to the tail of the parsePipe". If you did add a tail pipe containing only...
1308 jeffjeffrsn
Apr 2, 2013
5:02 pm
Hi Chris, Thanks for the answer. Now I'm subclassing the baseparser. Thanks, - Jeff...
1309 jeffjeffrsn
Apr 2, 2013
5:22 pm
Hi Everyone, I noticed that the democrawler stays at one domain. ... I've got the domain example.com. At this domain there are outlinks to test.example.com,...
1310 Ken Krugler
kkrugler
Apr 4, 2013
11:36 pm
Hi Vivek, I was looking at the DemoCrawlWorkflow source, and noticed this snippet: Pipe urlFromOutlinksPipe = new Pipe("url from outlinks",...
1311 Ken Krugler
kkrugler
Apr 4, 2013
11:41 pm
Hi Jeff, ... By default if you provide a -domain parameter, then URL filtering is set up such that only URLs for that domain are accepted (all other URLs are...
1312 Pat Ferrel
sillyaliases...
May 9, 2013
5:31 pm
It's been awhile since I did a new build of bixo. For some reason, though I haven't changed the code, I'm getting all sorts of test errors. I was getting an...
1313 Pat Ferrel
reallyreally...
May 9, 2013
6:00 pm
Hmm, comment out the test and it completes without errors. Maybe openDNS is the problem? On May 9, 2013, at 10:31 AM, Pat Ferrel <pat.ferrel@...> wrote: ...
1314 Pat Ferrel
reallyreally...
May 17, 2013
2:00 pm
Hi guys, I'm back to crawling Pinterest to update my experimental recommender. I created a merged miner/crawler, which was working fine if slowly. I added an...
1315 Ken Krugler
kkrugler
May 17, 2013
10:00 pm
Hi Pat, ... It will try to fetch every URL, but it will only make one HttpClient request for each URL. HttpClient will retry multiple times, and if the server...