bixo-dev · Bixo Web Mining Toolkit

Group Information

  • Members: 113
  • Category: Open Source
  • Founded: May 17, 2009
  • Language: English
Messages

Topics Messages Latest Post

Hi Otis, ... Well, it's a toolkit - so hooking it up for a wide/big crawl means writing some code, but there's nothing in the Bixo architecture (after a few...
15 Mar 16, 2011
10:14 am

ivan.panachev

Hello again, In the last few weeks, I have been thoroughly reading the documentation of Hadoop, Cascading and Bixo. I've also set up a small cluster in Nutch...
4 Mar 9, 2011
12:34 am

Chris Schneider
schmed2000

... - Yes I know that; there are even some tools available which make this possible; it’s called “CSS Sprites”...
3 Mar 6, 2011
3:34 am

Fuad Efendi
fouad_efendi

I just posted this to the Nutch list, but it seems likely to be of interest to Bixo users as well... -- Ken ...
1 Mar 4, 2011
7:32 pm

Ken Krugler
kkrugler

I saw this post on the Nutch list, and thought that we should verify our handling of URLs (or rather, what Tika does) is correct in regards to resolving...
1 Mar 4, 2011
7:31 pm

Ken Krugler
kkrugler

I am highly impressed with this product. I downloaded and installed it on my system. I am able to run the Simple Crawler as documented in the Getting...
6 Mar 1, 2011
8:33 am

Natarajan
natarajansr_mdu

Mike Bowles, PhD and Patricia Hoffman, PhD are teaching a Machine Learning Class. The class will begin at the level of elementary probability and statistics...
1 Feb 18, 2011
8:00 pm

hoffmantriciaphd
hoffmantrici...

Hi, I am creating a parser as follows: HashSet<String> tagNames = new HashSet<String>(); tagNames.add("a"); tagNames.add("img"); HashSet<String> attributes =...
3 Feb 15, 2011
11:40 pm

hrsht.rastogi

Hi, I made a modification to SimpleCrawlTool and am able to extract the URLs of the images in the first iteration. Now I want to download the images; currently I...
2 Feb 9, 2011
11:01 pm

Ken Krugler
kkrugler

Hi, while running the example, a lot of messages are logged to the console. How can I disable this logging? Thanks, Harshit...
2 Feb 8, 2011
3:26 am

Ken Krugler
kkrugler

Hello, at the moment I am studying Bixo. I read the slides and I played with the examples and the code, but it's very complicated to understand the code without a...
4 Feb 7, 2011
3:31 pm

Ken Krugler
kkrugler

Hi, earlier I successfully created my project using the bixo-core-1.0-SNAPSHOT.jar present in the distribution. Now I am trying to move to Maven. So in...
3 Feb 3, 2011
6:37 pm

hrsht.rastogi

Hi, I run the SimpleCrawlTool and parameters are set as SimpleCrawlToolOptions options = new SimpleCrawlToolOptions(); options.setAgentName("tester"); ...
5 Jan 25, 2011
6:19 pm

hrsht.rastogi

Hi, I am new to Bixo, Cascading and Hadoop. I was able to run the example. I could see that various folders are created, and when I run SimpleStatusTool I get...
4 Jan 22, 2011
3:42 am

Vivek Magotra
vmagotra

Hi, I run SimpleCrawlTool from Eclipse and I get this message dozens of times during one run: ERROR examples.CreateUrlDatumFromStatusFunction:83 - Unknown...
8 Jan 3, 2011
4:17 am

Ken Krugler
kkrugler

Hi, I'm a Bixo newbie and have one important question. I'd like to use Bixo as a tool for constantly monitoring a range of domains and extracting some data from...
2 Jan 2, 2011
2:11 pm

Ken Krugler
kkrugler

When I try to compile Bixo, I get the following message from ant: [artifact:dependencies] Diagnosis: [artifact:dependencies] [artifact:dependencies] Unable to...
6 Dec 29, 2010
8:14 pm

Ken Krugler
kkrugler

Hi! I'm trying to install Bixo but I get a failing test on: [junit] -> at bixo.operations.ProcessRobotsTask.run(ProcessRobotsTask.java:135) Is something that...
4 Dec 24, 2010
4:23 pm

Ken Krugler
kkrugler

Hi everyone, I am trying to upgrade to Cascading 1.2 by replacing the cascading-core-1.x jar, and got an unexpected exception. Caused by:...
15 Dec 11, 2010
10:35 pm

Chris K Wensel
cwense1

I'm trying to load OpenBixo into Eclipse basically by doing an "ant eclipse" and then in Eclipse "Import existing project". However it is not working for me -...
2 Nov 30, 2010
1:11 pm

Ken Krugler
kkrugler

OK, I've gone ahead and pushed this change. Let me know if it works for you. To summarize - you should now be able to set the list of supported link tags in the...
3 Nov 24, 2010
7:43 pm

Yuhan Zhang
yuhanz2003

I see that from the docs we save fetched pages to some kind of permanent store. (I am assuming it would be some kind of Hadoop based NoSQL database but don't...
5 Nov 24, 2010
3:14 pm

Ken Krugler
kkrugler

Are there any more examples like SimpleCrawlTool? I've looked through the code in bixo.tools but ideally I'd like something nearer to Nutch to start from....
4 Nov 23, 2010
1:57 pm

Vivek Magotra
vmagotra

Hello, I'm going to implement a domain-specific crawler and am studying Bixo for this task. My problem is that web sites are few in number but they are very...
14 Nov 21, 2010
11:01 pm

ivan.panachev

Hi everyone, I am new to bixo, and trying to use this crawler to retrieve attributes from tags other than <a href="...">. I am using the SimpleParser class, by...
4 Nov 20, 2010
7:54 pm

Ken Krugler
kkrugler

Hi all, I am trying to work with the SimpleParser class to make it extract attribute values of different tags. Here is a design issue that I encountered: ...
3 Nov 11, 2010
9:34 pm

yuhanz2003

I noticed that public static final java.lang.String CONTENT_FIELD; was changed to a private field in the latest bixo release. I use FetchedDatum.CONTENT_FIELD...
2 Nov 9, 2010
6:55 pm

Ken Krugler
kkrugler

Hi All, Not sure what I'm doing wrong here. Eclipse Run Configuration: -agentname Tester -domain apple.com -numloops 2 -outputdir c:\temp\bixo Output: 10/11/02...
7 Nov 6, 2010
11:59 pm

bcalverton

Hi all, A quick note about the 0.5.1 release. We've fixed the dist build and the bin/bixo script so that you should now be able to run the tool from the...
1 Oct 30, 2010
11:44 pm

Ken Krugler
kkrugler

Hi all, I just did a new build of Bixo, tagged as version 0.4.8 in GitHub. Various files of interest that are available for use: - The maven artifact is in the...
3 Oct 29, 2010
5:31 pm

Ken Krugler
kkrugler