TextAnalytics · Text Analytics
#30 From: "Curt A. Monash" <curtmonash@...>
Date: Fri Jul 21, 2006 1:32 pm
Subject: Re: Identifying variants
camonash
 
Thanks, William.

My question was -- what kind of network analysis FOR NORMALIZING AUTHORS AND AFFILIATIONS?

I.e., are you doing any yet, or envisioning how to do it, or just dreaming?

I'm guessing you're doing rudimentary stuff in that particular area, and wishing it weren't so rudimentary, but that's just a guess.  ;)

Best,

CAM

At 03:18 AM 7/21/2006, William Hayes wrote:
Hi Curt,

Not would, we run collaboration network analyses for Biogen Idec researchers looking at a disease area or target.  We've even looked at treatment technologies to figure out who to work with or consult based on whether they are a supernode or provide significant linking potential.  We use Cytoscape for visualization and Medline or Dialog for providing the literature sources for analysis.  Our main area of improvement is normalizing the authors and affiliations more effectively.  It's not a major problem, but it would enhance the utility of the results.

It's similar in nature to what we've seen from Boston Consulting Group, just more flexible and internally produced.

William




Curt A. Monash, Ph.D.
President, Monash Information Services
curtmonash@...
(978) 266-1815 (main)
Backups:  curtmonash@..., (978) 266-1866

Blogs: 
http://www.monashreport.com (Software industry, IT-related public policy, ventures, etc.)
http://www.dbms2.com (DBMS and related technologies)
http://www.texttechnologies.com (Search, text mining, etc.)
http://www.softwarememories.com (Software industry history)

Computerworld columns: http://www.computerworld.com/action/columnist.do?command=viewColumnist&bylineID=894
Core website (slightly out of date) http://www.monash.com


#29 From: "William Hayes" <william.s.hayes@...>
Date: Fri Jul 21, 2006 10:18 am
Subject: Re: Identifying variants
william_s_hayes
 
Hi Curt,

Not would, we run collaboration network analyses for Biogen Idec researchers looking at a disease area or target.  We've even looked at treatment technologies to figure out who to work with or consult based on whether they are a supernode or provide significant linking potential.  We use Cytoscape for visualization and Medline or Dialog for providing the literature sources for analysis.  Our main area of improvement is normalizing the authors and affiliations more effectively.  It's not a major problem, but it would enhance the utility of the results.

It's similar in nature to what we've seen from Boston Consulting Group, just more flexible and internally produced.
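
A minimal sketch of the kind of collaboration-network analysis described above, assuming each paper is available as a plain list of author names. The function names, the sample author names, and the "linking" score (how many extra disconnected pieces appear when an author is removed) are illustrative assumptions, not the Biogen Idec workflow or a Cytoscape API. It uses only the Python standard library.

```python
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of distinct co-authors."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())  # keep single-author papers as nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def count_components(graph, skip=None):
    """Count connected components, optionally ignoring one node."""
    seen = set() if skip is None else {skip}
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def rank_authors(papers):
    """Return (author, co-author count, linking score) rows, best first."""
    graph = build_coauthor_graph(papers)
    base = count_components(graph)
    rows = []
    for author, neighbours in graph.items():
        # Extra components created by removing this author: a rough,
        # assumed proxy for "linking potential".
        extra = max(0, count_components(graph, skip=author) - base)
        rows.append((author, len(neighbours), extra))
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


# Hypothetical author lists, one per paper.
papers = [
    ["Smith J", "Lee K"],
    ["Lee K", "Garcia M"],
    ["Garcia M", "Chen L"],
    ["Lee K", "Chen L"],
    ["Lee K", "Patel R"],
]

for author, degree, linking in rank_authors(papers):
    print(author, degree, linking)
```

Authors with many distinct co-authors play the "supernode" role, and a non-zero linking score marks someone whose removal would disconnect part of the network. The counts are only as good as the author normalization discussed in this thread, since unmerged name variants split one person across several nodes.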

William






#28 From: "Curt A. Monash" <curtmonash@...>
Date: Fri Jul 21, 2006 11:33 am
Subject: Identifying variants
camonash
 
At 04:33 AM 7/18/2006, William Hayes wrote:
Hi Mary,

Thank you for your introduction.  I noticed with significant interest your work on collapsing name variants in a systematic fashion.  One of our outstanding problems for generating collaboration networks using Cytoscape is just that issue of name variants (and making sure the names refer to specific individuals).  We expect that network analysis will play a part in this normalization.  I'd like to suggest Medline (the National Library of Medicine's biomedical abstracts database) containing author and affiliation information with no associated unique primary ID's as an excellent database to perform this work :)

If you do find this interesting as part of your graduate studies, I'd be happy to help you get started.

Regards,

William



 

That sounds pretty interesting William.  What kind of network analysis would you do?

Thanks,

CAM

Curt A. Monash, Ph.D.
President, Monash Information Services
curtmonash@...
(978) 266-1815 (main)
Backups:  curtmonash@..., (978) 266-1866

Blogs: 
http://www.monashreport.com (Software industry, IT-related public policy, ventures, etc.)
http://www.dbms2.com (DBMS and related technologies)
http://www.texttechnologies.com (Search, text mining, etc.)
http://www.softwarememories.com (Software industry history)

Computerworld columns: http://www.computerworld.com/action/columnist.do?command=viewColumnist&bylineID=894
Core website (slightly out of date) http://www.monash.com


#27 From: "Diego Molla Aliod" <diego@...>
Date: Thu Jul 20, 2006 4:14 am
Subject: Re: Natural-language query/question answering
mollaaliod
 
Hi all,

Just a short introduction of myself and a response to Seth's question.
I'm a senior lecturer at Macquarie University, Sydney, Australia,
where I am doing research in the area of question answering.

And my answer to Seth's question is, besides suggesting that you look at
my own question answering project
<http://www.ics.mq.edu.au/~diego/answerfinder/>, to look at the
question answering track of the TREC conferences
<http://trec.nist.gov/>, or at the CLEF conferences
<http://clef.isti.cnr.it/>, where much of the current research in QA is
published.

Text-based question answering is a very dynamic area of R&D, and the
main web search engines are starting to incorporate QA technology.
Expect to see more and more QA abilities in future web search
engines. Current systems focus on short fact-based questions, and
they are starting to attempt more complex questions where
the answer needs to be composed from bits and pieces found in various
sources.

The following list of QA systems is not exhaustive but it can give you
an idea of what you can find nowadays.

http://www.ics.mq.edu.au/~pizzato/repository

Hope this helps.

Cheers,

Diego

--- In TextAnalytics@yahoogroups.com, Seth Grimes <grimes@...> wrote:
>
> Hello all,
>
>  What's going on in the world of natural-language query/question
> answering?  For examples of this, see (and try!) --
>
> http://start.csail.mit.edu/
>
> http://brainboost.com
>
> http://answers.com
>
>  For that matter, go to http://google.com and enter "2 + 2 - 1/17"
> or "map Georgia."  Google Enterprise has partners that are extending this
> capability to cover artifacts produced in response to a Google OneBox
> "search."  If I understand this correctly, the partner's software inserts
> Google index entries and Google lists those artifacts along with document
> hits.
>
>  I'm interested in particular in implementations for governmental /
> social / economic statistics (and maps).  The sites I cited do alright for
> simple questions about demographics but fail on more complex but still
> typical questions.  I'd guess that's because they're broadly targeted;
> perhaps they could be tuned for the vocabularies and syntaxes of stats
> questions.
>
>  I'd like to hear about academic and industrial research and
> productizations and to get pointers to papers.
>
>  Thanks,
>
> 				 Seth
>
>
> --
> Seth Grimes   Alta Plana Corp, analytical computing & data management
>               Intelligent Enterprise magazine (CMP), Contributing Editor
> grimes@...       http://altaplana.com    301-270-0795
>

#26 From: Roxana Angheluta <roxana@...>
Date: Wed Jul 19, 2006 2:14 pm
Subject: Re: Open source text analytics
roxana@...
 
I also know about the Ngram Statistics Package (NSP).

"The Ngram Statistics Package (NSP) is a suite of programs that aids in
analyzing Ngrams in text files. We define an Ngram as a sequence of 'n'
tokens that occur within a window of at least 'n' tokens in the text;
what constitutes a "token" can be defined by the user."

Project page:
http://search.cpan.org/~tpederse/Text-NSP-0.97/Docs/README.pod#DESCRIPTION
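
To make the quoted definition concrete, here is a minimal Python sketch of Ngrams drawn from a window of at least 'n' tokens. It is one reading of that definition, not NSP's own code: the function name `ngrams`, the choice to anchor each Ngram at the first token of its window, and the `\w+` tokenizer are all assumptions for illustration.

```python
import re
from itertools import combinations


def ngrams(tokens, n, window=None):
    """Yield n-token tuples whose members fall within `window` consecutive tokens.

    Each tuple starts at the first token of its window and keeps token order.
    With window == n this reduces to ordinary contiguous n-grams.
    """
    if window is None:
        window = n
    if n < 1 or window < n:
        raise ValueError("need n >= 1 and window >= n")
    for i in range(len(tokens) - n + 1):
        # Tokens that may follow tokens[i] inside the same window.
        rest = tokens[i + 1 : i + window]
        for combo in combinations(rest, n - 1):
            yield (tokens[i],) + combo


# The user decides what a token is; a simple word pattern is assumed here.
text = "the cat sat on the mat"
tokens = re.findall(r"\w+", text.lower())

for pair in ngrams(tokens, 2, window=3):
    print(pair)
```

Widening the window beyond 'n' lets non-adjacent tokens pair up, which is useful when counting co-occurrences rather than fixed phrases.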

roxana

> I'm working on a catalog of open-source software for text-analytics and
> related functions.  Here's what I have so far.  Please add to the list.
> After a bit more review, I'll paste this info into the Wikipedia Text
> Analytics entry.
>

#25 From: Seth Grimes <grimes@...>
Date: Wed Jul 19, 2006 2:05 pm
Subject: Re: Open source text analytics
sethgrimes
 
I forgot to include UIMA --

UIMA -- Unstructured Information Management Architecture

"An open, industrial-strength, scaleable and extensible platform for
creating, integrating and deploying unstructured information management
solutions from combinations of semantic analysis and search components."

Project site at IBM Research:  http://www.research.ibm.com/UIMA/

SDK site at IBM alphaWorks:  http://www.alphaworks.ibm.com/tech/uima

Framework site at SourceForge:  http://uima-framework.sourceforge.net





On Wed, 19 Jul 2006, Seth Grimes wrote:

> I'm working on a catalog of open-source software for text-analytics and
> related functions.  Here's what I have so far.  Please add to the list.
> After a bit more review, I'll paste this info into the Wikipedia Text
> Analytics entry.
>

--
Seth Grimes   Alta Plana Corp, analytical computing & data management
               Intelligent Enterprise magazine (CMP), Contributing Editor
grimes@...       http://altaplana.com    301-270-0795

#24 From: Seth Grimes <grimes@...>
Date: Wed Jul 19, 2006 11:15 am
Subject: Open source text analytics
sethgrimes
 
I'm working on a catalog of open-source software for text-analytics and
related functions.  Here's what I have so far.  Please add to the list.
After a bit more review, I'll paste this info into the Wikipedia Text
Analytics entry.

It would be great to have your reactions to the various packages!

					 Seth


OpenNLP

"An organizational center for open source projects related to natural
language processing....  OpenNLP also hosts a variety of java-based NLP
tools which perform sentence detection, tokenization, pos-tagging,
chunking and parsing, named-entity detection, and coreference using the
OpenNLP Maxent machine learning package."

Home page: http://opennlp.sourceforge.net/

Project page: http://sourceforge.net/projects/opennlp/


Carrot2

"A search results clustering framework. Includes clustering components and
a stand-alone meta search component. Combines well with indexing and
search engines (open source and proprietary)."

Home page: http://www.carrot2.org

Project page: http://sourceforge.net/projects/carrot2/


FreeLing

"An open source language analysis tool suite."

http://garraf.epsevg.upc.es/freeling/


GATE -- General Architecture for Text Engineering

"GATE is ... the leading toolkit for Text Mining ... comprised of an
architecture, a free open source framework (or SDK) and graphical
development environment."

Home page: http://gate.ac.uk/index.html

Project page: http://sourceforge.net/projects/gate


Graphviz -- Graph Visualization Software

"Graph visualization is a way of representing structural information as
diagrams of abstract graphs and networks....  The Graphviz layout programs
take descriptions of graphs in a simple text language, and make diagrams
in several useful formats such as images and SVG for web pages, Postscript
for inclusion in PDF or other documents; or display in an interactive
graph browser."

http://graphviz.org


jTokeniser

"The jTokeniser package was designed to combine a set of tokenisers that
range from basic whitespace tokenisers to more complex ones that deal
intuitively with natural language....  Tokenisers include:

* WhiteSpaceTokeniser
* StringTokeniser (based on specified delimiters)
* RegexTokeniser (regular expression defines a token)
* RegexSeparatorTokeniser (define what is *not* a token)
* BreakIteratorTokeniser (sophisticated locale-specific tokeniser)
* SentenceTokeniser (sentence segmentation)"

http://www.andy-roberts.net/software/jTokeniser/


Kea

"Kea-3.0 automatically extracts keyphrases from the full text of
documents....  Kea-4.0 is a new version of Kea that has been developed for
controlled indexing of documents in the domain of agriculture."

http://www.nzdl.org/Kea/


LingPipe ** free but not open source

"A suite of Java libraries for the linguistic analysis of human language."

http://www.alias-i.com/lingpipe/index.html


LTC -- Linguistic Tree Constructor

"LTC is a free program for building linguistic syntax trees from text."

Home page: http://ltc.sourceforge.net

Project page: http://sourceforge.net/projects/ltc


Lucene

"Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java."

http://lucene.apache.org/


NLTK

"NLTK, the Natural Language Toolkit, is a suite of program modules, data
sets and tutorials supporting research and teaching in computational
linguistics and natural language processing."

Home page: http://nltk.sourceforge.net/index.html

Project page: http://sourceforge.net/projects/nltk


Nutch

"Nutch builds on Lucene Java to provide web search application software."

http://lucene.apache.org/nutch/


TouchGraph

"TouchGraph provides a hands-on way to visualize networks of interrelated
information. Networks are rendered as interactive graphs, which lend
themselves to a variety of transformations."

Home page: http://www.touchgraph.com/

Project page: http://touchgraph.sourceforge.net/


Weka

"Weka is a collection of machine learning algorithms for data mining
tasks.... Weka contains tools for data pre-processing, classification,
regression, clustering, association rules, and visualization. It is also
well-suited for developing new machine learning schemes."

http://www.cs.waikato.ac.nz/~ml/weka/

See Weka-related projects:
http://weka.sourceforge.net/wiki/index.php/Related_Projects



--
Seth Grimes   Alta Plana Corp, analytical computing & data management
               Intelligent Enterprise magazine (CMP), Contributing Editor
grimes@...       http://altaplana.com    301-270-0795

#23 From: "William Hayes" <william.s.hayes@...>
Date: Tue Jul 18, 2006 2:10 pm
Subject: Re: Introduction
william_s_hayes
 
Hi Mary,

I am interested in whatever pointers to the literature you can give me for author or affiliation disambiguation and normalization. 

Good luck in your studies.  I can't imagine how hard it is to keep a PhD program going while working full-time.  I was lucky enough to be able to concentrate on just my PhD and thought it took forever.

Thanks,

William



On 7/18/06, Mary D. Taffet <mdtaffet@...> wrote:

William,

Thank you for your reply.

I'm sure that at some point my work will expand to include structured
data like citations, which have similar problems no matter what the
domain. But for now my work is limited to unstructured full text
documents in the domain of genealogy, so MedLine is not in the cards for
my dissertation, but perhaps for the future beyond my dissertation.

My husband is on the faculty of SUNY Upstate Medical University here in
Syracuse, so I imagine at some point I will get involved with medical or
perhaps biomedical domain work. During dinner with his colleagues one
night I was trying to explain my dissertation topic, and one of his
fellow researchers immediately drew a parallel with genes and proteins.

If you'd like, I can point you to literature that does involve work with
citations -- there's quite a bit out there.

-- Thanks again,


Mary D. Taffet
Ph.D. Candidate/Syracuse University School of Information Studies
Scientist/TextWise LLC
Syracuse, NY




#22 From: "Mary D. Taffet" <mdtaffet@...>
Date: Tue Jul 18, 2006 12:25 pm
Subject: Re: Introduction
mdtaffet.geo
 
William,

Thank you for your reply.

I'm sure that at some point my work will expand to include structured
data like citations, which have similar problems no matter what the
domain.  But for now my work is limited to unstructured full text
documents in the domain of genealogy, so MedLine is not in the cards for
my dissertation, but perhaps for the future beyond my dissertation.

My husband is on the faculty of SUNY Upstate Medical University here in
Syracuse, so I imagine at some point I will get involved with medical or
perhaps biomedical domain work.  During dinner with his colleagues one
night I was trying to explain my dissertation topic, and one of his
fellow researchers immediately drew a parallel with genes and proteins.

If you'd like, I can point you to literature that does involve work with
citations -- there's quite a bit out there.

-- Thanks again,
     Mary D. Taffet
     Ph.D. Candidate/Syracuse University School of Information Studies
     Scientist/TextWise LLC
     Syracuse, NY


William Hayes wrote:
> Hi Mary,
>
> Thank you for your introduction.  I noticed with significant interest
> your work on collapsing name variants in a systematic fashion.  One of
> our outstanding problems for generating collaboration networks using
> Cytoscape is just that issue of name variants (and making sure the names
> refer to specific individuals).  We expect that network analysis will
> play a part in this normalization.  I'd like to suggest Medline (the
> National Library of Medicine's biomedical abstracts database) containing
> author and affiliation information with no associated unique primary
> ID's as an excellent database to perform this work :)
>
> If you do find this interesting as part of your graduate studies, I'd be
> happy to help you get started.
>
> Regards,
>
> William
>
>
> [snip]

#21 From: "William Hayes" <william.s.hayes@...>
Date: Tue Jul 18, 2006 11:40 am
Subject: Re: Re:Text Segmentation Algorithm
william_s_hayes
 
Hi Tam,

Without knowing why you need to segment your text and what you are going to do with it downstream, I'd have to agree with Dominic that parsing the text into paragraphs is one of the best ways to get passages that are consistent in content (at least in the European languages with which I'm familiar; caveat: I'm not a linguist).  Sentences are designed to express an atomic fact (mostly), and paragraphs are designed to present a concept and its supporting evidence.

William

On 7/17/06, dominic_forest <dominic.forest@...> wrote:

Tam,

Text Tiling is obviously a good choice. However, as far as I know, the
implementation of this method is not easy, the algorithm is
time-consuming, and the results can be unpredictable.

Have you thought about simply dividing your documents into paragraphs
or (overlapping or non-overlapping) window passages (i.e. sequences of
words)?
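
Both suggestions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: paragraphs are taken to be separated by blank lines, words by whitespace, and the function names and default sizes are made up for the example.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def window_passages(text, size=50, step=25):
    """Return passages of `size` words, starting every `step` words.

    step < size gives overlapping windows; step == size gives
    non-overlapping ones. The last passage may be shorter than `size`.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    words = text.split()
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start : start + size]))
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return passages


sample = "one two three four five six seven eight nine ten"
print(window_passages(sample, size=4, step=2))  # overlapping
print(window_passages(sample, size=4, step=4))  # non-overlapping
```

Paragraph splitting only helps when the source already has paragraph breaks. For a document that arrives as one undivided block, the word windows are the fallback, with the window size chosen to suit whatever is done with the passages downstream.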

Regards,
Dominic



--- In TextAnalytics@yahoogroups.com, tamer adel <tamadel2003@...> wrote:
>
> Hi all,
> I have a text document that is one undivided block, and I want to divide
> it into multiple paragraphs that are coherent portions. The subject is
> new to me. I searched and found that the text tiling algorithm is the
> preferred method for this task. Does anyone who knows more than I do
> have guidance on another method or algorithm for this problem?
> Please reply urgently; I need an answer by tomorrow afternoon.
>
> regards,
> tam
>
>
> Tamer Abu Elenain
> Software Developer
> (+2) 012 562 74 21
> tamadel2003@...
>
>
>



#20 From: "William Hayes" <william.s.hayes@...>
Date: Tue Jul 18, 2006 11:33 am
Subject: Re: Introduction
william_s_hayes
 
Hi Mary,

Thank you for your introduction.  I noticed with significant interest your work on collapsing name variants in a systematic fashion.  One of our outstanding problems for generating collaboration networks using Cytoscape is just that issue of name variants (and making sure the names refer to specific individuals).  We expect that network analysis will play a part in this normalization.  I'd like to suggest Medline (the National Library of Medicine's biomedical abstracts database) containing author and affiliation information with no associated unique primary ID's as an excellent database to perform this work :)
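
One way network information might help with this normalization can be sketched as follows. This is a minimal illustration, not the method used at Biogen Idec or in Mary's dissertation: it assumes each author mention has already been reduced to a crude key (for example surname plus first initial) and comes with the set of co-author keys from the same paper. Two mentions with the same key are merged only if they share at least `min_shared` co-authors. The function name, the key format, and the threshold are all hypothetical.

```python
from itertools import combinations


def cluster_mentions(mentions, min_shared=1):
    """Group author mentions that probably refer to the same person.

    mentions: list of (name_key, set_of_coauthor_keys) tuples.
    Returns a sorted list of clusters, each a list of mention indexes.
    """
    parent = list(range(len(mentions)))  # union-find forest

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of mentions (quadratic; fine for a sketch).
    for i, j in combinations(range(len(mentions)), 2):
        key_i, coauthors_i = mentions[i]
        key_j, coauthors_j = mentions[j]
        same_key = key_i == key_j
        shared = len(coauthors_i & coauthors_j)
        if same_key and shared >= min_shared:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With the made-up mentions `[("doe j", {"roe a", "lee k"}), ("doe j", {"lee k"}), ("doe j", {"kim s"})]`, the first two are merged because they share a co-author, while the third stays separate even though its key is identical. Real data would need affiliation and date evidence as well, since researchers change collaborators over time.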

If you do find this interesting as part of your graduate studies, I'd be happy to help you get started.

Regards,

William


On 7/17/06, Mary D. Taffet <mdtaffet@...> wrote:

Hello,

My name is Mary D. Taffet. I have a Bachelor's degree in Linguistics
from UNC-Chapel Hill, a Master's degree in Linguistics from Syracuse
University, an MLS degree in Information and Library Science from
Syracuse University's School of Information Studies and am currently a
Ph.D. Candidate at the School of Information Studies.

In between my bachelor's and master's programs, I became a business
applications programmer working with COBOL on a mainframe. Needless to
say, at some point I realized that I didn't have to choose between
working with language and working with computers, both of which I both
enjoy and am fairly good at. So I went back to school with the goal of
learning about Natural Language Processing/Computational Linguistics,
which I have been focusing on since 1999. [And in the process became
one of the few skilled COBOL programmers to never ever work on a Y2K
project, though I got offers most every week it seemed...]

I was a Research Assistant at TextWise from 1999-2000, then was a
Research Assistant at the Center for Natural Language Processing at
Syracuse University's School of Information Studies from 2000-2004. Now
I'm back at TextWise as a fulltime employee since 2005 working on
contextual advertising.

At some point after I started grad school, I became addicted to
genealogy, and have done the bulk of my genealogical research online
since then, with a few trips to Salt Lake City and the Montreal Archives
along the way. I am a very frustrated online genealogical researcher
due to the difficulty of searching names online. Fortunately the
difficulty in searching names online is something that even a
non-genealogical researcher can do as a dissertation as it is a general
problem for all sorts of applications. So that's the focus of my
dissertation. I am looking at the relationship between people and the
way people are referred to in written documents. I hope to bring
together all variant forms of a person's name, while at the same time
teasing apart identical names that refer to different people. I have an
electronic corpus of 14,000+ biographies from a 1904 publication
supplied by Ancestry.com.

I had to put my dissertation work aside for a while during my father's
hospitalization last year, and am still trying to get back into the
swing of things after my father passed away. It's not easy with a
fulltime job, but hopefully I will get there before too much more time
has passed.

-- Mary D. Taffet
Ph.D. Candidate/Syracuse University-School of Information Studies
Scientist/TextWise LLC
Syracuse, NY



#19 From: "Mary D. Taffet" <mdtaffet@...>
Date: Tue Jul 18, 2006 1:49 am
Subject: Introduction
mdtaffet.geo
 
Hello,

My name is Mary D. Taffet.  I have a Bachelor's degree in Linguistics
from UNC-Chapel Hill, a Master's degree in Linguistics from Syracuse
University, an MLS degree in Information and Library Science from
Syracuse University's School of Information Studies and am currently a
Ph.D. Candidate at the School of Information Studies.

In between my bachelor's and master's programs, I became a business
applications programmer working with COBOL on a mainframe.  Needless to
say, at some point I realized that I didn't have to choose between
working with language and working with computers, both of which I both
enjoy and am fairly good at.  So I went back to school with the goal of
learning about Natural Language Processing/Computational Linguistics,
which I have been focusing on since 1999.  [And in the process became
one of the few skilled COBOL programmers to never ever work on a Y2K
project, though I got offers most every week it seemed...]

I was a Research Assistant at TextWise from 1999-2000, then was a
Research Assistant at the Center for Natural Language Processing at
Syracuse University's School of Information Studies from 2000-2004.  Now
I'm back at TextWise as a fulltime employee since 2005 working on
contextual advertising.

At some point after I started grad school, I became addicted to
genealogy, and have done the bulk of my genealogical research online
since then, with a few trips to Salt Lake City and the Montreal Archives
along the way.  I am a very frustrated online genealogical researcher
due to the difficulty of searching names online.  Fortunately the
difficulty in searching names online is something that even a
non-genealogical researcher can do as a dissertation as it is a general
problem for all sorts of applications.  So that's the focus of my
dissertation.  I am looking at the relationship between people and the
way people are referred to in written documents.  I hope to bring
together all variant forms of a person's name, while at the same time
teasing apart identical names that refer to different people.  I have an
electronic corpus of 14,000+ biographies from a 1904 publication
supplied by Ancestry.com.
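
As a rough illustration of the first half of that goal (bringing variant forms together), the sketch below reduces each name string to a crude blocking key of surname plus first initial and groups names by that key. It is an assumption-laden toy, not the approach of the dissertation: the function names are hypothetical, the parsing rules cover only three common layouts, and the rule that a short all-uppercase final token is a set of initials will misfire on some surnames.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name):
    """Reduce a personal name to a (surname, first_initial) key."""
    # Strip accents so that "José" and "Jose" compare equal.
    text = unicodedata.normalize("NFKD", name)
    text = text.encode("ascii", "ignore").decode("ascii").strip()
    if "," in text:
        # Layout "Doe, Jane Q."
        surname, given = text.split(",", 1)
    else:
        parts = text.split()
        if not parts:
            return ("", "")
        last = parts[-1].replace(".", "")
        if len(parts) > 1 and last.isupper() and len(last) <= 3:
            # Layout "Doe JQ" (surname followed by initials).
            surname, given = " ".join(parts[:-1]), last
        else:
            # Layout "Jane Q. Doe" (given names first).
            surname, given = parts[-1], " ".join(parts[:-1])
    surname = re.sub(r"[^a-z]", "", surname.lower())
    initial = given.strip()[:1].lower()
    return (surname, initial)


def group_variants(names):
    """Map each key to the set of name strings that produced it."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return dict(groups)
```

Note that "Jane Q. Doe" and "John Doe" collapse to the same key, which is exactly the second half of the problem: a key like this over-merges, so identical or near-identical names still have to be teased apart using other evidence (dates, places, relatives, co-authors).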

I had to put my dissertation work aside for a while during my father's
hospitalization last year, and am still trying to get back into the
swing of things after my father passed away.  It's not easy with a
fulltime job, but hopefully I will get there before too much more time
has passed.

-- Mary D. Taffet
     Ph.D. Candidate/Syracuse University-School of Information Studies
     Scientist/TextWise LLC
     Syracuse, NY

#18 From: Seth Grimes <grimes@...>
Date: Mon Jul 17, 2006 11:42 pm
Subject: RE: Text Analytics e-mail list (fwd)
sethgrimes
 
---------- Forwarded message ----------
Date: Mon, 17 Jul 2006 16:37:12 -0500
From: "Marsh, Brice" <Brice.F.Marsh@...>

  I'm academically curious, but I don't have the time to become an active
participant. My name is Brice Marsh and I'm the Executive Director of Teen
Think Tanks of America, Inc. (www.teenthinktanks.org) and we generate lots
of collaborative material that we need to classify and organize for
reporting purposes. However, my "day job" is as a senior computer
scientist with a federal contractor for NASA at Marshall Space Flight
Center and that keeps me busy. But, I do wish to be able to stay abreast
of your work and the progress of your research. So in this regard, I must
be classified as a "taker" and not a "giver", I'm sorry; but you're more
than welcome to review/use any of the material we have posted on the TTT
website, only with attribution, please.

Thanks.

Brice F. Marsh
bricemarsh@...

#17 From: "dominic_forest" <dominic.forest@...>
Date: Mon Jul 17, 2006 11:33 pm
Subject: Re:Text Segmentation Algorithm
dominic_forest
 
Tam,

Text Tiling is obviously a good choice. However, as far as I know, the
implementation of this method is not easy, the algorithm is
time-consuming, and the results can be unpredictable.

Have you thought about simply dividing your documents into paragraphs
or (overlapping or non-overlapping) window passages (i.e. sequences of
words)?
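
Both of these simpler options fit in a few lines. The sketch below is only an illustration: it assumes paragraphs are separated by blank lines and that "words" are whitespace-delimited tokens, and the function names and default sizes are arbitrary choices, not anything prescribed in this thread.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and collapse whitespace inside each piece."""
    pieces = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in pieces if p.strip()]


def word_windows(text, size=50, step=25):
    """Return passages of `size` words, advancing `step` words each time.

    step == size gives non-overlapping windows; step < size gives overlap.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    words = text.split()
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows
```

For example, `word_windows("a b c d e f", size=4, step=2)` yields two overlapping passages, "a b c d" and "c d e f". Overlap costs extra index space but reduces the chance that a relevant phrase is cut in half at a window boundary.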

Regards,
Dominic

--- In TextAnalytics@yahoogroups.com, tamer adel <tamadel2003@...> wrote:
>
> Hi all,
> I have a text document that is one undivided block, and I want to divide
> it into multiple paragraphs that are coherent portions. The subject is
> new to me. I searched and found that the text tiling algorithm is the
> preferred method for this task. Does anyone who knows more than I do
> have guidance on another method or algorithm for this problem?
> Please reply urgently; I need an answer by tomorrow afternoon.
>
>   regards,
>   tam
>
>
>  Tamer Abu Elenain
>       Software Developer
>       (+2) 012 562 74 21
>    tamadel2003@...
>
>
>

#16 From: "William Hayes" <william.s.hayes@...>
Date: Mon Jul 17, 2006 12:42 pm
Subject: Re: Re: Thesauri/ontology management server?
william_s_hayes
 
Hi all,

Thank you for your suggestions.  I'm testing/reviewing them to figure out the best solution for our problem.  I'll submit my opinions/results back to the list when I'm done.  I appreciate your time and enthusiasm.  This list has a very nice cross-section of expertise and interests based on the introductions and responses so far.  Kudos to our moderators for starting this list.

William


#15 From: tamer adel <tamadel2003@...>
Date: Mon Jul 17, 2006 11:51 am
Subject: Re:Text Segmentation Algorithm
tamadel2003
 
Hi all,
I have a text document that is one undivided block, and I want to divide it into multiple paragraphs that are coherent portions. The subject is new to me. I searched and found that the text tiling algorithm is the preferred method for this task. Does anyone who knows more than I do have guidance on another method or algorithm for this problem?
Please reply urgently; I need an answer by tomorrow afternoon.
 
regards,
tam


Tamer Abu Elenain
    Software Developer
    (+2) 012 562 74 21
          



#14 From: "lew_larson" <lew_larson@...>
Date: Mon Jul 17, 2006 5:39 am
Subject: Re: Thesauri/ontology management server?
lew_larson
 
Hi William,
You might want to check into Schemalogic. They are based in Kirkland
WA. www.schemalogic.com

Regards,
Lew Larson
--- In TextAnalytics@yahoogroups.com, "William Hayes"
<william.s.hayes@...> wrote:
>
> Hi all,
>
> Has anyone run across a good thesaurus or ontology management server?
> Something that will allow user editing of hierarchically tagged canonical
> names with their synonyms, offers basic visualization, can export into
> various formats such as ANSI Thesaurus format, and is accessible via web
> services for accessing synonym suggestions for enterprise search engines?
> I know that's a pretty tall order, but it's one that we need in the text
> analytics area. I've run across some fairly expensive tools for this, but
> I was hoping to find something a good deal less pricey (and hopefully
> easily extendable).
>
> We need to manage protein, disease, tissue, cell line, adverse event,
> pathological process, etc. thesauri.  I'd like to be able to tag the
> protein thesauri with various relationship information such as Pathway
> participation, Molecular function, Biological process, etc., with is-a and
> part-of relations maintained, and to be able to tag particular synonyms
> with filterable labels (such as 'rarely used', 'ambiguousWithGeneralText',
> 'ambiguousWithOtherProtein', 'stopword', etc.).  This would really be a
> terminological resource server for text analytic engines.  The more easily
> it can be edited in an ad hoc fashion, the easier it can be set up as a
> public resource to allow (moderated?) community curation of terminologies
> for text mining.
>
> TIA,
>
> William
>

#13 From: "sanford schram" <sschram5@...>
Date: Sun Jul 16, 2006 5:31 pm
Subject: Sandy's Intro
barrymsbarry
 
I am new to the group and I would like to introduce myself. My name is Sanford Schram, but everybody calls me Sandy. I have a Master's degree in Engineering. Early in my career I did a great deal of work in simulation. Then, after seven years at Xerox Data Systems, where I managed a number of development projects, I formed Computer Strategies and developed business systems. I first got involved with TA when I designed and implemented a document processing system for a large client of mine, Baxter Healthcare. I moved on to Business Intelligence and developed a number of large reporting systems and assisted in the development of a number of data warehouses, where I was introduced to data mining.
 
I have been teaching classes at the undergraduate and graduate level in database design, project management, decision support systems, and enterprise system development.
 
On a recent consulting engagement I had an opportunity to apply my TA experience in a new area: documenting software development systems. This is where I am currently focused: bringing together the various documentation objects, database metadata, and code into a coherent knowledge base with multiple taxonomies so that a variety of development and design groups can work and collaborate efficiently.
 
So I bring an engineer's rather than a scientist's perspective to the table. Hopefully my contributions on how to use specific techniques to solve business problems, and my questions along that line, will stimulate the more theoretical members of the group.
 
I am pleased to have been allowed to join this group.
 
 
SANFORD SCHRAM
Computer Strategies
(949) 261 7144

#12 From: tamer adel <tamadel2003@...>
Date: Sun Jul 16, 2006 12:50 pm
Subject: Re:Text Segmentations prblm
tamadel2003
 
Hi all,
This is the first time I have sent a message to the Text Analytics group or asked it for help, and I hope you can help me.
I am researching text segmentation, a subject that is new to me, in order to solve the following problem:
I want to know the procedure or algorithm for partitioning one block of text into multiple coherent blocks, to facilitate retrieval or full-text search over the document. Could you point me to any good reference, with illustrations, that would help me learn more about text segmentation? As you know, text segmentation is an essential step in text processing and a part of natural language processing.
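
A simplified sketch of one family of procedures for this, lexical-cohesion segmentation in the spirit of text tiling, is given below. It is not the published algorithm and not any library's implementation: it assumes the text has already been cut into small units (for example sentences), compares the vocabulary on either side of each gap between units with cosine similarity, and places a boundary where similarity dips deeply relative to its neighbours. The function names, the `block` size, and the `min_depth` threshold are all arbitrary assumptions.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(units, block=2):
    """Similarity between the `block` units before and after each gap."""
    bags = [Counter(re.findall(r"[a-z]+", u.lower())) for u in units]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """How far each gap's similarity sits below the peaks on both sides."""
    depths = []
    for i, score in enumerate(sims):
        left_peak = score
        for value in reversed(sims[:i]):   # climb leftwards while rising
            if value < left_peak:
                break
            left_peak = value
        right_peak = score
        for value in sims[i + 1:]:         # climb rightwards while rising
            if value < right_peak:
                break
            right_peak = value
        depths.append((left_peak - score) + (right_peak - score))
    return depths


def segment(units, block=2, min_depth=0.3):
    """Group units into segments, cutting at gaps with depth >= min_depth."""
    if not units:
        return []
    depths = depth_scores(gap_similarities(units, block))
    segments, current = [], [units[0]]
    for unit, depth in zip(units[1:], depths):
        if depth >= min_depth:
            segments.append(current)
            current = []
        current.append(unit)
    segments.append(current)
    return segments
```

On four toy units, two about cats and two about stocks, the only deep dip falls between the second and third unit, so `segment` returns two groups of two. On real text, stop-word removal, stemming, and smoothing of the similarity curve would likely matter a great deal, and the threshold would need tuning.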

regards,
Tam.


Tamer Abu Elenain
    Software Developer
    (+2) 012 562 74 21
          



#11 From: eisai@...
Date: Sun Jul 16, 2006 5:40 am
Subject: Introduction
eisaijmf
 
Hello to Karl Wiig, Seth Grimes, Neil Raden, and everyone I've not yet had the
pleasure of meeting either electronically or in person,

I'm Joe Firestone, Managing Director and CEO of Center for the Open Enterprise,
LLC. COE is the parent company of the Knowledge Management Consortium
International (KMCI), the KMCI Publishing Group (KMCI Press and KMCI Online
Press) and the new Adaptive Metrics Center. Both KMCI and AMC do independent
research and also offer training and consulting services. KMCI in Knowledge
Management and AMC in Business Performance Management and Measurement.

My interest in text analytics goes back to the 1950s when I first learned about
content analysis applied to Soviet studies. Later on, I did some research for
the US Air Force on Intentions Analysis and Forecasting, and on applying
computerized content analysis to the study of national intentions and long-range
forecasting of inter-nation behavior. In the early to middle 70s, I published a
few academic articles using measures of national motives in statistical models
predicting national behavior. Since then, my changing interests have carried me
into many other areas, but I've always tried to keep up with the progress of
text analytics.

More recently, my work in Knowledge Management and Adaptive Metrics has led me
back to a greater focus on text analytics, since I'm persuaded that if you want
to measure the quality of problem and knowledge claim formulation, and also the
quality of knowledge claim evaluation, one of the best ways to develop metrics
is through analysis of the semantic patterns in text. This idea has an important
place in our training workshops and in our treatment of core software tools for
knowledge management in KMCI's CKIM Certificate Training Workshop. It is also
the idea I'll be pursuing most often in this group.

Best,


Joe

Joseph M. Firestone, Ph.D.
Managing Director, CEO
KMCI and the Adaptive Metrics Center
www.kmci.org
www.adaptivemetricscenter.com
http://radio.weblogs.com/0135950

CKO
Executive Information Systems, Inc.
www.dkms.com
703-461-8823

#10 From: TCasey <bcs@...>
Date: Sat Jul 15, 2006 8:30 pm
Subject: Introduction
cmbeachboy
 

Thank you for allowing me to participate in the TA discussion group.  I would also like an opportunity to fill you in about my work and my practice, Business Consulting Services.

Now in our 16th year, our focus continues in business process improvement and strategic information technology consulting.  Specifically, we seek to improve client performance by streamlining their business processes and work flows first, then recommending the proper technology to deliver the highest ROI.  We also provide support for SOX compliance in the areas of work flow, IT and document management policy.  While our work doesn't specialize in TA techniques per se, it often serves to introduce our clients to the values of the discipline, and broadens their planning perspectives.

I continue to expand and enhance our strategic partner roster.  Through our affiliation with Research and Organization Management (Bethesda, MD), we can perform assessments of staff and executive team's performance.  In addition, you can see how you compare with industry best practices and hundreds of other organizations.  Since last year, we provide knowledge management training and certification classes as an affiliate of the International Knowledge Management Institute (DC).  Classes are available for both groups and individuals, with special discounts for multiple registrations. 
 
We also deliver many ancillary services, such as research, analysis, performance metrics, feasibility studies, RFP development, vendor selection and project management, to name only a few.
 
If you think our services can augment or assist you in any way, I would be happy to discuss the possibilities with you. 

Regards,

Tom Casey, CMC, CCP

Business Consulting Services
610-328-9806

Please visit our web site:   WWW.CONSULTBIZ.COM

"Performance Improvement through Technology Planning and Operational Redesign"


Business Consulting Services improves operating results through business process improvement and information technology consulting.  Serving the business, government and non-profit communities, we provide only senior level resources and skill sets at competitive fees affordable to a client's budget.

* Certified Management Consultant (CMC) is a certification mark awarded by the Institute of Management Consultants USA and represents evidence of the highest standards of consulting, and his adherence to the technical and ethical canons of the profession.  Less than 1% of all management consultants have achieved this level of performance. Certified Computing Professional (CCP) is awarded by the Institute for the Certification of Computing Professionals, and certifies proficiency in the information technology field.

Tom Casey is one of fewer than 15 consultants in the world to have achieved both the Certified Management Consultant (CMC) and Certified Computing Professional (CCP) designations, the only internationally accepted certification in each field.  To achieve this distinction, Mr. Casey has undergone peer reviews, client audits, competency tests and oral interviews; he has complied with continuing education requirements and has pledged to uphold the Codes of Ethics for both organizations.



#9 From: "Robert Raisch" <raisch@...>
Date: Fri Jul 14, 2006 5:50 pm
Subject: Welcome to the list, and a short introduction...
robert_raisch
 

Hi.  I’m co-moderator of this list and my name is Rob Raisch.

 

I work for Financial Media Holdings Group here in Boston, MA, where we produce publications, products and events of and about corporate regulatory compliance and governance.  Our flagship publication, Compliance Week (www.complianceweek.com), is a weekly online newsletter reaching more than 40,000 financial and legal executives at U.S. public companies.  Each month we also produce a snazzy physical (atoms not bits) version as well.

 

I’ve been involved with the Internet and other forms of online information retrieval for more than twenty years as a programmer, systems architect, writer, and entrepreneur, and along the way, I’ve consulted with some pretty large companies on a variety of online technologies.  (You’ll find a very out-of-date bio at www.raisch.com .)

 

A lot of what I do here at CW is to help our writers and analysts make sense of the documentation generated by public companies as required by the U.S. Securities and Exchange Commission. If you haven’t seen the mountains of filings public companies have to provide U.S. regulators each year, I think you’d be amazed to learn that the vast majority were designed to be reviewed and analyzed by human beings, rather than by machines.  Equally surprising, very little has changed in how these documents are structured since the SEC was commissioned in the 1930’s so you can imagine the problems they present to anyone interested in extracting usable knowledge from them using a computer.  (Check out http://edgar.sec.gov, the free online repository of some of these documents.)

 

Basically, it’s a big, poorly structured corpus of valuable information; just the thing for which text analytics exists!  For me, the only real saving grace is that this variety of business communications doesn’t deviate much from a small subset of expression, so the task isn’t completely impossible.  ::grin::

 

So, a lot of my day is spent coming up with interesting ways to determine which companies provide country club memberships (and other perquisites) to their executives, or which pharmaceutical companies reported ecologically-related issues as material weaknesses, or how companies over $5B in market cap account for executive stock options.  And while the data can be rather dry (arid!), it’s the hunt I find most fun and rewarding.

 

To do this, I use a loose bag of tools I’ve either collected from the public domain or developed myself including various tokenizers, lexers, parsers, parts-of-speech taggers, named-entity extractors, back-prop neural network classifiers, etc.  The only tools we’ve purchased are the real “lights-out” backbone systems, like our full-text search engine (from Coveo) and relational database (from Microsoft.)  But even then, I’ll use open-source replacements like Lucene and MySql if the job calls for them.

 

So yes, you guessed it!  I’m a serious geek and damned proud of it. 

 

Hopefully, I’ll provide some perspective for those of us working on custom systems using mostly home-grown solutions.  (Oh, I should also mention I have nothing but the greatest respect for the vendors in this space and for their incredibly cool tools.)

 

So, Welcome!  Glad you’re here.

 

--

Robert Raisch, CTO - Financial Media Holdings Group, Inc.

Publishers of "Compliance Week" <http://www.complianceweek.com>

 


#8 From: Seth Grimes <grimes@...>
Date: Fri Jul 14, 2006 5:02 pm
Subject: Natural-language query/question answering
sethgrimes
Offline Offline
Send Email Send Email
 
Hello all,

	 What's going on in the world of natural-language query/question
answering?  For examples of this, see (and try!) --

http://start.csail.mit.edu/

http://brainboost.com

http://answers.com

	 For that matter, go to http://google.com and enter "2 + 2 - 1/17"
or "map Georgia."  Google Enterprise has partners that are extending this
capability to cover artifacts produced in response to a Google OneBox
"search."  If I understand this correctly, the partner's software inserts
Google index entries and Google lists those artifacts along with document
hits.

	 I'm interested in particular in implementations for governmental /
social / economic statistics (and maps).  The sites I cited do alright for
simple questions about demographics but fail on more complex but still
typical questions.  I'd guess that's because they're broadly targeted;
perhaps they could be tuned for the vocabularies and syntaxes of stats
questions.

	 I'd like to hear about academic and industrial research and
productizations and to get pointers to papers.

	 Thanks,

					 Seth


--
Seth Grimes   Alta Plana Corp, analytical computing & data management
               Intelligent Enterprise magazine (CMP), Contributing Editor
grimes@...       http://altaplana.com    301-270-0795

#7 From: "tashlinc" <lincoln@...>
Date: Fri Jul 14, 2006 4:14 pm
Subject: Re: Text analysis for Mac
tashlinc
 
Why not use a Java-based solution so that it just doesn't matter?

#6 From: "Curt A. Monash" <curtmonash@...>
Date: Fri Jul 14, 2006 12:33 pm
Subject: Since we're doing introductions -- Curt Monash
camonash
 
Hi all,

I'm Curt Monash.  I've been an analyst of the software industry since 1981, and have followed linguistics-related technologies since about 1983, when I helped with an investment banking deal for natural language pioneer Artificial Intelligence Corp.  I'm writing a fair amount about text analytics these days, mainly in Computerworld (specifically in my monthly columns in July and probably also August), and even more so at www.texttechnologies.com.  Experiences that helped me form my views include being involved in the rise and fall of the classical AI companies in the 1980s; having my own unsuccessful search/classification startup in the late 1990s; and helping build one of the Web's premier sites about public search engines, the Spider's Apprentice, also in the 1990s.

My other big areas of professional interest are all in the software and online services industries -- database management, analytics, knowledge discovery, etc.  Most of what I write about those can be found in Computerworld, at www.dbms2.com (database), and at www.monashreport.com (industry strategy and trends, public policy, analytics, etc.).

My Ph.D. was in game theory, and my only post-doc was in public policy.

Looking forward to good discussion,

CAM



Blogs: 
http://www.monashreport.com (Software industry, IT-related public policy, ventures, etc.)
http://www.dbms2.com (DBMS and related technologies)
http://www.texttechnologies.com (Search, text mining, etc.)
http://www.softwarememories.com (Software industry history)

Computerworld columns: http://www.computerworld.com/action/columnist.do?command=viewColumnist&bylineID=894
Core website (slightly out of date) http://www.monash.com


#5 From: "Dominic Forest" <dominic.forest@...>
Date: Thu Jul 13, 2006 5:58 pm
Subject: New member
dominic_forest
 
Hi all,

I am new to this group. Here are a few words about myself:

I hold a Ph.D. in cognitive computer science from Université du Québec à
Montréal. In my doctoral dissertation ("Application de techniques de forage
de textes de nature prédictive et exploratoire à des fins de gestion et
d'analyse thématique de documents textuels non structurés"), I explored and
validated the use of descriptive and predictive text mining techniques to
assist thematic analysis of unstructured documents. My current research
interests concern the use of hybrid text mining techniques (using concepts
and techniques from both linguistics and artificial intelligence) to assist
ontology development from unstructured documents. I also collaborate in
various projects concerning the application of text mining techniques in the
context of institutional repositories and digital libraries,
computer-assisted reading and text analysis, etc.
I am currently a postdoctoral fellow at Observatoire de Linguistique
Sens-Texte (Université de Montréal) and will be (starting December 1st,
2006) assistant professor at École de Bibliothéconomie et des Sciences de
l'Information (Université de Montréal).

Regards,
Dominic Forest

¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Dominic Forest
Chercheur postdoctoral
Observatoire de Linguistique Sens-Texte (OLST)
Université de Montréal
 
Courrier électronique : dominic.forest@...
Sites Internet : www.dominicforest.com
_____________________________________________________
 

#4 From: Karl M Wiig <kmwiig@...>
Date: Thu Jul 13, 2006 4:26 am
Subject: Text analysis for Mac
kmwiig
 
Whereas we help clients with "Windows" and Linux environments, we use
Mac OS 10.4.7 exclusively.  Any suggestions for what you pursue?

Greetings
--
Karl M. Wiig
Chairman
Knowledge Research Institute, Inc.
7101 Lake Powell Drive, Arlington, TX 76016 USA
Phone: (817) 572-6254 / Cell: (682) 554-3998 / Fax: (817) 478-1048
http://www.krii.com

#3 From: "William Hayes" <william.s.hayes@...>
Date: Thu Jul 13, 2006 11:30 am
Subject: Thesauri/ontology management server?
william_s_hayes
 
Hi all,

Has anyone run across a good thesaurus or ontology management server?  Something that will allow user editing of hierarchically tagged canonical names with their synonyms, offers basic visualization, can export into various formats such as ANSI Thesaurus format, and is accessible via web services for accessing synonym suggestions for enterprise search engines?  I know that's a pretty tall order, but it's one that we need in the text analytics area.  I've run across some fairly expensive tools for this, but I was hoping to find something a good deal less pricey (and hopefully easily extendable).

We need to manage protein, disease, tissue, cell line, adverse event, pathological process, etc. thesauri.  I'd like to be able to tag the protein thesauri with various relationship information such as Pathway participation, Molecular function, Biological process, etc., with is-a and part-of relations maintained, and to be able to tag particular synonyms with filterable labels (such as 'rarely used', 'ambiguousWithGeneralText', 'ambiguousWithOtherProtein', 'stopword', etc.).  This would really be a terminological resource server for text analytic engines.  The more easily it can be edited in an ad hoc fashion, the easier it can be set up as a public resource to allow (moderated?) community curation of terminologies for text mining.
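
To make the requirements concrete, here is a minimal in-memory sketch of the data model described above: canonical names with labelled synonyms, is-a and part-of links, and synonym expansion that can filter out labels. It is purely illustrative and assumes nothing about any existing product; the class names, method names, and the sample entries in the usage note are hypothetical, and a real server would add persistence, editing, export, and a web-service layer.

```python
from dataclasses import dataclass, field


@dataclass
class Synonym:
    text: str
    labels: set = field(default_factory=set)    # e.g. {"rarely used"}


@dataclass
class Concept:
    canonical: str
    synonyms: list = field(default_factory=list)
    is_a: list = field(default_factory=list)     # canonical names of parents
    part_of: list = field(default_factory=list)  # canonical names of wholes


class Thesaurus:
    def __init__(self):
        self.concepts = {}

    def add(self, concept):
        self.concepts[concept.canonical] = concept

    def expand(self, term, exclude=frozenset()):
        """Return the canonical name and synonyms of concepts matching `term`.

        Synonyms carrying any label in `exclude` are left out, which is how
        a search engine could skip ambiguous or rarely used variants.
        """
        needle = term.lower()
        excluded = set(exclude)
        results = []
        for concept in self.concepts.values():
            names = [concept.canonical] + [s.text for s in concept.synonyms]
            if needle in (n.lower() for n in names):
                results.append(concept.canonical)
                results.extend(
                    s.text for s in concept.synonyms
                    if not (s.labels & excluded)
                )
        return results

    def ancestors(self, canonical):
        """Follow is-a links transitively, nearest ancestors first."""
        seen = []
        queue = list(self.concepts[canonical].is_a)
        while queue:
            parent = queue.pop(0)
            if parent not in seen:
                seen.append(parent)
                if parent in self.concepts:
                    queue.extend(self.concepts[parent].is_a)
        return seen
```

As a usage example, a concept "TNF" with synonyms "tumor necrosis factor" and "cachectin" (the latter labelled 'rarely used') would expand to all three names by default, but only to the first two when `exclude={"rarely used"}` is passed. Keeping labels on the synonym rather than on the concept is what makes that per-variant filtering possible.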

TIA,

William

#2 From: "Megan Delaney" <megan.delaney@...>
Date: Wed Jul 12, 2006 2:46 am
Subject: Oracle Opportunities!
techrecruite...
 
Hello!  Oracle Corp in Redwood Shores, CA has a few full-time opportunities for software developers to work on the Oracle Text development team.  If you have real-world experience with developing Search, Information Retrieval, or NLP and have expertise in C or C++, please take a look at www.oracle.com/technology/products/text/index/html to gain more information.  If you are interested in applying, please send your resume to me at megan.delaney@...
I look forward to hearing from you!

Take care,

Megan Delaney

Sr. Technical Recruiter
Oracle Corporation
805.643.3299 (Direct)
805.844.0658 (Mobile)
805.643.3299 (Fax)

Email: megan.delaney@...

Register for great jobs at Oracle on iRecruitment https://irecruitment.oracle.com/

Oracle Recruiting: "Continuously selected by our clients as the exclusive vendor of preeminent talent"

The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. No internal Oracle email, except that clearly intended for public distribution (e.g. Oracle Press Releases), should be sent to any party outside Oracle.

#1 From: Seth Grimes <grimes@...>
Date: Thu Jul 13, 2006 10:55 am
Subject: Welcome to the Text Analytics group
sethgrimes
 
A quick welcome to the Text Analytics discussion group.  I think you'll
find we have a nice mix of researchers, vendors, and practitioners here;
also some recruiters.

While I set up the group and Rob Raisch, CTO of Compliance Week, is serving
as co-moderator, I hope to do very little moderating.  You're free to post
whatever statements, questions, problems, and announcements you wish so
long as they relate to text analytics as you define that term.  Just
follow the usual rules regarding respect for other list members, and
please identify yourself when posting unless you have a good reason not
to.  Do introduce yourself to the list if you wish.

Thanks all,

				 Seth Grimes


--
Seth Grimes   Alta Plana Corp, analytical computing & data management
               Intelligent Enterprise magazine (CMP), Contributing Editor
grimes@...       http://altaplana.com    301-270-0795
