ilug-goa

Group Information

  • Members: 484
  • Category: Linux
  • Founded: Feb 27, 2000
  • Language: English
Offtopic: How to count the most used words from a large document?
Message #21143 of 21145
Re: [ILUG-GOA] Re: Offtopic: How to count the most used words from a large document?

See if this works for you. It prints the count for every word that occurs
more than 20 times. You can supply multiple text files when invoking it:
$ ./mostUsed.py /tmp/*.txt

adieus,
cheerio,
b2.

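#!/usr/bin/python

import sys, operator, re

def mostUsed(files, minWordCount=20):
    # Map each lowercased word to the number of times it occurs.
    words = {}
    regex = re.compile('[a-zA-Z]+')
    for f in files:
        # Read line by line so large documents are never loaded whole.
        with open(f) as fh:
            for l in (regex.findall(x) for x in fh):
                for w in l:
                    words[w.lower()] = 1 + words.setdefault(w.lower(), 0)

    # Print words seen more than minWordCount times, least frequent first.
    for i in sorted(filter(lambda x: x[1] > minWordCount, words.items()),
                    key=operator.itemgetter(1)):
        print('%s : %d' % (i[0], i[1]))

if __name__ == '__main__':
    mostUsed(sys.argv[1:])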
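
The same idea can also be sketched with `collections.Counter` from the standard library, which does the tallying and the sorting itself. This is only an alternative sketch, not the script posted above: the names `count_words`, `frequent_words` and `most_used` are made up for illustration, and it lists the most frequent words first rather than last.

```python
import re
from collections import Counter

WORD_RE = re.compile(r'[a-zA-Z]+')

def count_words(lines):
    """Return a Counter mapping each lowercased word to its frequency."""
    counts = Counter()
    for line in lines:
        counts.update(w.lower() for w in WORD_RE.findall(line))
    return counts

def frequent_words(counts, min_word_count=20):
    """Return (word, count) pairs seen more than min_word_count times,
    most frequent first."""
    return [(w, c) for w, c in counts.most_common() if c > min_word_count]

def most_used(paths, min_word_count=20):
    """Tally words across all the given files, reading line by line."""
    counts = Counter()
    for path in paths:
        with open(path) as fh:
            counts.update(count_words(fh))
    return frequent_words(counts, min_word_count)
```

Calling `most_used(sys.argv[1:])` from a `__main__` guard and printing each pair would mirror the command-line use of the script above.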




On Mon, Jan 9, 2012 at 3:06 PM, Kavita Singh <kavita.j.singh@...> wrote:

> The wordcount command - wc
> All Unix and Linux systems will have this
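
For context, `wc -w` reports one total of whitespace-separated words per file. It does not break that total down by word, so on its own it cannot rank the most used words. A minimal Python sketch of the figure it reports (the name `total_words` and the sample lines are illustrative, not from the thread):

```python
def total_words(lines):
    """Count whitespace-separated words, the figure `wc -w` reports."""
    return sum(len(line.split()) for line in lines)

sample = ["the cat sat on the mat", "the end"]
print(total_words(sample))  # prints 8
```

The total says nothing about "the" occurring three times here, which is the per-word tally the original question asks for.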


Mon Jan 9, 2012 10:44 am

b2ornot2b

Hello, This might be an off-topic question here, but I need some help. How to count the most used words from a large document? What are the tools available, ...
Niju Mohan (nijuweb)
Jan 9, 2012 9:21 am

The wordcount command - wc All Unix and Linux systems will have this...
Kavita Singh (kavitajohry)
Jan 9, 2012 9:36 am

See if this works for you. It will give you the word counts for counts greater than 20. You can supply multiple text files when invoking it... $ ./mostUsed.py...
Blinston Fernandes (b2ornot2b)
Jan 9, 2012 10:44 am