See if this works for you. It prints every word that occurs more than 20
times, along with its count (counting is case-insensitive). You can supply
multiple text files when invoking it:
$ ./mostUsed.py /tmp/*.txt
adieus,
cheerio,
b2.
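#!/usr/bin/env python3
import sys, operator, re

def mostUsed(files, minWordCount=20):
    """Print each word occurring more than minWordCount times, least frequent first."""
    words = {}
    regex = re.compile('[a-zA-Z]+')  # a "word" is a run of ASCII letters
    for f in files:
        with open(f) as fh:  # close each file when done
            for line in fh:
                for w in regex.findall(line):
                    w = w.lower()  # count case-insensitively
                    words[w] = words.get(w, 0) + 1
    # Keep only words above the threshold, then sort by ascending count.
    frequent = [(w, n) for w, n in words.items() if n > minWordCount]
    for w, n in sorted(frequent, key=operator.itemgetter(1)):
        print(w, ':', n)

if __name__ == '__main__':
    mostUsed(sys.argv[1:])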
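For comparison, here is a minimal sketch of the same idea built on collections.Counter from the Python 3 standard library. It is not the script above: the function name count_frequent_words and the choice to return a list of (word, count) pairs instead of printing are assumptions made for illustration. It keeps the same rules: a word is a run of ASCII letters, counting is case-insensitive, and only counts strictly greater than the threshold are reported, least frequent first.

```python
import re
import sys
from collections import Counter

WORD_RE = re.compile(r'[a-zA-Z]+')  # same word definition as the script above


def count_frequent_words(paths, min_count=20):
    """Return (word, count) pairs with count > min_count, least frequent first.

    Hypothetical helper: a Counter-based variant, not the original mostUsed().
    """
    counts = Counter()
    for path in paths:
        with open(path) as fh:
            for line in fh:
                # Counter.update() adds 1 for every item in the iterable.
                counts.update(w.lower() for w in WORD_RE.findall(line))
    frequent = [(w, n) for w, n in counts.items() if n > min_count]
    return sorted(frequent, key=lambda pair: pair[1])


if __name__ == '__main__':
    for word, n in count_frequent_words(sys.argv[1:]):
        print(word, ':', n)
```

Because Counter.update() accepts any iterable of words, the manual dictionary bookkeeping goes away; the threshold test and the sort order are unchanged.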
On Mon, Jan 9, 2012 at 3:06 PM, Kavita Singh <kavita.j.singh@...> wrote:
> The word count command is wc.
> All Unix and Linux systems have it.
> [Non-text portions of this message have been removed]