Here's a example script that performs a simple task. The task is:
suppose you a bunch of files named
access_log.1.gz
access_log.2.gz
access_log.3.gz
these are weekly log files with their date range contained in the file.
you want to rename these files to indicate their date range.
20050101-20050105.gz
20050105-20050110.gz
20050110-20050115.gz
The following code is a pure Python script that does the job.
-------------------------------
# -*- coding: utf-8 -*-
# Python
import os,re,gzip
# go to the current dir
# get a list of file names of the form access_log.*.gz
# for each file
# unzip it (make sure it doesn't override existing file)
# find the date from the first line and last line
# rename the file such as 0605-0612.log.gz
# note: this description does not necessarily match the code
# but gives a basic view
mydir= '/Users/t/logs/'
mon={
'Jan':'01',
'Feb':'02',
'Mar':'03',
'Apr':'04',
'May':'05',
'Jun':'06',
'Jul':'07',
'Aug':'08',
'Sep':'09',
'Oct':'10',
'Nov':'11',
'Dec':'12',
}
def getdate (li):
li = li.split(' ')[3][1:12];
datelist = li.split('/');
dd=datelist[0]
mmm=datelist[1]
yyyy=datelist[2]
return str(yyyy) + str(mon[mmm]) + str(dd)
for ff in os.listdir(mydir):
if re.search(r'access_log\.\d\.gz',ff):
print ff
unzippedname=ff[0:-3]
inF = gzip.GzipFile(ff, 'rb');
s=inF.readlines()
inF.close()
start_date = getdate(s[0])
end_date = getdate(s[-1])
new_file_name= start_date + '-' + end_date + '.txt'
outF = file(new_file_name, 'w');
outF.write(''.join(s))
outF.close()
# do compression of the new files
# delete the original gzip file
s[0:-1]=[]
-------------------------------
This script does not use system call, so it is system independent and
can be used in Unix or Windows. The problem with this script is that
usually log files are very large in size, usually over 100 Mega bytes.
Because this script reads the whole file into memory, it is very slow
and memory intensive.
a solution is to call OS programs such as gzip to do the decompression,
and use unix “head” and “tail” to get the first or last line of the
file for extraction of the date range. We will study that and post
solution later.
Xah
xah@...
∑ http://xahlee.org/
☄