Addendum to the tutorial on Sorting in Perl and Python at:
http://xahlee.org/perl-python/sort_list.html
Language and Sort Optimization: decorate-sort-dedecorate
2006-08-27
There are many algorithms for sorting. Each language can chose which
to use. See wikipedia for a detailed list and explanation:
Sorting_algorithm↗ .
However, does not matter which algorithm is used, ultimately it will
need the order-deciding function on the items to be sorted. Suppose
your items are (a,b,c,d,...), and your order-deciding function is F.
Various algorithms will try to minimize the number of times F is
called, but nevertheless, F will be applied to a particular element
in the list multiple times. For example, F(a,b) may be called to see
which of “a” or “b” comes first. Then, later the algorithm
might need to call F(m,a), or F(a,z). The point here is that, F will
be called many times on arbitrary two items in your list, even if one
of the element has been compared to others before.
Now suppose, you are sorting some polygons in 3D space, or personal
records by the person's address's distance from a location, or
sorting matrixes by their eigen-values in some math application, or
ordering files by number of occurrences of some text in the file.
In general, when you define your decision function F(x,y), you will
need to extract some property from the elements to be sorted. For
example, when sorting points in space by a criterion of distance, one
will need to compute the distance for the point. When sorting
personal records from database by the person's location, the decision
function will need to retrieve the person's address from the
database, then find the coordinate of that address, that compute the
distance from there to a given coordinate. In sorting matrixes in
math by eigen-values, the order-decision will first compute the eigen-
value of the matrix. A common theme from all of the above is that
they all need to do some non-trivial computation on each element.
As we can see, the order-decision function F may need to do some
expensive computation on each element first, and this is almost
always the case when sorting elements other than simple numbers.
Also, we know that a sorting algorithm will need to call F(x,y) many
times, even if one of x or y has been compared to others before. So,
this may result high inefficiency. For example, you need to order
people by their location to your home. So when F(Mary,Jane) is
called, Mary's address is first retrieved from a database across a
network, the coordinate of her address is looked up in another
database. Then the distance to your home are computed using spherical
geometry. The exact same thing is done for Jane. But later on, it may
call F(Mary,Chrissy), F(Mary,Jonesy), F(Mary,Pansy) and so on, and
the entire record retrieval for Mary is repeated many times.
One solution, is to do the expensive extraction one time for each
element, then associate that with the corresponding elements. Suppose
this expensive extraction function is called gist(). So, you create a
new list ([Mary,gist(Mary)], [Jane,gist(Jane)], [John,gist(John)],
[Jenny,gist(Jenny)], ...) and sort this list instead, when done,
remove associated gist. This technique is sometimes called decorate-
sort-dedecorate.
In Perl programing, this decorate-sort-dedecorate technique is
sillily known as Schwartzian Transform as we have demonstrated
previously. In Python, they tried to incorporate this technique into
the language, by adding the “key” optional parameter, which is our
gist() function.
Xah
xah@...
∑ http://xahlee.org/
☄
The Condition of Industrial Programers
Xah Lee, 2006-05
Before i stepped into the computing industry, my first industrial
programing experience is at Wolfram Research Inc as a intern in 1995.
(Wolfram Research is famously known for their highly successful
flagship product Mathematica) I thought, that the programers at
Wolfram are the world's top mathematicians, gathered together to
research and decide and write a extremely advanced technology. But i
realized it is not so. Not at all. In fact, we might say it's just a
bunch of Ph Ds (or equivalent experience). Each person there are not
unlike average white-collar Joes. Each working individually. And,
fights and bouts of arguments between co-workers are not uncommon.
Sometimes downright ugly. Almost nothing is as i naively imagined, as
if some world's top mathematicians are gathered together there, daily
to confer and solve the world's top problems as in some top secret
government agency depicted in movies.
Well, that was my introduction to the industry. The bulk of my
surprise is due to my naiveness and inexperience of the industry, of
any industry, as i was just a intern and this is my first experience
seeing how the real world works.
After Wolfram, after a couple of years i went into the web programing
industry in 1998, using unix, Perl, Apache, Java, database
technologies, in the center of world's technology the Silicon Valley.
My evaluation of industrial programers and how software are written
is a precipitous fall from my observations at Wolfram. In the so-
called Info Tech industry, the vast majority of programers are
despicable morons. I learned this from my colleagues, and in dealing
with programers from other companies, service providers, data
centers, sys admins, API gateways, and duties of field tutoring. I
didn't think i had very qualified expertise in what i do, but the
reality i realized is that most are far lesser than me, and that is
the common situation. That they have no understanding of basic
mathematics such as trigonometry or calculus. Most have no interest
in math whatsoever, and would be hard pressed for them to explain
what is a “algorithm”.
I have always thought, that programing X software of field Y usually
means that the programers are thoroughly fluent in languages,
protocols, tools of X, and also being a top expert in field of Y. But
to my great surprise, the fact is that that is almost never the case.
In fact, most of the time the programers simply just had to learn a
language, protocol, software tool, right at the moment as he is
trying to implement a software for a field he never had experience
in. I myself had to do jobs half of the time i've never done before.
Constantly I'm learning new languages, protocols, systems, tools,
APIs, other rising practices and technologies, reading semi-written
or delve into non-existent docs. It is the norm in the IT industry,
that most products are really produces of learning experiences.
Extremely hurried grasping of new technologies in competition with
deadlines. There is in fact little actual learning going on, as there
are immense pressure to simply “get it to (demonstrably) work” and
ship it.
Thinking back, in fact the Wolfram people are the most knowledgeable
and inquisitive people i've met as colleagues, by far.
What prompted me to write this essay is after reading the essay Teach
Yourself Programming in Ten Years by Peter Norvig, 2001, at http://
www.norvig.com/21-days.html (local copy). In which, the Lisp
dignitary Peter Norvig derides the widely popular computing books in
the name of Teaching Yourself X In (Fast) Days. Although i agree with
his general sentiment that a language or technology takes time to
master and use well, that these books is a damaging fad and subtly
generate ignorance, but he fails to address the main point, that is:
the cause of the popularity of such books, and how to remedy the
situation.
These books are the bedrock of the industry. It is not because people
are impatient, or that they wish to hurry, but rather, it is the
condition of the IT industry, in the same way modern society drives
people to live certain live styles. No amount of patience or
proselytization can right this, except that we change the industry's
practice of quickly churning out bug-ridden software products to beat
competitors. Companies do that due to market forces, and the market
forces is a result of how people and organizations actually choose to
purchase software. In my opinion, a solution to this is by installing
the concept of responsible licenses, as i've detailed in the essay
Responsible Software Licensing, at http://xahlee.org/UnixResource_dir/
writ/responsible_license.html .
----
This post is archived at:
http://xahlee.org/UnixResource_dir/writ/it_programers.html
☄
Tabs versus Spaces in Source Code
Xah Lee, 2006-05-13
In coding a computer program, there’s often the choices of tabs or
spaces for code indentation. There is a large amount of confusion
about which is better. It has become what's known as “religious
war” — a heated fight over trivia. In this essay, i like to
explain what is the situation behind it, and which is proper.
Simply put, tabs is proper, and spaces are improper. Why? This may
seem ridiculously simple given the de facto ball of confusion: the
semantics of tabs is what indenting is about, while, using spaces to
align code is a hack.
Now, tech geekers may object to this simple qualification because
they itch to drivel about different editors and so on. Simply put,
the alleged problem created by tabs as seen in the industry coders
are caused by 2 things: (1) stupid tech geekers who did not
understand things and think about things. (i.e. the semantics of tabs
vs spaces) (2) Due to (1), it has created a massive none-
understanding and mis-use, to the degree that many editors are fucked
upfront (e.g. vi), so that in the end spaces seem to be actually
better. In short, this is a phenomenon of misunderstanding begetting
a snowball of misunderstanding, such that it created a cultural
milieu to embrace this misunderstanding and kick what is true. (this
happens a lot in unix. For one non-unix example, is the file name's 3-
character “extension” (e.g. “.txt”, “.html”) as part of
the file name. Another popular example is the HTML practices in the
industry, where stupid coding and misunderstanding are so wide-spread
such that they force the correct way to the side by any eventual
standardization)
Now, tech geekers may still object, that tabs requires the editors to
set their positions, and plain files don't carry that information.
This is a good question, and the solution is to advance the sciences
such that your source code in some way embed such information. This
would be progress. However, this is never thought of because the
fucking unix already fucked up people's mind by hacking things. In
this case, many will simply use the character intended to word
separation for the purpose of indentation or alignment, and spread it
with hostile drivels.
Now, given the already fucked up situation of the tabs vs spaces by
the unixers (vi) and unix brain-washing of the coders in the
industry... What can we do NOW? I do not have a good proposition,
other than just use whichever that works for you but put more
critical thinking into things to prevent mishaps like this.
Tabs vs Spaces can be thought of as parameters vs hard-coded values,
or HTML vs ascii format, or XML/CSS vs HTML 4, or structural vs
visual, or semantic vs format. In these, in is always easy to convert
from the former to the latter, but near impossible from the latter to
the former. And, that is because the former encodes information that
is lost in the latter. If we look ta tabs vs spaces, indeed, it is
easy to convert tabs to space in a source code, but more difficult to
convert from spaces to tabs. Because, tabs as indentation actually
contains the semantic information about indentation. With spaces,
this critical information is botched.
This issue is intimately related to another issue in source code:
soft-wrapped lines versus physical, hard-wrapped lines by EOL (end of
line character). This issue here is far more consequences than tabs
vs space, and the unixer's unthinking has made far-reaching damages
in the computing industry. Due to unix's EOL ways of thinking, it has
created languages based on EOL (just about ALL languages except the
Lisp family including Mathematica) and tools based on EOL (cvs, diff,
grep, and basically every tool in unix), thoughts based on EOL (code
size estimating by counting EOL, hard-coded email quoting system by
“>” prefix, and silent line-truncations in many unix tools), such
that any progress towards a “programing code unit” or
“algorithmic code unit” concept are suppressed. I have not written
a full account on this issue, but i've touched it in this essay:
“The Harm of hard-wrapping Lines”, at http://xahlee.org/
UnixResource_dir/writ/hard-wrap.html
----
This post is archived at
http://xahlee.org/UnixResource_dir/writ/tabs_vs_spaces.html
☄
Python, Lambda, and Guido van Rossum
Xah Lee, 2006-05-05
In this post, i'd like to deconstruct one of Guido's recent blog
about lambda in Python.
In Guido's blog written in 2006-02-10 at http://www.artima.com/
weblogs/viewpost.jsp?thread=147358
is first of all, the title “Language Design Is Not Just Solving
Puzzles”. In the outset, and in between the lines, we are told
that “I'm the supreme intellect, and I created Python”.
This seems impressive, except that the tech geekers due to their
ignorance of sociology as well as lack of analytic abilities of the
mathematician, do not know that creating a language is a act that
requires little qualifications. However, creating a language that is
used by a lot people takes considerable skill, and a big part of that
skill is salesmanship. Guido seems to have done it well and seems to
continue selling it well, where, he can put up a title of
belittlement and get away with it too.
Gaudy title aside, let's look at the content of his say. If you
peruse the 700 words, you'll find that it amounts to that Guido does
not like the suggested lambda fix due to its multi-line nature, and
says that he don't think there could possibly be any proposal he'll
like. The reason? Not much! Zen is bantered about, mathematician's
impractical ways is waved, undefinable qualities are given, human's
right brain is mentioned for support (neuroscience!), Rube Goldberg
contrivance phraseology is thrown, and coolness of Google Inc is
reminded for the tech geekers (in juxtaposition of a big notice that
Guido works there.).
If you are serious, doesn't this writing sounds bigger than its
content? Look at the gorgeous ending: “This is also the reason why
Python will never have continuations, and even why I'm uninterested
in optimizing tail recursion. But that's for another installment.”.
This benevolent geeker is gonna give us another INSTALLMENT!
There is a computer language leader by the name of Larry Wall, who
said that “The three chief virtues of a programmer are: Laziness,
Impatience and Hubris” among quite a lot of other ingenious
outpourings. It seems to me, the more i learn about Python and its
leader, the more similarities i see.
So Guido, i understand that selling oneself is a inherent and
necessary part of being a human animal. But i think the lesser beings
should be educated enough to know that fact. So that when minions
follow a leader, they have a clear understanding of why and what.
----
Regarding the lambda in Python situation... conceivably you are right
that Python lambda is perhaps at best left as it is crippled, or even
eliminated. However, this is what i want: I want Python literatures,
and also in Wikipedia, to cease and desist stating that Python
supports functional programing. (this is not necessarily a bad
publicity) And, I want the Perl literatures to cease and desist
saying they support OOP. But that's for another installment.
----
This post is archived at:
http://xahlee.org/UnixResource_dir/writ/python_lambda_guido.html
Xah
xah@...
∑ http://xahlee.org/
☄
What is Expressiveness in a Computer Language
Xah Lee, 200502.
In languages human or computer, there's a notion of expressiveness.
English for example, is very expressive in manifestation, witness all
the poetry and implications and allusions and connotations and
dictions. There are a myriad ways to say one thing, fuzzy and warm
and all. But when we look at what things it can say, its power of
expression with respect to meaning, or its efficiency or precision,
we find natural languages incapable.
These can be seen thru several means. A sure way is thru logic,
linguistics, and or what's called Philosophy of Languages. One can
also glean directly the incapacity and inadequacy of natural
languages by studying the artificial language lojban, where one
realizes, not only are natural languages incapable in precision and
lacking in efficiency, but simply a huge number of things are near
impossible to express thru them.
One thing commonly misunderstood in computing industry is the notion
of expressiveness. If a language has a vocabulary of (smile, laugh,
grin, giggle, chuckle, guffaw, cackle), then that language will not
be as expressive, as a language with just (severe, slight, laugh,
cry). The former is “expressive” in terms of nuance, where the
latter is expressive with respect to meaning.
Similarly, in computer languages, expressiveness is significant with
respect to semantics, not syntactical variation.
These two contrasting ideas can be easily seen thru Perl versus
Python languages, and as one specific example of their text pattern
matching capabilities.
Perl is a language of syntactical variegations. Lisp on the other
hand, is a example of austere syntax uniformity, yet its efficiency
and power in expression, with respect to semantics, showcases Perl's
poverty in specification.
200509, 200603.
Example: String Patterns in Regex
In Perl, the facilities for finding and replacing a text pattern is
this construct “$myText =~ s/myPattern/replStr/”.
In Python, it is “re.sub( myPattern, replStr, myText , myCount)”,
where the replacement string repStr can be a function, and a max
number of replacement myCount can be specified. If there is a match,
and if replStr is given a function myFunc instead, then a special
regex match object is feed to myFunc and it can return different
strings based on how the pattern is matched.
This is a instance where the language is more expressive. In Python's
regex facilities, there are several other flexibilities not present
in Perl. For example, its regex pattern can be frozen as a compiled
object and moved about. And, when there is a match, Python returns a
special object that contains all information about the regex, such as
the original pattern, the number of matches, the set of matched
strings and so on, and this object can be passed around, or have
information extracted. These facilities are absent in Perl.
In this case, the flexibilities and facilities of Python's texual
patterns matching's capabilities is a instance of superior
expressiveness to Perl's.
For more about Python's Regex facilities, see the Re-written Python
Regex Documentation at: http://xahlee.org/perl-python/python_re-write/
lib/module-re.html
Example: Range, and Computation with Fractions
In Perl, there's a range construct “(a..b)” that returns a list of
numbers from a to b. However, this construct cannot generate a list
with regular intervals such as (1, 3, 5, 7, 9). Nor would it generate
decreasing lists such as (4, 3, 2, 1) when a > b.
In Python, its range function range(a,b,c) returns a range of numbers
from a to b with steps c. For example, range(3,-5, -2) returns [3, 2,
1, -1, -3]. The arguments can be negative, however, they all must be
integers.
In the Mathematica language, there's also a range function Range
[a,b,c]. Here, the arguments can be either real numbers or exact
fractions. If one of the input argument is a decimal, then the
computation is done as machine precision approximations and the
result list's elements are also in decimals. If all inputs are in
fractions (including integers), returned list's elements will also be
exact fractions.
In Mathematica, the language support fractions wherever a number is
used, and if all numbers are specified as fractions (or integers) in
the code, the result of the computation is also in exact fractions,
with it automatically in a reduced fraction form where common factors
in the numerator and denominator are taken out. (Fractions in
Mathematica is written as a/b with a and b integers.)
This section compares the expressiveness of a particular list
generating facility in 3 languages. But far more importantly, being
able to compute with fractions naturally is a expressibility unheard
of in non-commercial languages.
Example: Rich and Systematic Function Parameters
In many languages, there is a function called “map”. Usually it
takes the form map(f, myList) and returns a list where f is applied
to every element in myList. For example, here is a example from
Python and Perl and lisp:
# python
map( lambda x:x**2 , [1,2,3])
# perl
map {$_**2} (1,2,3);
; lisp
(map (lambda (x) (expt x 2)) (list 1 2 3))
In the Mathematica language, its map function takes a optional third
argument, which specifies the level of the list to map to. In the
following example, f is mapped to the leafs of the list {1,2,{3,4}}.
In[1]:=
Map[Function[#^2],{1,2,{3,4}}, {-1}]
Out[2]=
{1,4,{9,16}}
The expressive power of a language does not solely lies with its
functions taking more options. Such is only a small aspect of
expressibility. However, if a language's functions are designed such
that they provide important, useful features in a consistent scheme,
certainly makes the language much more expressive.
In the functional language Mathematica, there is a host of functions
that manipulate lists, with options that work on the list's
structural level or syntactical pattern. In terms of expressiveness,
this is a superior example.
Example: Symbolic Computation
Lisp differs from most imperative programing languages in that it
deals with symbols. What does this mean?
In imperative languages, a value can be assigned a name, and this
name is called a variable. For example, “x=3”, and whenever this
“name” is encountered, it is evaluated to its value. It does not
make any sense, to have variables without a assigned value. That is,
the “x” is not useful and cannot be used until you assign
something to it.
However, in lisp, there is a concept of Symbols. As a way of
explanation, a “variable” needs not to be something assigned of a
value. Symbols can stand by themselves in the language. And, when a
symbol is assigned a value, the symbol can retain its symbolic form
without becoming a value.
This means that in lisp, “variables” can be manipulated in its un-
evaluated state. The situation is like the need for the “evaluate”
command in many languages, where the programer can built code as
strings and do “evaluate(myCodeString)” to achieve meta-programing.
For example, in imperatives languages once you defined “x=3”, you
cannot manipulate the variable “x” because it gets evaluated to 3
right away. If you want, you have to build a string “"x"” and
manipulate this string, then finally use something like “evaluate
(myCodeString)” to achieve the effect. If the imperative language
does provide a “evaluate()” function, its use breaks down quickly
because the language is not designed for doing it. It's extremely
slow, and impossible to debug, because there lacks facilities to deal
with such meta programing.
In lisp, variable's unevaluated form are always available. One just
put a apostrophe ' in front of it. In “x=3”, the x is a variable
in the contex of the code logic, x is a name of the variable in the
context of meaning analysis, and x is a symbol in the context of the
programing language. This Symbols concept is foreign to imperative
languages. It is also why lisp are known as symbolic languages. Such
makes meta-programing possible.
The power of symbolic processing comes when, for example, when you
take user input as code, or need to manipulate math formulas, or
writing programs that manipulates the source code, or generate
programs on the fly. These are often needed in advanced programing
called Artificial Intelligence. This is also the reason, why lisp's
“macro” facilities is a powerful feature unlike any so-called
“pre-processors” or “templates” in imperative languages that
works by processing strings.
Mathematica for example, is sometimes known as a Computer Algebra
System. It too, works with symbols in the language. They can be
manipulated, transformed, assigned a value, evaluated, hold
unevaluated etc.
One way for a imperative programer to understand symbols, is to think
of computing with strings, such as which Perl and Python are well
known for. With strings, one can join two strings, select sub
strings, use string pattern (regex) to transform strings, split a
string into a list for more powerful manipulation, and use “eval”
to make the string alive. Now imagine all these strings need not be
strings but as symbols in the language, where the entire language
works in them and with them, not just string functions. That is
symbolic computation.
Here we see, a expressibility unseen in non-lisp family of languages.
Example: Expressiveness in Syntax Variability
See: The Concepts and Confusions of Pre-fix, In-fix, Post-fix and
Fully Functional Notations, at: http://xahlee.org/UnixResource_dir/
writ/notations.html
Example: Classes in OOP (in Java, JavaScript, Python)
----
This post is archived at:
http://xahlee.org/perl-python/what_is_expresiveness.html
☄
The Concepts and Confusions of Pre-fix, In-fix, Post-fix and Fully
Functional Notations
http://xahlee.org/UnixResource_dir/writ/notations.html
A side note: the terminology “Algebraic” Notation is a misnomer.
It seems to imply that such notations have something to do with the
branch of math called algebra while other notation systems do not.
The reason the name Algebraic Notation is used because when the
science of algebra was young, around 1700s mathematicians are dealing
with equations using symbols like “+ × =” written out similar to
the way we use them today. This is before the activities of
systematic investigation into notation systems as necessitated in the
studies of logic in 1800s or computer languages in 1900s. So, when
notation systems are actually invented, the conventional way of
infixing “+ × =” became known as algebraic because that's what
people think of when seeing them.
☄
The Concepts and Confusions of Pre-fix, In-fix, Post-fix and Fully
Functional Notations
Xah Lee, 2006-03-15
Let me summarize: The LISP notation, is a functional notation, and is
not a so-called pre-fix notation or algebraic notation.
Algebraic notations have the concept of operators, meaning, symbols
placed around arguments. In algebraic in-fix notation, different
symbols have different stickiness levels defined for them. e.g. “3
+2*5>7” means “(3+(2*5))>7”. The stickiness of operator symbols
are normally called “Operator Precedence”. It is done by giving a
order specification for the symbols, or equivalently, give each
symbol a integer index, so that for example if we have
“a⊗b⊙c”, we can unambiguously understand it to mean one of
“(a⊗b)⊙c” or “a⊗(b⊙c)”.
In a algebraic post-fix notation known as Polish Notation, there
needs not to have the concept of Operator Precedence. For example,
the in-fix notation “(3+(2*5))>7” is written as “3 2 5 * + 7
>”, where the operation simply evaluates from left to right.
Similarly, for a pre-fix notation syntax, the evaluation goes from
right to left, as in “> 7 + * 5 2 3”.
While functional notations, do not employ the concept of Operators,
because there is no operators. Everything is a syntactically a
“function”, written as f(a,b,c...). For example, the same
expression above is written as “>( +(3, *(2,5)), 7)” or
“greaterThan( plus(3, times(2,5)), 7)”.
For lisps in particular, their fully functional notation is
historically termed sexp (short for S-Expression, where S stands for
Symbolic). It is sometimes known as Fully Parenthesized Notation. For
example, in lisp it would be (f a b c ...). In the above example it
is: “(> (+ 3 (* 2 5)) 7)”.
The common concepts of “pre-fix, post-fix, in-fix” are notions in
algebraic notations only. Because in Full Functional Notation, there
is no concept for the positioning of the operator among its operands.
A Function's arguments are simply explictly written out inside a pair
of enclosing delimiters.
Another way to see that lisp notation are not “pre” anything, is
by realizing that the “head” f in (f a b c) can be defined to be
placed anywhere. e.g. (a b c f) or even (a f b c), and it's still not
pre- or in- or post- anything. For example, in the language
Mathematica, f(a b c) would be written as f[a,b,c] where the argument
enclosure symbols is the square bracket instead of parenthesis, and
argument separator is comma instead of space, and the function symbol
(or head) is placed in outside and in front of the argument enclosure
symbols.
The reason for the misconception that lisp notations are “pre-fix”
is because the head appears before the enclosed arguments. Such use
of the term “pre-fix” is a confusion engenderer because the
significance of the term lies in Algebraic notation systems.
2000-02-21
The common name for the lisp way is Fully Parenthesized Notation.
This syntax is the most straightforward to represent a tree, but it's
not the only choice. For example, one could have Fully Parenthesized
Notation by simply moving the semantics of the first element to the
last. You write (arg1 arg2 ... f) instead of the usual (f arg1 arg2).
Like wise, you can essentially move f anywhere and still make sense.
In Mathematica, they put the f in front of the paren, and use square
brackets instead. e.g. f[a,b,c], Sin[3], Map[f,list] ... etc. The f
in front of parent makes better conventional sense until f is itself
a list which then we'll see things like f[a,b][c, g[3,h]] etc. It's
worse when there are arbitrary nesting of heads.
A pre-fix notation in Mathematica is represented as “f@arg”.
Essentially, a pre-fix notation in this context limits it to uses for
function that has only one argument. More example: “f@a@b@c” is
equivalent to “f[a[b[c]]]” or in lispy “(f (a (b c)))”. A post-
fix notation is similar. In Mathematica it is, e.g. “c//b//a//f”.
For example “List[1,2,3]//Sin” is syntactically equivalent to
“Sin[List[1,2,3]]” or “Sin@List[1,2,3]”. (and they are
semantically equivalent to “Map[Sin, List[1,2,3]]” in Mathematica)
For in-fix notation, the function symbol is placed between its
arguments. In Mathematica, the generic form for in-fix notation is by
sandwiching the tilde symbol around the function name. e.g. “Join
[List[1,2],List[3,4]]” can be written as “List[1,2] ~Join~ List
[3,4]”.
In general, when we say C is a in-fix notation language, we don't
mean it's strictly in-fix but the situation is one-size-fits-all for
convenience. Things like “i++”, “++i”, “for(;;)”, 0x123,
“sprint(...%s...,...)”, ... are syntax whimsies. (that is, a ad
hoc syntax soup)
In Mathematica for example, there is quite a lot syntax sugars beside
the above mentioned systimatic constructs. For instance, Plus[a,b,c]
can be written as “a+b+c”, “Plus[a+b,c]”, “Plus[Plus
[a,b],c]”, or “(a+b)~Plus~c”.
The gist being that certain functions such as Plus is assigned a
special symbol '+' with a particular syntax form to emulate the
irregular and inefficient but nevertheless well-understood
conventional notation. For another example: Times[a,b] can be also
written as “a*b” or just “a b”. Mathematica also have C
language's convention of “i++”, “++i”, “i+=1” for examples.
As a side note, the Perl mongers are proud of their slogan of There
Are More Than One Way To Do It in their gazillion ad hoc syntax
sugars but unaware that in functional languages (such as Mathematica,
Haskell, Lisp) that there are consistent and generalized constructs
that can generate far far more syntax variations than the ad hoc
prefixed Perl both in theory AND in practice. (in lisps, their power
syntax variation comes in the guise of macros.) And, more
importantly, Perlers clamor about Perl's “expressiveness” more or
less on the useless syntax level but don't realize that semantic
expressibility is what's really important.
----
This post is archived at:
http://xahlee.org/UnixResource_dir/writ/notations.html
☄
Examples of Quality Documentation
Xah Lee, 200512
I had the pleasure to read the PHP's manual today.
http://www.php.net/manual/en/
Although Pretty Home Page is another criminal hack of the unix
lineage, but if we are here to judge the quality of its
documentation, it is a impeccability.
It has or possesses properties of:
• To the point and useful.
PHP has its roots in mundaness, like Perl and Apache. Its doc being
practicality oriented isn't a surprise, as are the docs of Perl and
Apache.
• Extreme clarity!
The doc is extremely well-written. The authors's writing skill shows,
that they can present their ideas clearly, and also that they have
put thoughts into what they wanted to say.
• Ample usage examples.
As with Perl's doc, PHP doc is not afraid to show example snippets,
yet not abuse it as if simply slapping on examples in lieu of proper
spec or discussion.
• Appropriate functions or keywords are interlinked.
This documentation media's utility aspect is also almost always
employeed in other quality docs, such as Mathematica, Java, MS
JScript, Perl's official docs.
• No abuse of jargons.
In fact, it's so well written that there's almost no jargon in its
docs, yet conveys its intentions to a tee. This aspect can also be
seen in Mathematica's doc, or Microsoft's JScript doc, for examples.
• No author masturbation. (in fact, it is not written using a first-
person perspective, as is the case with most professional
documentation.)
We must truely appreciate the authors of the PHP doc. Because, PHP,
as a free shit in the unix shit culture, with extreme ties to Perl
and Apache (both are the worse cases of tech writing), but can wean
itself from a shit milieu and stand pure and clean to be a paragon of
technical writing.
The world's worst docs are the Unix (man pages) http://
www.FreeBSD.org/cgi/man.cgi,
Perl http://perldoc.perl.org/,
Apache http://httpd.apache.org/docs/1.3/,
Python http://www.python.org/doc/2.4.2/ .
As i have alluded or expounded before, the unix & Apache are
criminally the worst, Perl being a immediate follow up. Python's on a
class of its own, being a mutated Computer Sciency pretension whose
usability is worse than Perl's.
Here is a sample list of a variety of quality technical writings:
• Mathematica
http://documents.wolfram.com/mathematica/
• Microsoft's JScript official docs
http://msdn.microsoft.com/library/en-us/script56/html/
js56jsconjscriptfundamentals.asp
• Emacs lisp introduction by Robert J Chassell
http://www.gnu.org/software/emacs/emacs-lisp-intro/ (GNU project's
documentations are almost always quality documentations. For example,
the official emacs manual http://www.gnu.org/software/emacs/manual/
and elisp manual http://www.gnu.org/software/emacs/elisp-manual/
elisp.html
ale both of high quality.)
• Java's official doc
http://java.sun.com/j2se/1.4.2/docs/api/index.html
Java, being a bottled-up inflexible language with incessant lies
backup by huge amounts of $money$, nevertheless hired professional
writers for its huge official documentation — produced a very well
done doc for a very complex language. (however, the official Java
Tutorial “Learning the Java Language” http://java.sun.com/docs/
books/tutorial/java/index.html is a documentation garbage. (due, in
this case, not for lack of writing skill, but lack of understanding
OOP and pedagogy.))
• Scheme's official doc (R5RS)
http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-2.html
• Scheme tutorial (Teaching yourself Scheme in Fixnum Days, by Dorai
Sitaram)
http://www.ccs.neu.edu/home/dorai/t-y-scheme/t-y-scheme-Z-H-1.html
These are all quality technical writings. They have different styles
and audiences and coverages. If you want to see clarity and
concision, see JScript, PHP, and Scheme tutorial. If you want to see
clarity with verbosity, see Emacs Lisp Intro, Emacs manual, Elisp
manual. For clarity sans arcana yet covers esoterica, see the
Mathematica doc. Some of these are written for people with no
experience in programing, yet functions as equivalent to teaching/
documenting extremely advanced programing concepts. If you want to
see proper use of jargons at a IT professional level, see the Java
doc. If you want to see exemplary tech writing in a academic style,
see the Scheme R5RS.
------------
http://xahlee.org/perl-python/quality_docs.html
☄
One-Liner Loop in Functional Style
Xah Lee, 200510
Today we show a example of a loop done as a one-liner of Functional
Programing style.
Suppose you have a list of file full paths of images:
/Users/t/t4/oh/DSCN2059m-s.jpg
/Users/t/t4/oh/DSCN2062m-s.jpg
/Users/t/t4/oh/DSCN2097m-s.jpg
/Users/t/t4/oh/DSCN2099m-s.jpg
/Users/t/Icons_dir/icon_sum.gif
For those ending with -s (indicate a smaller version), you want to
change it to the full version without the -s, if it exists. Here's the
code:
# -*- coding: utf-8 -*-
# python
import re, os.path
imgPaths=[u'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2059m-
s.jpg',
u'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2062m-s.jpg',
u'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2097m-s.jpg',
u'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2099m-s.jpg',
u'/Users/t/web/Icons_dir/icon_sum.gif']
# change the image path to the full sized image, if it exists
# that is, if image ends in -s.jpg, find one without the '-s'.
imgPaths2=[]
for myPath in imgPaths:
p=myPath
(dirName, fileName) = os.path.split(myPath)
(fileBaseName, fileExtension)=os.path.splitext(fileName)
if(fileBaseName[-2:] == '-s'):
p2=os.path.join(dirName,fileBaseName[0:-2]) + fileExtension
if os.path.exists(p2): p=p2
imgPaths2.append(p)
print imgPaths2
But how do you do it in a functional programing style? Namely,
something of the from imgPath2=f(imgPath), where the f is some
function. Normally, the f would be a pure function construct made up on
the spot, that is, lambda. But Python's lambda is limited to only a
expression as its body. Nevertheless, one can achieve a functional
style with some workaround, using map twice. Here's the code:
imgPaths3 = map( lambda x: os.path.exists(x[1]) and x[1] or x[0], \
map(lambda x: (x, re.sub( r"^(.+?)-s(\.[^.]+)$",r"\1\2", x)), imgPaths))
The first map:
newList = map(lambda x: (x, re.sub( r"^(.+?)-s(\.[^.]+)$",r"\1\2", x)),
imgPaths)
generate a list of pairs (x, y), where x is the same element in
imgPaths, and y is one without the -s. Then, a second map:
map( lambda x: os.path.exists(x[1]) and x[1] or x[0], newList,
imgPaths))
checks if the path y exists, if so, use that, else, use the x part. The
function body is essentially of a conditional EXPRESSION, of the from
(test, if true return result1, else return result2). This form of a
test that returns a expression is very important in functional
programing. Because, in functional programing a common pattern is to
sequence functions and passing values. One cannot stop in the middle
and use a block structure. Here's how this form's syntax in several
languages:
test? trueExpression: falseExpression (C, Perl)
If[test, trueExpression, falseExpression] (Mathematica)
(test trueExpression falseExpression) (LISP)
In Python, there's no such form for this, but a semantically equivalent
workaround is to sequence boolean expressions as used in our example.
Namely:
test and trueExpression or falseExpression (Python)
This works because it exploits a particular behavior of how Python
treats boolean sequences. In Python, “x and y” returns y if x is true.
And, “x or y” returns x if x is true. This behavior is compatible with
the mathematical sense of booleans. For example, mathematically “x and
y” should be true if both are true, yet in Python if x is true then y
is returned, and if y is true then this is compatible with the math
sense, but if y is false then the whole result is also false, so it
also satisfies the math sense. Similar is the case of “x or y”. The
point here is that one of the element is returned, instead of a real
True or False value. Therefore, in a twisted way this can be used for
the If[test, trueExpression, falseExpression] form. Example.
result1= True and 4 or 5 # returns 4
result2= False and 4 or 5 # returns 5
print result1
print result2
Such language behavior is a result of the laziness of the compiler
implementation, particular seen in unix shells (&& and ||). The problem
with this design is that codes relying on the the returned element of a
boolean sequence does not clearly indicate the programer's intention.
It works by language pecularities, instead of expressed program logic.
For the official doc on evaluating booleans, see:
http://python.org/doc/2.4.2/lib/boolean.html
Here's the Perl code of the same loop block:
# perl
use File::Basename;
@imgPaths=(
'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2059m-s.jpg',
'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2062m-s.jpg',
'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2097m-s.jpg',
'/Users/t/web/Periodic_dosage_dir/lanci/t4/oh/DSCN2099m-s.jpg',
'/Users/t/web/Icons_dir/icon_sum.gif');
# change the image path to the full sized image, if it exists
# that is, if image ends in -s.jpg, find one without the '-s'.
$imgPaths2=();
for $myPath (@imgPaths){
$p=$myPath;
($fileBaseName, $dirName, $fileExtension) = fileparse($myPath,
('\.[^.]+$') );
if (substr($fileBaseName,-2,2) eq '-s') {
$p2= $dirName . '/' .
substr($fileBaseName,0,length($fileBaseName)-2) . $fileExtension;
print $p2, "\n";
if (-e $p2) { $p=$p2}
}
push(@imgPaths2,$p);
}
use Data::Dumper;
print Dumper(\@imgPaths2)
In Perl, this can be written in a functional style one-liner:
@imgPaths3= map{
$y = $_;
$y =~ s/^(.+?)-s(\.[^.]+)$/$1$2/;
-e $y ? $y : $_;
} @imgPaths;
For the official doc of Perl's map, type in command line: “perldoc -f
map”
------
this article is archived at:
http://xahlee.org/perl-python/one-liner_loop.html
Xah
xah@...
∑ http://xahlee.org/
☄
Split File Fullpath Into Parts
Xah Lee, 20051016
Often, we are given a file fullpath and we need to split it into the
directory name and file name. The file name is often split into a core
part and a extension part. For example:
'/Users/t/web/perl-python/I_Love_You.html'
becomes
'/Users/t/web/perl-python/' (directory name)
'I_Love_You' (file's base name)
'.html' (file's “extension”)
Depending on the language, some language will remove the trailing slash
after the dir name, and some will omit the dot before the suffix.
In Python, to split a full path into parts is done with the os.path
module. Example:
# python
import os.path
myPath = '/Users/t/web/perl-python/I_Love_You.html'
(dirName, fileName) = os.path.split(myPath)
(fileBaseName, fileExtension)=os.path.splitext(fileName)
print dirName # /Users/t/web/perl-python
print fileName # I_Love_You.html
print fileBaseName # I_Love_You
print fileExtension # .html
The official doc of the os.path module is at:
http://www.python.org/doc/2.4.1/lib/module-os.path.html
In Perl, spliting a full path into parts is done like this:
# perl
use File::Basename;
$myPath = '/Users/t/web/perl-python/I_Love_You.html';
($fileBaseName, $dirName, $fileExtension) = fileparse($myPath,
('\.html') );
print $fileBaseName, "\n"; # I_Love_You
print $dirName, "\n"; # /Users/t/web/perl-python/
print $fileExtension, "\n"; # .html
Note: the second argument to fileparse() is a list of regex. In
particular, you need to escape the dot.
For the official doc, type in command line: “perldoc File::Path”.
----------
this post is archived at:
http://xahlee.org/perl-python/split_fullpath.html
☄
continued from sorting in Python at
http://xahlee.org/perl-python/sort_list.html
-----------------------------
Sorting in Perl
In Perl, to sort a list, do like this:
@li=(1,9,2,3);
@li2 = sort {$a <=> $b} @li;
print join(' ', @li2);
In Perl, sort is a function, not some Object Oriented thing. It returns
the sorted result as another list. This is very simple and nice.
It works like this: sort takes the form “sort {...} @myList”. Inside
the enclosing braces is the body of the ordering function, where
variables $a and $b inside are predefined by the language to represent
two elements in the list. The operator “<=>” returns -1 if left element
is less than right element. If equal, it returns 0, else 1. It is
equivalent to Python's “cmp” function.
Perl provides another comparison operator “cmp”, which treats its
operands as strings. So, for example:
print "3" <=> "20"; # prints -1
print "3" cmp "20"; # prints 1
In Perl, numbers and strings are mutually automatically converted if
needed.
Another form of sort is “sort orderFunctionName @list”, which uses a
function name in place of the comparison block
Just for completeness, let's define a Perl equivalent of a Python
example above.
@li=(
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
);
# sorts a list of strings by their embedded number
@li2 = sort { ($a=~m/(\d+)/)[0] <=> ($b=~m/(\d+)/)[0];} @li;
print join(' ', @li2); # prints web7-s.jpg my23i.jpg fris88large.jpg
my283.jpg
Note, that Perl also has pure functions (lambda) construct. In Perl, a
pure function is simply done as “def {...}”, and applied to argument by
the form “pureFunction->(args)”. Unlike Python, Perl has no limitation
in its pure function construct. Because of this, Perl supports a
limited form of functional programing. Here's a simple example:
$result = sub($) {$_[0]+1}->(3);
print $result; # prints 4
# a value obtained by a pure function that adds 1 to its argument,
# applied to a argument of 3.
Perl, like Python, whose compiler is not very smart. In sorting, the
ordering function is called unnecessarily repetitiously. Like Python,
Perlers have sought means to avoid this penalty. If the programer knew
in advance that his matrix is huge or knew in advance his ordering
function, then he can code his sort by writing it using a very complex
workaround.:
Here's how this work around works: Suppose you want to sort a huge list
with a expensive ordering function. If you simply do “sort
orderingFunction @myList”, then Perl is going to call your
orderingFunction wantonly, and you lose. To beat Perl, you first
generate a copy newList, whose elements are pairs (x,y), where x is
from the original list, and y is the sorting key extracted from x.
Then, you call Perl's sort function on this newList and using a
ordering function based on the second list element (the y). Once this
newList is sorted, you construct your original list by extracting the
first element from this sorted newList. Here's the code example:
# the following two lines are equivalent
@li2 = sort { ($a=~m/(\d+)/)[0] <=> ($b=~m/(\d+)/)[0];} @li;
@li2 = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_,
($_=~m/(\d+)/)[0] ] } @li;
In the above Perl code, the part “map { [ $_, ($_=~m/(\d+)/)[0] ] }
@li;” generates the intermediate newList. Then, sort is applied to it,
then, another map “map { $_->[0] }” gets the original list sorted.
In this way, the cost of the internals of the ordering function is
avoided. (it runs on each list element once) However, your huge list is
copied 1 extra time. So, there are pros and cons. Because this work
around is very complex in both its semantics and syntax, it has
acquired a coolness factor among Perl coders, and is given the name
Schwartzian Transform.
It is interesting to note what compiler flaws can do to imperative
languages and its people. In Python, the language syntax is tainted. In
Perl, a complex construct is invented. In both camps, the basic
mathematics of sorting and its implementation aspects are completely
belied.
For the official doc of Perl's sort, type on the command line: “perldoc
-f sort”.
---------
this post is archived at:
http://xahlee.org/perl-python/sort_list.html
Xah
xah@...
∑ http://xahlee.org/
☄
New material has been added to the Python's documentation problem
example on sort().
The following post is archived at:
http://xahlee.org/perl-python/python_doc_sort.html
-------------
Python Doc Problem Example: sort()
Xah Lee, 200503
Exhibit: Incompletion & Imprecision
Python doc “3.6.4 Mutable Sequence Types” at
http://python.org/doc/2.4/lib/typesseq-mutable.html
in which contains the documentation of the “sort” method of a list.
Quote:
-------------
Operation Result Notes
s.sort([cmp[, key[, reverse]]]) sort the items of s in place (7), (8),
(9), (10)
(7) The sort() and reverse() methods modify the list in place for
economy of space when sorting or reversing a large list. To remind you
that they operate by side effect, they don't return the sorted or
reversed list.
(8) The sort() method takes optional arguments for controlling the
comparisons.
cmp specifies a custom comparison function of two arguments (list
items) which should return a negative, zero or positive number
depending on whether the first argument is considered smaller than,
equal to, or larger than the second argument: "cmp=lambda x,y:
cmp(x.lower(), y.lower())"
key specifies a function of one argument that is used to extract a
comparison key from each list element: "cmp=str.lower"
reverse is a boolean value. If set to True, then the list elements are
sorted as if each comparison were reversed.
In general, the key and reverse conversion processes are much faster
than specifying an equivalent cmp function. This is because cmp is
called multiple times for each list element while key and reverse touch
each element only once.
Changed in version 2.3: Support for None as an equivalent to omitting
cmp was added.
Changed in version 2.4: Support for key and reverse was added.
(9) Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).
(10) While a list is being sorted, the effect of attempting to mutate,
or even inspect, the list is undefined. The C implementation of Python
2.3 and newer makes the list appear empty for the duration, and raises
ValueError if it can detect that the list has been mutated during a
sort.
-------------
As a piece of documentation, this is a lousy one.
The question Python doc writers need to ask when evaluating this piece
of doc are these:
• can a experienced programer who is expert at several languages but
new to Python, and also have read the official Python tutorial, can he,
read this doc, and know exactly how to use sort with all the options?
• can this piece of documentation be rewritten fairly easily, so that
the answer to the previous question is a resounding yes?
To me, the answers to the above questions are No and Yes. Here are some
issues with the doc:
• In the paragraph about the “key” parameter, the illustration given
is: “cmp=str.lower”. It should be be “key=str.lower”
• This doc lacks examples. One or two examples will help a lot,
especially to less experienced programers. (which comprises the
majority of readers) In particular, it should give a full example of
using the comparison function and one with the “key” parameter.
Examples are particularly needed here because these parameters are
functions, often with the “lambda” construct. These are unusual and
advanced constructs among imperative languages.
• This doc fails to mention what happens when the predicate and the
shortcut version conflicts. e.g. “myList.sort(cmp=lambda x,y: cmp(x[0],
y[0]), key=lambda x: str(x[1]) )”
• The syntax notation Python doc have adopted for indicating optional
parameters, does not give a clear view just exactly what combination of
optional parameters can be omitted. The notation: “s.sort([cmp[, key[,
reverse]]])” gives the impression that only trailing arguments can be
omitted, which is not true.
• The doc gives no indication of how to omit a optional arg. Should it
be “nul”, “Null”, 0, or left empty? Since it doesn't give any examples,
doc reader who isn't Python experts is left to guess at how true/false
values are presented in Python.
• On the whole, the way this doc is written does not give a clear
picture of the roles of the supplied options, nor how to use them.
Suggested Quick Remedy: add a example of using the cmp function. And a
example using the “key” function. Add a example of Using one of them
and with reverse. (the examples need not to come with much
explanations. One sentence annotation is better than none.)
Other than that, the way the doc is laid out with a terse table and
run-on footnotes (employed in several places in Python doc) is not
inductive. For a better improvement, there needs to be a overhaul of
the organization and the attitude of the entire doc. The organization
needs to be programing based, as opposed to implementation or computer
science based. (in this regard, one can learn from the Perl folks). As
to attitude, the writing needs to be Python-as-is, as opposed to
computer science framework, as indicated in the early parts of this
critique series.
--------------------------------
Addendum, 200510
Here's further example of Python's extreme low quality of
documentation. In particular, what follows focuses on the bad writing
skill aspect, and comments on some language design and quality issues
of Python.
From the Official Python documentation of the sort() method, at:
http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:
«The sort() method takes optional arguments for controlling the
comparisons.»
It should be “optional parameter” not “optional argument”. Their
difference is that “parameter” indicates the variable, while “argument”
indicates the actual value.
«... for controlling the comparisons.»
This is a bad writing caused by lack of understanding. No, it doesn't
“control the comparison”. The proper way to say it is that “the
comparison function specifies an order”.
«The sort() and reverse() methods modify the list in place for economy
of space when sorting or reversing a large list. To remind you that
they operate by side effect, they don't return the sorted or reversed
list. »
This is a example of tech-geeking drivel. The sort() and reverse()
methods are just the way they are. Their design and behavior are really
not for some economy or remind programers of something. The Python doc
is bulked with these irrelevant drivels. These littered inanities
dragged down the whole quality and effectiveness of the doc implicitly.
«Changed in version 2.4: Support for key and reverse was added.»
«In general, the key and reverse conversion processes are much faster
than specifying an equivalent cmp function. This is because cmp is
called multiple times for each list element while key and reverse touch
each element only once.»
When sorting something, one needs to specify a order. The easiest way
is to simply list all the elements as a sequence. That way, their order
is clearly laid out. However, this is in general not feasible and
impractical. Therefore, we devised a mathematically condensed way to
specify the order, by defining a function f(x,y) that can take any two
elements and tell us which one comes first. This, is the gist of
sorting a list in any programing language.
The ordering function, being a mathematically condensed way of
specifying the order, has some constraints. For example, the function
should not tell us x < y and y < x. (For a complete list of these
constraints, see http://xahlee.org/perl-python/sort_list.html )
With this ordering function, it is all sort needed to sort a list.
Anything more is interface complexity.
The optional parameters “key” and “reverse” in Python's sort method is
a interface complexity. What happened here is that a compiler
optimization problem is evaded by moving it into the language syntax
for programers to worry about. If the programer does not use the “key”
syntax when sorting a large matrix (provided that he knew in advance of
the list to be sorted or the ordering function), then he is penalized
by a severe inefficiency by a order of magnitude of execution time.
This situation, of moving compiler problems to the syntax surface is
common in imperative languages.
«Changed in version 2.3: Support for None as an equivalent to omitting
cmp was added.»
This is a epitome of catering towards morons. “myList.sort()” is
perfect but Pythoners had to add “myList.sort(None)” interface
complexity just because idiots need it.
The motivation here is simple: a explicit “None” gives coding monkeys a
direct sensory input of the fact that “there is no comparison
function”. This is like the double negative in black English “I ain't
no gonna do it!”. Logically, “None” is not even correct and leads to
bad thinking. What really should be stated in the doc, is that “the
default ordering function to sort() is the ‘cmp’ function.”.
«Starting with Python 2.3, the sort() method is guaranteed to be
stable. A sort is stable if it guarantees not to change the relative
order of elements that compare equal -- this is helpful for sorting in
multiple passes (for example, sort by department, then by salary
grade).»
One is quite surprised to read this. For about a decade of a language's
existence, its sort functionality is not smart enough to preserve
order?? A sort that preserves original order isn't something difficult
to implement. What we have here is sloppiness and poor quality common
in OpenSource projects.
Also note the extreme low quality of the writing. It employes the
jargon “stable sort” then proceed to explain what it is, and the latch
on of “multiple passes” and the mysterious “by department, by salary”.
Here's a suggested rewrite: “Since Python 2.3, the result of sort() no
longer rearrange elements where the comparison function returns 0.”
☄
Perl-Python-a-Day: Sort a List
Sort a List
Xah Lee, 200510
In this page, we show how to sort a list in Python & Perl and also
discuss some math of sort.
To sort a list in Python, use the “sort” method. For example:
li=[1,9,2,3];
li.sort();
print li;
Note that sort is a method, and the list is changed in place.
Suppose you have a matrix, and you want to sort by second column.
Example Solution:
li=[[2,6],[1,3],[5,4]]
li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]
The line “li.sort(lambda x, y: cmp(x[1],y[1]))” can also be written as
“li.sort(cmp=lambda x, y: cmp(x[1],y[1]))”
The argument to sort is a function of two arguments, that returns -1,
0, 1. This function is a decision function that tells sort() how to
decide the order of any two elements in your list. If the first
argument is “less” then second argument, the function should return -1.
If equal, then 0. Else, 1.
Here's a more complex example. Suppose you have a list of strings.
'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
...
You want to sort them by the number embedded in them. What you have to
do, is to provide sort() method a function, that takes two strings, and
compares the integer inside the string. Here's the solution:
li=[
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
]
def myComp (x,y):
import re
def getNum(str): return float(re.findall(r'\d+',str)[0])
return cmp(getNum(x),getNum(y))
li.sort(myComp)
print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg',
'my283.jpg']
Here, we defined a function myComp to tell sort about the ordering.
Normally, one would use the “lambda” construct, but Python's lambda
construct can only represent the simplest functions.
Some Math about Sorting
In general, the function f used to determine the order of any two
element must satisfy some constraints:
• f(a,a)==0
• if f(a,b)==0 then f(b,a)==0
• if f(a,b)==0 and f(b,c)==0, then f(a,c)==0.
• if f(a,b)==-1 and f(b,c)==-1, then f(a,c)==-1.
• if f(a,b)==-1, then f(b,a)==1.
If the comparison function does not behave as the above, then it is not
consistent, meaning that the result “ordered” list is may actually be
different depending how the language happens to implement sort.
The significance of all these is that in real software you may want to
sort a list of non-simple entities by a specialized ordering. For
example, you may want to sort a list of polygonal surfaces in 3D space,
for particular reasons in implementing some computer graphics features.
Say, you want to sort these polygons by their spacial orientations. It
is in advanced cases like these, understanding the basic math about
ordering is important. Otherwise, you might have a bewildering result
yet unable to locate any flaws in your code.
Python's “sort” method's optional parameters: “key” and “reverse”
Most of the time, sorting is done for a list of atomic element such as
[3,2,4]. This is simply done by myList.sort() without any argument.
Other than simple list, sort is frequently used on matrixes (e.g.
[[2,6],[1,3],[5,4]]). For matrixes, almost always a particular column
is used for the basis of ordering. For example, if we want to sort by
second column, we do: “li.sort(lambda x, y: cmp(x[1],y[1]))”. Since
this is frequently used, Python provides a somewhat shorter syntax for
it, by specifying the column used as the ordering “key”. For example:
li=[[2,6],[1,3],[5,4]]
li.sort(key=lambda x:x[1] ) # is equivalent to the following
#li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]
Because Python's implementation is not very refined , this specialized
syntax is actually much speedier than the general form “lambda x, y:
cmp(x[1],y[1])”. It is a burden on the programer to always use the
“key” syntax idiosyncrasy if he is sorting a large matrix.
Another idiosyncratic provision is the optional “reverse” argument.
This parameter is somewhat necessary when using the “key” parameter.
One can reverse the ordering by using the “reverse” keyword as a
argument to sort. Example:
The following are equivalent:
li.sort(key=lambda x:x[1], reverse=True )
li.sort(lambda x, y: cmp(x[1],y[1]), reverse=True)
li.sort(lambda x, y: cmp(y[1],x[1]))
The official doc on Python's sort method is at (bottom):
http://python.org/doc/2.4/lib/typesseq-mutable.html
Sorting in Perl
(to be posted in a couple of days)
This post is archived at:
http://xahlee.org/perl-python/sort_list.html
Xah
xah@...
∑ http://xahlee.org/
☄
The following script takes a directory and print all the sizes of html
files, counting the sizes of inline images.
This script is useful in making sure that HTML file are under certain
size. This is useful because web visitors with slow connection may take
a long time to load html files with lots of inline images.
# -*- coding: utf-8 -*-
# Python
# Wed Oct 5 15:50:31 PDT 2005
# given a dir, report all html file's size. (counting inline images)
# XahLee.org
import re, os.path, sys
inpath= '/Users/t/web/'
while inpath[-1] == '/': inpath = inpath[0:-1] # get rid of trailing
slash
if (not os.path.exists(inpath)):
print "dir " + inpath + " doesn't exist!"
sys.exit(1)
##################################################
# subroutines
def getInlineImg(file_full_path):
'''getInlineImg($file_full_path) returns a array that is a list of
inline images. For example, it may return ['xx.jpg','../image.png']'''
FF = open(file_full_path,'rb')
txt_segs = re.split( r'src', unicode(FF.read(),'utf-8'))
txt_segs.pop(0)
FF.close()
linx=[]
for linkBlock in txt_segs:
matchResult = re.search(r'\s*=\s*\"([^\"]+)\"', linkBlock)
if matchResult: linx.append( matchResult.group(1) )
return linx
def linkFullPath(dir,locallink):
'''linkFullPath(dir, locallink) returns a string that is the full
path to the local link. For example,
linkFullPath('/Users/t/public_html/a/b', '../image/t.png') returns
'Users/t/public_html/a/image/t.png'. The returned result will not
contain double slash or '../' string.'''
result = dir + '/' + locallink
result = re.sub(r'//+', r'/', result)
while re.search(r'/[^\/]+\/\.\.', result): result =
re.sub(r'/[^\/]+\/\.\.', '', result)
return result
def listInlineImg(htmlfile):
'''listInlineImg($html_file_full_path) returns a array where each
element is a full path to inline images in the html.'''
dir=os.path.dirname(htmlfile)
imgPaths = getInlineImg(htmlfile)
result = []
for aPath in imgPaths:
result.append(linkFullPath( dir, aPath))
return result
##################################################
# main
fileSizeList=[]
def checkLink(dummy, dirPath, fileList):
for fileName in fileList:
if '.html' == os.path.splitext(fileName)[1] and
os.path.isfile(dirPath+'/'+fileName):
totalSize = os.path.getsize(dirPath+'/'+fileName)
imagePathList = listInlineImg(dirPath+'/'+fileName)
for imgPath in imagePathList: totalSize +=
os.path.getsize(imgPath)
fileSizeList.append([totalSize, dirPath+'/'+fileName])
os.path.walk(inpath, checkLink, 'dummy')
fileSizeList.sort(key=lambda x:x[0],reverse=True)
for it in fileSizeList: print it
print "done reporting."
The following is a Perl version. The Python version above is a direct
translation of this Perl version.
# perl
# Tue Oct 4 14:36:48 PDT 2005
# given a dir, report all html file's size. (counting inline images)
# XahLee.org
use Data::Dumper;
use File::Find;
use File::Basename;
$inpath = '/Users/t/web/';
while ($inpath =~ m@^(.+)/$@) { $inpath = $1;} # get rid of trailing
slash
die "dir $inpath doesn't exist! $!" unless -e $inpath;
##################################################
# subroutines
# getInlineImg($file_full_path) returns a array that is a list of
inline images. For example, it may return ('xx.jpg','../image.png')
sub getInlineImg ($) { $full_file_path= $_[0];
@linx =(); open (FF, "<$full_file_path") or die "error: can not open
$full_file_path $!";
while (<FF>) { @txt_segs = split(m/src/, $_); shift @txt_segs;
for $linkBlock (@txt_segs) {
if ($linkBlock =~ m@\s*=\s*\"([^\"]+)\"@) { push @linx, $1; }
}
} close FF;
return @linx;
}
# linkFullPath($dir,$locallink) returns a string that is the full path
to the local link. For example,
linkFullPath('/Users/t/public_html/a/b', '../image/t.png') returns
'Users/t/public_html/a/image/t.png'. The returned result will not
contain double slash or '../' string.
sub linkFullPath($$){
$result=$_[0] . $_[1];
$result =~ s@\/+@\/@g;
while ($result =~ s@/[^\/]+\/\.\.@@g) {};
return $result;
}
# listInlineImg($html_file_full_path) returns a array where each
element is a full path to inline images in the html.
sub listInlineImg($) {
my $htmlfile= $_[0];
my ($name, $dir, $suffix) = fileparse($htmlfile, ('\.html') );
my @imgPaths = getInlineImg($htmlfile);
my @result=();
foreach my $aPath (@imgPaths) { push @result,
linkFullPath($dir,$aPath);}
return @result;
}
##################################################
# main
sub checkLink {
if (
$File::Find::name =~ m@\.html$@ && -T $File::Find::name
) {
$totalSize= -s $File::Find::name;
@imagePathList = listInlineImg($File::Find::name);
for my $imgPath (@imagePathList) {$totalSize += -s $imgPath;};
push (@fileSizeList, [$totalSize, $File::Find::name]);
};
}
find(\&checkLink, $inpath);
@fileSizeList = sort { $b->[0] <=> $a->[0]} @fileSizeList;
print Dumper(\@fileSizeList);
print "done reporting.";
-------------
this post is archived at
http://xahlee.org/perl-python/check_html_size.html
☄
http://xahlee.org/perl-python/python_3000.html
addendum, 20051001
here i will try to illuminate some miscellaneous things regarding the
lambda in Python issue.
as i have hinted (
http://xahlee.org/perl-python/list_comprehension.html ), the so-called
List Comprehension is just a irregular syntax to facilitate generating
lists. The name is a terrible jargon, and the means is also bent. The
proper name should be something like List Generating Form, and the
proper means should be the plain function.
For instance, Python's range() is such a list generator, only that it
is limited in scope.
For a example of a powerful list generator, see Mathematica's Table
function:
http://documents.wolfram.com/mathematica/functions/Table
Note Table's power in generating not just flat lists, but trees (nested
lists). And if one really want flat lists, there's the Flatten function
that flats any nested lists. (Python should have this too)
Python's reduce() is Mathematica's Fold. See
http://documents.wolfram.com/mathematica/functions/Fold
Besides Fold, there's FoldList, FixedPoint, FixedPointList, Nest,
NestList and others. FoldList is like reduce() except it returns a list
of each steps. FixedPoint recursively applies a function to itself
until the result no longer changes (or when a optional function returns
true) Nest is similar, except it does the recursion by a specified
number of times. The NestList and FixedPointList are similar except
that they return a list, containing all the steps.
All these can be written as a For loop, but they make the code
condensed and meaning clear. More so, they are important when
programing in a functional style. In functional programing, you don't
litter lots of variables or temporary functions or intermediate loops
here or there on every other line. The code is usually tight and
inline. When sequencing a series of functions, you can't stop in the
middle and do some loop or auxiliary calculation. All these are made
inline into a function. (that is: constructed as lambda) A block of
code usually corresponds to a unit of the algorithm used, as opposed to
a block of implementation. You don't read the minute details of the
code. You read the algorithmic unit's comments, or just the input and
output of a code block.
Also, these inline loop constructs are not only suitable for computing
numbers as Guido ignorantly thought. They are specialized forms of
generic loop constructs. Their first argument is a function, and second
argument is a list. Their generality lies with the fact that their
first argument is a function. If a language does not provide a
convenient way to represent the concept of a function, than these
functional loop constructs's usability will suffer.
The Pythoners, did not provide a convenient way to represent a
function. (they tried, with their limited implementation of lambda and
shun it like a plaque)
The way Guido puts it gives us a nice glimpse of their retarded
mentality:
Also, once map(), filter() and reduce() are gone, there aren't a whole
lot of places where you really need to write very short local
functions;
As we can see here, in Pythoner's mind, lambda is for very short local
functions.
Python's limited lambda and imperative programer's unfamiliarity of it
made functional programing suffer in Python. Consequently, it is not
surprising for one to feel a need to chop off things unfamiliar.
For Python's map(), look at Mathematica's Map on how it might be
extended.
http://documents.wolfram.com/mathematica/functions/Map
Note the ability to map to not just flat lists but trees (nested
lists). Note the power of expressing the concept of levels of a tree.
For Python's filter(), check out the equivalent in Mathematica's Select:
http://documents.wolfram.com/mathematica/functions/Select
Note how it provides a third option for picking just the first n items.
Also note, that Select is just a way to pick elements in a list.
Mathematica provides a set to do these: Part, Take, Drop, Select,
Cases, Delete, DeleteCases... All uniformly uses the function syntax
and all operate semantically by returning a new list. In Python and
other imperative clown's languages, usually they provide a limited
varieties to do such a task, and also inconsistent as if piled on.
(e.g. In Python we have alist[5:9], filter(f,alist), alist.remove(...),
del alist[...]. Some modify the list in-place, some returns a new
list.)
one is quite sorry to read a big shot contemplating on petty issues
with a ambitious name Python THREE THOUSAND.
For the grand Python THREE THOUSAND, what about supporting non-trivial
things such as exact fractions? What about a built-in worry-free
exact-arithmetics once for all? What about supporting pattern matching
for expression structures?
the features of Mathematica mentioned above existed over a decade ago.
But today, OpenSourcing bigwigs can contemplate and dither nothing but
which lipstick to use.
A good number of the industrial dignitaries are selfish liers. And
today we have the most human-labor intensive language Java and the
crazily hacked up Perl.
-------
This post is archived at:
http://xahlee.org/perl-python/python_3000.html
Lambda in Python 3000
Xah Lee, 200509
On Guido van Rossum's website:
http://www.artima.com/weblogs/viewpost.jsp?thread=98196
dated 20050826, he muses with the idea that he would like to remove
lambda, reduce(), filter() and map() constructs in a future version
Python 3000.
Guido wrote:
filter(P, S) is almost always written clearer as [x for x in S if
P(x)], and this has the huge advantage that the most common usages
involve predicates that are comparisons, e.g. x==42, and defining a
lambda for that just requires much more effort for the reader (plus the
lambda is slower than the list comprehension)
the form “[x for x in S if P(x)]” is certainly not more clear than
“filter(P, S)”. The latter is clearly a function, what is the former? A
function every programer in any language can understand and appreciate
its form and function. Why would anyone to expect everyone to
appreciate a Python syntactical idiosyncrasy “[x for ...]”?
also, the argument that the from “filter(F,S)” being cumbersome because
the first argument is a function and that mostly likely it would be a
function that returns true and false thus most people will probably use
the form “lambda” and that is quite cumbersome than if the whole thing
is written with the syntactical idiosyncrasy “[x for ...]”, is rather
inane, as you can now see.
The filter(function,list) form is clean, concise, and helps thinking.
Why it helps thinking? Because it condenses the whole operation into
its mathematical essence with the most clarity. That is, it filters, of
a list, and by a yes/no decision function. Nothing is more, and nothing
can be less. It is unfortunate that we have the jargon Lambda and
Predicate developed by the tech geekers of the functional programing
community. The lambda could be renamed Pure Function and the Predicate
could be called True/False function, but the world being the way they
are already, it is unwise to rewrite every existing Perl program just
because somebody invented another language.
If the predicate in lambda in filter() is cumbersome, so would exactly
the same thing appear in the syntactical idiosyncrasy “[x for x in S if
P(x)]”.
Guido added this sting as a afterthought:
(plus the lambda is slower than the list comprehension)
Which is faster is really the whim and capacity of Python
implementators. And, just before we were using criterion of simplicity.
The concept of a function every programer understands, but what the
heck is a List Comprehension? Why don't you scrap list comprehension in
Python 3000 and create a table() function that's simpler in syntax and
more powerful in semantics? ( See
http://xahlee.org/perl-python/list_comprehension.html )
Why drop lambda? Most Python users are unfamiliar with Lisp or
Scheme, so the name is confusing; also, there is a widespread
misunderstanding that lambda can do things that a nested function can't
-- I still recall Laura Creighton's Aha!-erlebnis after I showed her
there was no difference! Even with a better name, I think having the
two choices side-by-side just requires programmers to think about
making a choice that's irrelevant for their program; not having the
choice streamlines the thought process. Also, once map(), filter() and
reduce() are gone, there aren't a whole lot of places where you really
need to write very short local functions; Tkinter callbacks come to
mind, but I find that more often than not the callbacks should be
methods of some state-carrying object anyway (the exception being toy
programs).
In the outset Guido here assumes a moronitude about the set of Python
users and what they are familiar of. Python users 10 years ago are not
the same Python users today, and will certainly not be the same 10
years later if you chop off lambda. Things change, math literacy
advances, and what users you have changes with what you are. A pure
function (lambda) is the gist of a mathematical idea embodied in
computer languages, not something from LISP or Scheme as tech geekers
wont to think.
... there is a widespread misunderstanding that lambda can do
things that a nested function can't...
One is so insulted by a bigshot in the industry of quoting something so
disparate then shot it down as if showing his perspicacity.
A lambda is a syntax for function or a name for the concept of
function. What does it mean that a lambda isn't as powerful as nested
function??
The lambda in Python is really ill. It is designed with a built-in
limitation in the first place, and regarded as some foreign substance
in the Imperative crowd such as the Pythoners. If there's any problem
with lambda, it is with lambda in Python and Pythoner's attitude.
Also, once map(), filter() and reduce() are gone, there aren't a
whole lot of places where you really need to write very short local
functions;
Of course, one begin to write things like Java: in three thousand words
just to state its puniness.
The removing of elements in a language is in general not a good idea.
Removing powerful features so that cheap coders can use it is moronic.
(e.g. Java) Removing “redundant” constructs is not always smart (e.g.
Scheme), because it pinches on practicality. Removing existing language
features by a visionary upgrade is a waste. It forces unnecessary
shakeup and can cause death.
So now reduce(). This is actually the one I've always hated most,...
The existance of reduce() in Python is probably caused by tech geeking
clowns of the computing industry. Basically, nobody really have a clear
understanding of mathematics or computing semantics, but every elite
tech geeker knew about bags of constructs of various languages. So, you
add this, i want that, and the language becomes a incoherent soup of
constructs, with the backlash of wanting to chop off things again, with
that good things.
Suggestions: lambda, reduce(), filter() and map() all should stay. I'm
not sure exactly what's the ins and outs of Python 3000. If one wants
to shake things up based on a vision: don't. There are already
gazillion languages and visions; the world don't really need another
bigshot's say for their personal advancement. As for improvement,
lambda in Python should be expanded to remove its built-in limitation
(and Imperative Programing Crowd such as Pythoners should cease to have
lambda attitude problem). The function map() could also be considered
for expansion. (see “What is Expresiveness in a Computer Language” at
http://xahlee.org/perl-python/what_is_expresiveness.html ) Function
reduce() should stay because it's already there, even if it is not very
useful and odd. filter() should stay as it is as it is superb and
proper.
-------
This post is archived at:
http://xahlee.org/perl-python/python_3000.html
☄
Perl and Python Documentations
Xah Lee, 20050921
Perl's documentation has come of age: http://perldoc.perl.org/
Python people really need to learn:
• ample code examples.
• keywords in examples are linked to the appropriate doc location.
• written in a task-oriented style, or manifest-functionality style.
That is, it does not have pretensions of a computer sciency mode or
austerity. It either is oriented towards tasks programers need to do,
as in module documentations, or document the language as it manifestly
functions. (e.g. input, output, side effects; concrete description of
object's methods and variables.)
The Python docs (http://docs.python.org/), is organized in some
incomprehensible computer sciency structure that is impossible to find
anything. And the entire doc go to the extra mile to avoid any
task-oriented writing or examples as if those are pests that can lower
their status. And when the Python doc tries to document its functions,
it doesn't talk straight but instead throwing a bunch of abstract
objects and models jargons.
The Perl documentations, at least in its presentation (organization,
orientation) and technology format (DHTML...) aspects, has come of age.
However, the Perl doc's content and writing per se, remains the worst
garbage possible. (and Python's is in the same ball park) The negative
aspects people want to avoid are:
• Do not tech geek. Both perlers and pythoners do tech geeking. That
is, mentioning of extraneous jargons, warnings, implementation details,
little style guide here and there, unconscious opportunistic OpenSource
propaganda pitch-ins, historic information provisions, insider jokes,
author masturbation on language design and comparisons... etc. (with
Perl, this may be understandable or irrelevant because it is their
nature and design to be juvenile. They revel in it. But with Python,
it's worse because of Pythoner's computer sciency aloofness and
cleanness pretensions meanwhile don't really exhibit any ability to
think and write better.)
• Do think clearly before writing. Both Perl and Python docs's writing
quality are extremely bad. What they primarily lack is the ability to
think clearly before writing. Perl docs write in the fashion of
happy-go-lucky juvenile ramble. They write down their thought process
and intuitive understanding of how things are, and meanwhile don't
hesitate to throw in childish unix jokes and cheap word play.
Pythoner's in the fashion of computer sciency confoundedness. They
don't have mathematical clarity of how things are, but meanwhile force
their understanding into a irrelevant fomal structure with computer
science jargons. Both are incomprehensible.
One easy way to test this, is for Pythoners to read Perl docs and vice
versa.
Pythoners will find that, you really don't know what the Perlers are
talking about. Same with Perler with Python docs. However, you will not
get the same feeling on well written docs, such as Java's doc or
Mathematica's doc. (assume that the people here have been in the
programing industry for several years, and are not familiar with the
other languages in question.)
What the Perlers & Pythoners need to do, is to horn their skills
outside of coding. Study philosophy, study economics, history, social
sciences, and mathematics. Also, study functional programing or hang
out in functional programing communities or hardcore GNU community will
also improve vastly
-------------
this post is archived at:
http://xahlee.org/perl-python/perl_python_docs.html
☄
Addendum to the Python doc problem at
http://www.python.org/doc/2.4.1/lib/module-os.path.html
I was working on a program where i needed to split a path into dirname,
corename, and suffix.
I came to this page and took me a while to understand what split() is
about. There are other path related functions splitext(), splitdrive(),
basename(), dirname(). User has to scan the whole page and read
painfully each one to fully understand how to choose and use them for
the task at hand.
As i have explained before (see references at bottom), documentation
should be organized oriented towards programer's tasks, not
alphabetically, compiler view, or computer sciency scheme. On this
os.path module, split(), splittext(), dirname(), basename() should all
be under one section. This way, their usefulness and each's fitness
becomes clearer, and also easier to document as a collective. Other
functions that test files or get info about files should be grouped
together. Don't be afraid of having functions that won't fit into some
grouping scheme. For exapmle, the walk() and
supports_unicode_filenames() can be lumped at the bottom as Other. The
need to present materials in some aloof, computer sciency, academic,
technically precise way is a major usability problem of the Python doc.
(the Pythoners's need to present materials in a formal style is a
backlash of the happy-go-lucky sloppiness of unix/perl community.
However, they being pretty much the same crowd without significant
critical thinking and writing skills, cannot do better by hiding in
formality.)
Also, at the top we see:
Warning: On Windows, many of these functions do not properly support
UNC pathnames. splitunc() and ismount() do handle them correctly.
As indicated before, this is a exhibition of tech geeking and
jargonizing. If this warning is necessary, place it at the bottom of
the page as a footnote. Also, spell out UNC, and provide a link to its
proper spec.
Tech geekers are very pretentious and cryptic in their tech docs. They
are afraid, as if spelling out UNC would make them unprofessional, that
their peers would deem them inferior. There are a myriad of technical
standards that any programer could only be familiar with a fraction,
confined to his area of expertise. Standards and its acronyms come and
go, and each with varying degrees of precision, actual relevance, and
they are intermingled with de facto “standards” in the commercial world
that may not even have official names. The tech geekers are clouded by
their tech-expertise. The purpose of documentation is not some cold
academic presentation. Vast majority who came to use os.path wouldn't
know what UNC is nor do they need to know. Spell things out when in
doubt.
UNC here, isn't really a significant “standard”. This warning should be
left out.
-----------
This post is archived at:
http://xahlee.org/perl-python/python_doc_os_path_split.html
☄
On 18/09/05, xah lee <xah@...> wrote:
> Python Doc Problem Example: os.path.split()
>
> Xah Lee, 20050918
>
> Quote from:
> http://www.python.org/doc/2.4.1/lib/module-os.path.html
>
> split(path)
> Split the pathname path into a pair, (head, tail) where tail is the
> last pathname component and head is everything leading up to that. The
> tail part will never contain a slash; if path ends in a slash, tail
> will be empty. If there is no slash in path, head will be empty. If
> path is empty, both head and tail are empty. Trailing slashes are
> stripped from head unless it is the root (one or more slashes only). In
> nearly all cases, join(head, tail) equals path (the only exception
> being when there were multiple slashes separating head from tail).
>
> This confusive verbiage is a result of the author's pretention in a
> austere style and his failure to think clearly before writing.
>
> Suggested rewrite:
>
> split(path)
> returns a pair (dirname,filename), where dirname is the part of
> path up to the last slash, and filename is the rest of the string after
> the last slash.
>
> Exceptional cases are:
> if path is a single slash (or repeated), then path == dirname and
> filename is empty.
> If the "last" slash is repeated, they are treated as one single
> slash.
> -----------
> This post is archived at:
> http://xahlee.org/perl-python/python_doc_os_path_split.html
Umm nearly - but the exceptions should be as follows:
* If the last slash is repeated, and it is not the first set of
slashes, it treated as single slash.
* If the path ends in a slash, filename will be empty
* It the path has no slashes, dirname will be empty.
This behaviour means that it preserves UNC names.
Cheers,
Danny
--
http://orionrobots.co.uk - Build Robots
Python Doc Problem Example: os.path.split()
Xah Lee, 20050918
Quote from: http://www.python.org/doc/2.4.1/lib/module-os.path.html
split(path)
Split the pathname path into a pair, (head, tail) where tail is the
last pathname component and head is everything leading up to that. The
tail part will never contain a slash; if path ends in a slash, tail
will be empty. If there is no slash in path, head will be empty. If
path is empty, both head and tail are empty. Trailing slashes are
stripped from head unless it is the root (one or more slashes only). In
nearly all cases, join(head, tail) equals path (the only exception
being when there were multiple slashes separating head from tail).
This confusive verbiage is a result of the author's pretention in a
austere style and his failure to think clearly before writing.
Suggested rewrite:
split(path)
returns a pair (dirname,filename), where dirname is the part of
path up to the last slash, and filename is the rest of the string after
the last slash.
Exceptional cases are:
• if path is a single slash (or repeated), then path == dirname and
filename is empty.
• If the “last” slash is repeated, they are treated as one single
slash.
-----------
This post is archived at:
http://xahlee.org/perl-python/python_doc_os_path_split.html
☄
The official documentation for Python is
http://docs.python.org/
The doc is downloadable.
However, this official documentation is very lousy.
A excellent quick reference sheet is available at
http://rgruet.free.fr/
, by Richard Gruet.
A free book is available. Dive Into Python, by Mark Pilgrim, 2004.
Available free at
http://www.diveintopython.org/
for Python beginners.
-------------
If you know other quality references online, please email me
xah@... . Thanks.
☄
Hi there,
You could save recompressing an intermediate file on a unix system by
changing your input to the gzip commnad to include the -c option
This will then place the decompressed file on stdout. You can then
just pipe stdout from this to the stdin of the tail command (if you
dont pass a file to tail it will expect its input on stdin). You
should probably remove the wait as well, and only once you have
started both processes, should you wait for them to finish.
This small saving would speed up a program a little, and save an
intermediate file being created - so it has fewer side effects, fewer
disk writes etc.
import subprocess
# -*- coding: utf-8 -*-
# Python
gzip_process = subprocess.Popen([r"gzip","-cd", "x.txt.gz", stdout = PIPE])
tail_process = subprocess.Popen([r"tail","-n 1"], stdin =
gzip_process.stdout, stdout=PIPE)
last_line = tail_process.communicate()[0]
print last_line
Danny
--
http://orionrobots.co.uk - Build Robots
On 08/09/05, xah lee <xah@...> wrote:
> 200509: Making System Calls (with unix and Python 2.4)
>
> To make a system call, use:
>
> subprocess.Popen([r"gzip","-d", "x.txt.gz"]).wait()
>
>
> The subprocess module is available only since Python 2.4, and it is
> intended to replace several other older ones:
>
> os.system
> os.spawn*
> os.popen*
> popen2.*
> commands.*
>
>
> The subprocess.Popen([...]) creates a object. The .wait() makes the
> code wait until the system call finished. To not wait for it, simply
> use:
>
> subprocess.Popen([r"gzip","-d", "x.txt.gz"])
>
>
> To make a system call and get its output, do, for example:
>
> last_line=subprocess.Popen([r"tail","-n 1", "x.txt"],
> stdout=subprocess.PIPE).communicate()[0]
>
>
> The "communicate()" automatically makes the call wait.
>
> In the following example, we make a system call to decompress a gzip
> file (and wait for it), then get the last line of the file using
> "tail", then gzip it again.
>
> # -*- coding: utf-8 -*-
> # Python
>
> import subprocess
>
> subprocess.Popen([r"gzip","-d", "x.txt.gz"]).wait()
>
> last_line=subprocess.Popen([r"tail","-n 1", "x.txt"],
> stdout=subprocess.PIPE).communicate()[0]
>
> subprocess.Popen([r"gzip","x.txt"]).wait()
>
> print last_line
>
>
> Reference:
> http://www.python.org/doc/2.4/lib/module-subprocess.html
> --------
> this post is archived at
> http://xahlee.org/perl-python/system_calls.html
>
>
>
> ☄
>
>
> ________________________________
> YAHOO! GROUPS LINKS
>
>
> Visit your group "perl-python" on the web.
>
> To unsubscribe from this group, send an email to:
> perl-python-unsubscribe@yahoogroups.com
>
> Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
>
> ________________________________
>
200509: Making System Calls (with unix and Python 2.4)
To make a system call, use:
subprocess.Popen([r"gzip","-d", "x.txt.gz"]).wait()
The subprocess module is available only since Python 2.4, and it is
intended to replace several other older ones:
os.system
os.spawn*
os.popen*
popen2.*
commands.*
The subprocess.Popen([...]) creates a object. The .wait() makes the
code wait until the system call finished. To not wait for it, simply
use:
subprocess.Popen([r"gzip","-d", "x.txt.gz"])
To make a system call and get its output, do, for example:
last_line=subprocess.Popen([r"tail","-n 1", "x.txt"],
stdout=subprocess.PIPE).communicate()[0]
The “communicate()” automatically makes the call wait.
In the following example, we make a system call to decompress a gzip
file (and wait for it), then get the last line of the file using
“tail”, then gzip it again.
# -*- coding: utf-8 -*-
# Python
import subprocess
subprocess.Popen([r"gzip","-d", "x.txt.gz"]).wait()
last_line=subprocess.Popen([r"tail","-n 1", "x.txt"],
stdout=subprocess.PIPE).communicate()[0]
subprocess.Popen([r"gzip","x.txt"]).wait()
print last_line
Reference: http://www.python.org/doc/2.4/lib/module-subprocess.html
--------
this post is archived at
http://xahlee.org/perl-python/system_calls.html
☄
Here's a example script that performs a simple task. The task is:
suppose you a bunch of files named
access_log.1.gz
access_log.2.gz
access_log.3.gz
these are weekly log files with their date range contained in the file.
you want to rename these files to indicate their date range.
20050101-20050105.gz
20050105-20050110.gz
20050110-20050115.gz
The following code is a pure Python script that does the job.
-------------------------------
# -*- coding: utf-8 -*-
# Python
import os,re,gzip
# go to the current dir
# get a list of file names of the form access_log.*.gz
# for each file
# unzip it (make sure it doesn't override existing file)
# find the date from the first line and last line
# rename the file such as 0605-0612.log.gz
# note: this description does not necessarily match the code
# but gives a basic view
mydir= '/Users/t/logs/'
mon={
'Jan':'01',
'Feb':'02',
'Mar':'03',
'Apr':'04',
'May':'05',
'Jun':'06',
'Jul':'07',
'Aug':'08',
'Sep':'09',
'Oct':'10',
'Nov':'11',
'Dec':'12',
}
def getdate (li):
li = li.split(' ')[3][1:12];
datelist = li.split('/');
dd=datelist[0]
mmm=datelist[1]
yyyy=datelist[2]
return str(yyyy) + str(mon[mmm]) + str(dd)
for ff in os.listdir(mydir):
if re.search(r'access_log\.\d\.gz',ff):
print ff
unzippedname=ff[0:-3]
inF = gzip.GzipFile(ff, 'rb');
s=inF.readlines()
inF.close()
start_date = getdate(s[0])
end_date = getdate(s[-1])
new_file_name= start_date + '-' + end_date + '.txt'
outF = file(new_file_name, 'w');
outF.write(''.join(s))
outF.close()
# do compression of the new files
# delete the original gzip file
s[0:-1]=[]
-------------------------------
This script does not use system call, so it is system independent and
can be used in Unix or Windows. The problem with this script is that
usually log files are very large in size, usually over 100 Mega bytes.
Because this script reads the whole file into memory, it is very slow
and memory intensive.
a solution is to call OS programs such as gzip to do the decompression,
and use unix “head” and “tail” to get the first or last line of the
file for extraction of the date range. We will study that and post
solution later.
Xah
xah@...
∑ http://xahlee.org/
☄
Python Doc Problem Example: os.system
Xah Lee, 2005-09
today i'm trying to use Python to call shell commands. e.g. in Perl
something like
output=qx(ls)
in Python i quickly located the the function due to its well-named-ness:
import os
os.system("ls")
however, according to the doc
http://www.python.org/doc/2.4/lib/os-process.html the os.system()
returns some esoteric unix thing, not the command output. The doc
doesn't say how to get the output of the command.
by chance someone told me that in python 2.4 the os.system is
supplanted by subprocess.call(), but this isn't mentioned in the doc!
upon finding the new doc location
http://www.python.org/doc/2.4/lib/module-subprocess.html i'm told that
this module replaces:
os.system
os.spawn*
os.popen*
popen2.*
commands.*
interesting. Since i'm not Python expert, i like to look at these. But
the doc gives ample gratis links to OpenSource this or that or links to
remote book i don't really care about, but here there's no link.
Problem summary:
* does not focus on the task users need to do. Instead, the doc is
oriented towards tech geeking.
* does not inform the reader at the right place where a new function is
replacing the old.
* does not provide relevant cross-links. (while provding many
irrelevant links because of OpenSource or Tech Geeking fanaticism)
Solution Suggestion:
* Add examples.
* Add cross-links to relevant modules.
* Mention and add link at the right place supplanted functions.
* Orient the doc to tasks and manifest functionalities. Think like
functional programing: input and output specification, and document
them. This will help focus and precision in the doc. Avoid essay-like
descriptions. Avoid drilling on remotely related tech/unix/C esoterica.
e.g. Do not mention as a documentation how they are implemented.
Mention implementation on the side if necessary. This way, the language
becomes focused as a independent tool (e.g. Mathematica, Java, Scheme,
emacs) (which may provide ample capabilities to interface/connect to
other technologies), instead of heavily intermixed and dependent with a
bunch of other things (unix things: Perl, Apache, shells).
--------------------------------------------------
This article is archive at:
http://xahlee.org/UnixResource_dir/writ/python_doc_os.html
☄
On 02/09/05, xah lee <xah@...> wrote:
> 20050830
>
> Here is a example of how to decompress a gzip file using Python.
>
> # -*- coding: utf-8 -*-
> # Python
>
> import gzip
> inF = gzip.GzipFile("/Users/t/access_log.1.gz", 'rb');
> s=inF.read()
> inF.close()
>
> outF = file("/Users/t/access_log.1", 'wb');
> outF.write(s)
> outF.close()
>
>
> Here is a example of compressing a gzip file using Python.
>
> # -*- coding: utf-8 -*-
> # Python
>
> import gzip
>
> inF = file("x.txt", 'rb');
> s=inF.read()
> inF.close()
>
> outF = gzip.GzipFile("x.txt.gz", 'wb');
> outF.write(s)
> outF.close()
>
>
> For more detail, see
> http://python.org/doc/2.4.1/lib/module-gzip.html
>
> Perl does not come with a gzip module bundled, but one can easily call
> the unix gzip with qx. (assuming you are on unix)
>
> # perl
>
> qx(gzip x.txt);
> qx(gzip -d x.txt.gz);
>
>
> Several module are available online to do compression. IO::Zlib,
> PerlIO::gzip.
>
> ----------
> this post is archived at:
> http://xahlee.org/perl-python/gzip.html
>
> ☄
>
You could shorten the Perl unix calls by using the backtick operator:
`gzip x.txt`
`gzip -d x.txt.gz`
And for the same behaviour as the Python module, where you actually
have the decompressed stream and not a temporary file, you would want
to use (when compressing or decompressing) the -c switch with gzip.
This instructs it to put its output onto the console (on std::out)
which in this case you would be capturing by simply assigning the
return from the backtick.
my $compressedFile = `gzip -c x.txt`;
my $decompressedFile = `gzip -dc x.txt`;
I would probably however prefer to use a real Perl module for the sake
of portability though. After all - you are also importing a module
into Python. Python has the advantage here of a more comprehensive
core module library, where in Perl you would need to install it.
Orion
--
http://orionrobots.co.uk - Build Robots
20050830
Here is a example of how to decompress a gzip file using Python.
# -*- coding: utf-8 -*-
# Python
import gzip
inF = gzip.GzipFile("/Users/t/access_log.1.gz", 'rb');
s=inF.read()
inF.close()
outF = file("/Users/t/access_log.1", 'wb');
outF.write(s)
outF.close()
Here is a example of compressing a gzip file using Python.
# -*- coding: utf-8 -*-
# Python
import gzip
inF = file("x.txt", 'rb');
s=inF.read()
inF.close()
outF = gzip.GzipFile("x.txt.gz", 'wb');
outF.write(s)
outF.close()
For more detail, see http://python.org/doc/2.4.1/lib/module-gzip.html
Perl does not come with a gzip module bundled, but one can easily call
the unix gzip with qx. (assuming you are on unix)
# perl
qx(gzip x.txt);
qx(gzip -d x.txt.gz);
Several module are available online to do compression. IO::Zlib,
PerlIO::gzip.
----------
this post is archived at:
http://xahlee.org/perl-python/gzip.html
☄