Copyright 2008 David Dodds
'Generating myfoo:analysis text content'
40,60,89,70,27,09,60,33,25,71,50,61,75, (19)95,(20)07
(heights, dates in the data set)
time knowledge and time terms: events, sequence(s) start-mid(dle)-end,
continues,
year, decade, (steep,shallow)trend[up(ward),flat,down(ward)],
early-mid-late, begin-mid(dle)-end, (no,little,some,much, very)
variability, decline-advance,retard-promote, decline-growth,
flatten-expand
perform data analysis of available data (heights vs dates)
what kinds of analysis can be done with that kind of data? slopes, trends
findings: date sequence is 1995 through 2007, decades=90's, (20)00's
heights range from 09 through 89
height-year pairs: 40/95, 60/96, 89/97, 70/98, 27/99, 09/00, 60/01,
33/02, 25/03, 71/04, 50/05, 61/06, 75/07
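The pairing and the findings above can be sketched in a few lines of Python. This is a minimal illustration, not the author's program. The names `HEIGHTS`, `pair_heights_with_years` and `findings` are hypothetical, and years are written in full (1995 rather than 95).

```python
# Heights as listed in the data set; years run 1995..2007, one per height.
HEIGHTS = [40, 60, 89, 70, 27, 9, 60, 33, 25, 71, 50, 61, 75]


def pair_heights_with_years(heights, first_year, last_year):
    """Return (height, year) pairs; there must be one height per year."""
    years = list(range(first_year, last_year + 1))
    if len(years) != len(heights):
        raise ValueError("need exactly one height per year")
    return list(zip(heights, years))


def findings(pairs):
    """Summarize the year span, the decades covered and the height range."""
    heights = [h for h, _ in pairs]
    years = [y for _, y in pairs]
    return {
        "years": (years[0], years[-1]),
        "decades": sorted({y // 10 * 10 for y in years}),
        "height_range": (min(heights), max(heights)),
    }


pairs = pair_heights_with_years(HEIGHTS, 1995, 2007)
print(pairs[:3])        # [(40, 1995), (60, 1996), (89, 1997)]
print(findings(pairs))  # {'years': (1995, 2007), 'decades': [1990, 2000], 'height_range': (9, 89)}
```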
((sla means sequence look-ahead))
growth of [widget], "widget growth"
sequence: first item=[yr]95, value:40; decade=90,index=1
last 90's item=[yr]99, value:27, index=5
first (20)00's item=[yr](20)00, value:09, index=6,
sla=decade=(20)00
last item index=13[yr](20)07, value:75
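The 'sequence look-ahead' (sla) used above works by peeking at the next item to notice that the decade is about to change. It might be sketched as follows. The function name `decade_ends` is hypothetical, and indices are 1-based to match the notes.

```python
def decade_ends(years):
    """Return (index, year) for the last item of each decade.

    Uses a one-item look-ahead: item i closes its decade when item i+1
    is missing or falls in a different decade. Indices are 1-based.
    """
    ends = []
    for i, year in enumerate(years):
        next_year = years[i + 1] if i + 1 < len(years) else None
        if next_year is None or next_year // 10 != year // 10:
            ends.append((i + 1, year))
    return ends


years = list(range(1995, 2008))
print(decade_ends(years))  # [(5, 1999), (13, 2007)]
```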
sequence start= 1995, = mid 90's. " starting mid-90's, "
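Labels such as "mid-90's" or "early-2000's" need some rule for splitting a decade into early, mid and late. The notes do not give one, so the sketch below assumes a split on the last digit of the year (0-3 early, 4-6 mid, 7-9 late). With that split, 1995 is "mid" and 1998-99 are "late", which agrees with the labels used here. Both function names are hypothetical.

```python
def decade_name(year):
    """'90's' for the 1990s and '2000's' for the 2000s, as in the notes."""
    decade = year // 10 * 10
    return f"{decade % 100}'s" if decade < 2000 else f"{decade}'s"


def period_label(year):
    """Label a year as early/mid/late in its decade.

    Assumed split (not given in the notes): last digit 0-3 early,
    4-6 mid, 7-9 late.
    """
    digit = year % 10
    part = "early" if digit <= 3 else "mid" if digit <= 6 else "late"
    return f"{part}-{decade_name(year)}"


for y in (1995, 1998, 2001, 2006):
    print(y, period_label(y))
```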
slope analysis= (19)95 through (19)97, trend=upward, "upward trend",
of {growth of [widget]}
slope m=(89-40)/(97-95)=49/2=+24.5
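The slope is rise over run, and both differences must be taken before dividing: (89-40)/(97-95). A minimal sketch follows. `BY_YEAR` and `slope` are hypothetical names, and full years are used.

```python
HEIGHTS = [40, 60, 89, 70, 27, 9, 60, 33, 25, 71, 50, 61, 75]
BY_YEAR = dict(zip(range(1995, 2008), HEIGHTS))  # year -> height


def slope(start_year, end_year, by_year=BY_YEAR):
    """Rise over run: m = (h_end - h_start) / (end_year - start_year)."""
    rise = by_year[end_year] - by_year[start_year]
    run = end_year - start_year
    return rise / run


print(slope(1995, 1997))  # 24.5
print(slope(1998, 1999))  # -43.0
```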
sequence loc= 1998-99, = late 90('s). " late-90's, "
slope analysis= (19)98 through (19)99, trend=downward, "downward
trend", of {growth of [widget]}
slope m=(27-70)/(99-98)=-43/1=-43 . -43 is a steep downward slope, aka
'steep decline'.
(19)99 is 'end of decade' (of 1990's).
sequence loc= 2000-2001, = early 2000('s). " early-2000's, "
slope analysis= (20)00 through (20)01, trend=upward, "upward trend",
of {growth of [widget]}
slope m=(60-09)/(01-00)=+51/1=+51 . +51 is a steep upward slope, aka
'steep incline'.
(start of (20)00 decade value=lowest magnitude of entire sequence)
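Turning a slope into the phrases used above ("downward trend", 'steep decline', 'steep incline') needs a cut-off between 'steep' and 'shallow'. The notes do not state one. The sketch assumes a hypothetical threshold of 20, and it returns "flat trend" for a zero slope, since "flat" appears in the time-term list.

```python
STEEP = 20.0  # hypothetical cut-off between 'shallow' and 'steep'


def describe_slope(m, steep=STEEP):
    """Map a slope to a (trend phrase, steepness phrase) pair."""
    if m == 0:
        return "flat trend", None
    direction = "upward" if m > 0 else "downward"
    steepness = "steep" if abs(m) >= steep else "shallow"
    change = "incline" if m > 0 else "decline"
    return f"{direction} trend", f"{steepness} {change}"


print(describe_slope(-43))  # ('downward trend', 'steep decline')
print(describe_slope(51))   # ('upward trend', 'steep incline')
print(describe_slope(-8))   # ('downward trend', 'shallow decline')
```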
sequence loc= 2002-2003, = early 2000('s). " early-2000's, "
slope analysis= (20)02 through (20)03, trend=downward, "downward trend",
of {growth of [widget]}
slope m=(25-33)/(03-02)=-08/1=-08 . -08 is a shallow downward slope, aka
'shallow decline'.
sequence loc= 2005-2007, = early 2000('s). " early-2000's, "
slope analysis= (20)05 through (20)07, trend=upward, "upward trend",
of {growth of [widget]}
slope m=(75-50)/(07-05)=+25/2=+12.5 . +12.5 is a shallow upward slope, aka
'shallow incline'.
sequence loc= 2000-2007, = muchOf =yr((first)7 of 10). " much of
(20)00's, "
slope analysis= (20)00 through (20)07, trend=sequence(steep upward,
mid downward, shallow upward, steep upward), frequency of
slope-change= substantial count of articulation, "substantial variability"
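"Frequency of slope-change" can be made concrete by counting how often the year-to-year slope switches between upward and downward over 2000-2007. The sketch below does that. The rule that half or more of the possible switches counts as "substantial" is a hypothetical choice, as are the function names.

```python
HEIGHTS = [40, 60, 89, 70, 27, 9, 60, 33, 25, 71, 50, 61, 75]
BY_YEAR = dict(zip(range(1995, 2008), HEIGHTS))  # year -> height


def yearly_slopes(start, end, by_year=BY_YEAR):
    """Year-to-year slopes; the run is always one year, so m = rise."""
    return [by_year[y + 1] - by_year[y] for y in range(start, end)]


def sign_changes(slopes):
    """Count switches between upward and downward steps (flat steps ignored)."""
    signs = [m > 0 for m in slopes if m != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def variability_phrase(slopes, substantial=0.5):
    """Hypothetical rule: 'substantial' when at least half of the
    possible sign changes actually occur."""
    steps = [m for m in slopes if m != 0]
    if len(steps) < 2:
        return "no variability"
    changes = sign_changes(steps)
    if changes == 0:
        return "no variability"
    ratio = changes / (len(steps) - 1)
    return "substantial variability" if ratio >= substantial else "some variability"


s = yearly_slopes(2000, 2007)
print(s)                      # [51, -27, -8, 46, -21, 11, 14]
print(sign_changes(s))        # 4
print(variability_phrase(s))  # substantial variability
```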
sequence loc= 2005-2007, = mid to later (20)00('s). " later in
decade of (20)00's, "
slope analysis= (20)05 through (20)07, trend=sequence(upward, upward),
"upward trend"
Near=Distancefunction(89-75), maxvalue=89, yrOf(maxvalue)=97 "near
record (value)", "record in '97"
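The 'Near' distance function compares the last value (75 in 2007) with the record (89 in 1997). The sketch assumes that "near" means within 20% of the full height range (9 to 89). That tolerance and the function name are hypothetical.

```python
HEIGHTS = [40, 60, 89, 70, 27, 9, 60, 33, 25, 71, 50, 61, 75]
BY_YEAR = dict(zip(range(1995, 2008), HEIGHTS))  # year -> height


def record_phrases(by_year, tolerance=0.2):
    """Return the distance from the last value to the record (maximum)
    and the phrases it licenses. 'Near' is an assumed rule: within
    `tolerance` of the full height range."""
    record_year = max(by_year, key=by_year.get)
    record = by_year[record_year]
    distance = record - by_year[max(by_year)]  # last year's value
    span = record - min(by_year.values())
    phrases = []
    if 0 < distance <= tolerance * span:
        phrases.append("near record (value)")
    phrases.append(f"record in '{record_year % 100:02d}")
    return distance, phrases


print(record_phrases(BY_YEAR))  # (14, ['near record (value)', "record in '97"])
```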
From the sequences of analysis (shown above) we get the following
strings:
"widget growth" " starting mid-90's, " "upward trend"
" late-90's, " "downward trend" 'steep decline' 'end of decade'
" early-2000's, " "upward trend" 'steep incline'
" early-2000's, " "downward trend" 'shallow decline'
" early-2000's, " "upward trend" 'shallow incline'
" much of (20)00's, " "substantial variability"
" later in decade of (20)00's, " "upward trend"
"near record (value)", "record in '97"
We could program the computer simply to abut the strings in sequence,
brain-dead fashion, and call the joke the 'output'. If we instead
programmed the computer to do some processing that at least appears
closer to 'making real English', the program would examine these
string sequences and attempt to make some decent-sounding English from
them.
The first cut might sound decidedly like a 'Dick and Jane' (reader):
The topic is widget growth, starting mid-90's, upward trend. (1)
Late-90's downward trend, steep decline, end of decade. (2)
Early-2000's, upward trend, steep incline. (3)
Early-2000's, downward trend, shallow decline. (4)
Early-2000's, upward trend, shallow incline. (5)
Much of (20)00's, substantial variability. (6)
Later in decade of (20)00's, upward trend. (7)
Near record (value), record in '97. (8)
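The difference between 'brain-dead' abutment and a first-cut 'Dick and Jane' sentence can be shown with one string group. The sketch below is hypothetical: it trims each fragment, drops stray commas, joins with ", ", capitalizes the first letter and adds a period. Sentence 2 above omits the comma after "Late-90's", so the output of this sketch differs slightly from it there.

```python
def abut(fragments):
    """The 'brain-dead' option: concatenate the strings exactly as they are."""
    return "".join(fragments)


def simple_sentence(fragments):
    """Trim each fragment, drop stray commas, join with ', ',
    capitalize the first letter and end with a period."""
    parts = [f.strip().rstrip(",").strip() for f in fragments if f.strip()]
    text = ", ".join(parts)
    return text[0].upper() + text[1:] + "."


frags = [" early-2000's, ", "upward trend", "steep incline"]
print(repr(abut(frags)))       # " early-2000's, upward trendsteep incline"
print(simple_sentence(frags))  # Early-2000's, upward trend, steep incline.
```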
More processing could be applied to sentences 1 - 8 to improve the
sophistication level of the English.
Sentence 1 could be processed into 'Starting mid-90's upward trend of
widget growth.'
Sentence 2 could be processed into 'Late-90's downward trend, with
steep decline, at end of decade.'
One of the rules or processes we learned as speakers of English, in
graduating from the 'Dick and Jane' sentence level, was to combine
simple sentences into compound sentences and to collapse
'commonalities', such as a topic or subject name explicitly repeated
by the simple abutment of 'Dick and Jane' sentences, into a single
sentence that covers all the points in that collection of simple
sentences. A processor which watches for 'commonalities in a sequence
of simple sentences' would detect sentences 3-5 as candidates for
such compression.
Early-2000's, upward trend, steep incline. (3)
Early-2000's, downward trend, shallow decline. (4)
Early-2000's, upward trend, shallow incline. (5)
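A commonality-watching processor of this kind might be sketched with a regular expression. The expression encodes "time phrase, trend direction, steep/shallow in-/de-cline", and the function groups consecutive matching sentences that share the same time phrase. The regex and the name `collapsible_runs` are hypothetical illustrations, not the author's detector.

```python
import re

# Hypothetical regular-expression form of the pattern: a time phrase,
# a trend direction, then a steep/shallow in-/de-cline.
PATTERN = re.compile(
    r"^(?P<when>[^,]+), (?P<direction>upward|downward) trend, "
    r"(?P<magnitude>steep|shallow) (?P<change>in|de)cline\.$"
)


def collapsible_runs(numbered_sentences):
    """Return runs (lists of sentence numbers) of two or more consecutive
    sentences that match PATTERN and share the same time phrase."""
    runs, current, when = [], [], None
    for number, sentence in numbered_sentences:
        match = PATTERN.match(sentence)
        if match is None or match["when"] != when:
            if len(current) > 1:
                runs.append(current)
            current, when = [], None
        if match is not None:
            if not current:
                when = match["when"]
            current.append(number)
    if len(current) > 1:
        runs.append(current)
    return runs


SENTENCES = [
    (2, "Late-90's downward trend, steep decline, end of decade."),
    (3, "Early-2000's, upward trend, steep incline."),
    (4, "Early-2000's, downward trend, shallow decline."),
    (5, "Early-2000's, upward trend, shallow incline."),
    (6, "Much of (20)00's, substantial variability."),
]
print(collapsible_runs(SENTENCES))  # [[3, 4, 5]]
```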
A pattern-detector helps recognize that sentences 3-5 can be
collapsed. The pattern is
"Early-2000's" + $magnitude("trend") + $magnitude(("in | de") + "cline").
Furthermore, when the program 'knows' (is programmed to use) the ways
in which the variable '$magnitude' has 'values' (such as being
'numeric' in some way), the program can 'know' (by being programmed
in a particular way) that, as in this case, "$magnitude("trend")" (ie
numeric sequences) can exhibit what mathematicians call
'variability'. A linear regression line fitted using moving
end-points may well exhibit 'inflection' in a loose sense: a change
not only of 'slope-steepness' (roughly the absolute value of the
slope) but also of slope-sign (ie upwards or downwards). And all that
jazz compresses the simple sentences 3-5 into sentence 6.
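The compression step can be sketched as a simple rule. If a run of sentences mixes upward and downward trends, the run is dropped and the broader "variability" sentence is kept to cover it, rephrased from "X, substantial variability." to "X with substantial variability." Treating mixed directions as the trigger is an assumption made for illustration. The text above describes the reasoning more loosely, in terms of variability and slope sign.

```python
import re

DIRECTION = re.compile(r"\b(upward|downward) trend\b")


def compress(sentences, run, summary):
    """sentences maps sentence number -> text. If the sentences in `run`
    mix upward and downward trends, drop them and let the `summary`
    sentence cover them, rephrased from 'X, substantial variability.'
    to 'X with substantial variability.'"""
    directions = set()
    for number in run:
        match = DIRECTION.search(sentences[number])
        if match:
            directions.add(match.group(1))
    if len(directions) < 2:
        return dict(sentences)  # a single direction: nothing to collapse
    kept = {n: s for n, s in sentences.items() if n not in run}
    kept[summary] = kept[summary].replace(
        ", substantial variability", " with substantial variability")
    return kept


sentences = {
    3: "Early-2000's, upward trend, steep incline.",
    4: "Early-2000's, downward trend, shallow decline.",
    5: "Early-2000's, upward trend, shallow incline.",
    6: "Much of (20)00's, substantial variability.",
}
print(compress(sentences, [3, 4, 5], 6))
# {6: "Much of (20)00's with substantial variability."}
```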
Much of (20)00's with substantial variability. (6)
If the program makes sentence 6 the first part of a compound sentence
(with 7 and 8) by replacing the periods of sentences 6 and 7 with
commas, we get
Much of (20)00's with substantial variability, later in decade of
(20)00's, upward trend, near record (value), record in '97. (9)
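The comma-for-period joining that produces sentence 9 can be sketched directly. The function name `compound` is hypothetical. Lower-casing the first letter of each later sentence works here, but a real implementation would need to protect proper nouns.

```python
def compound(sentences):
    """Join simple sentences by replacing each inner period with a comma
    and lower-casing the first letter of every sentence after the first."""
    parts = []
    for i, sentence in enumerate(sentences):
        text = sentence.rstrip(".")
        if i > 0:
            text = text[0].lower() + text[1:]
        parts.append(text)
    return ", ".join(parts) + "."


print(compound([
    "Much of (20)00's with substantial variability.",
    "Later in decade of (20)00's, upward trend.",
    "Near record (value), record in '97.",
]))
```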
The second instance of the text "of (20)00's," would be removed in a
sophistication analysis of sentence 9 (remove exact repetition):
Much of (20)00's with substantial variability, later in decade upward
trend to near record in '97. (10)
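The "remove exact repetition" step alone can be sketched as follows. The first occurrence of a phrase is kept, and later copies are deleted along with the space before them and a comma directly after them. Applied to sentence 9, this sketch (the name `drop_repeats` is hypothetical) yields the intermediate text before the further changes discussed next.

```python
import re


def drop_repeats(sentence, phrase):
    """Keep the first occurrence of `phrase`; delete later exact repeats,
    along with the space before them and a comma right after them."""
    first = sentence.find(phrase)
    if first == -1:
        return sentence
    cut = first + len(phrase)
    tail = re.sub(r"\s*" + re.escape(phrase) + r",?", "", sentence[cut:])
    return sentence[:cut] + tail


s9 = ("Much of (20)00's with substantial variability, later in decade of "
      "(20)00's, upward trend, near record (value), record in '97.")
print(drop_repeats(s9, "of (20)00's"))
```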
In sentence 10 we see that '(value), record' was removed: the second
'record' was repetitive, and implied things (words in parentheses)
are removed.
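The two rules just named can be sketched as hypothetical helpers: dropping parenthesized words, and collapsing "record, record". The first rule only removes parentheses that start with a letter, so the year notation "(20)00's" survives. These rules stop one word short of sentence 10, because the connective "to" is not produced by either of them.

```python
import re


def drop_word_parentheticals(text):
    """Remove parenthesized words such as '(value)'. Parentheses around
    digits, as in "(20)00's", are deliberately left alone."""
    return re.sub(r"\s*\([A-Za-z][^()]*\)", "", text)


def merge_repeated_word(text, word):
    """Collapse 'word, word' into a single 'word'."""
    w = re.escape(word)
    return re.sub(rf"\b{w}, {w}\b", word, text)


s = ("Much of (20)00's with substantial variability, later in decade "
     "upward trend, near record (value), record in '97.")
print(merge_repeated_word(drop_word_parentheticals(s), "record"))
```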
Much of (20)00's with substantial variability, later in decade upward
trend to near record (in '97) widget growth. (11)
Sentence 11, which is pretty close to the last sentence appearing in
the myfoo:analysis content text, was made from sentence 10 by making
the date of the record "in '97" into a so-called parenthetical
expression, and also by affixing to the end of the compound sentence
the string ("widget growth") that the program is using as the subject
of the analysis paragraph.
Application of the sophistication analysis program to sentence 10
determines that 'record of what?' was not explicitly stated. In other
words, what was the data domain whose (highest) value was the record?
The domain is named by the subject string.
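The last step, from sentence 10 to sentence 11, can be sketched in the same style. The sketch wraps the date phrase in parentheses and, when a word such as "record" appears without its domain, affixes the subject string. The function name `finish` and the `needs_domain` word list are hypothetical.

```python
def finish(sentence, date_phrase, subject, needs_domain=("record",)):
    """Wrap `date_phrase` in parentheses and, if the sentence mentions a
    word such as 'record' without saying what it is a record of, affix
    the paragraph's subject string before the final period."""
    body = sentence.rstrip(".")
    body = body.replace(date_phrase, f"({date_phrase})", 1)
    if subject not in body and any(w in body for w in needs_domain):
        body = f"{body} {subject}"
    return body + "."


s10 = ("Much of (20)00's with substantial variability, later in decade "
       "upward trend to near record in '97.")
print(finish(s10, "in '97", "widget growth"))
```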
In a future episode we will look at further examples and discussion of
what I call "linguification", processes which transform ideas or
concepts into language. While humans can "think" using words/language,
that verbal thinking is really a concomitant activity which results
from linguification processing acting upon the actual
thoughts/ideas/concepts. These latter are not linguistic. You do not
govern your walking along a crowded sidewalk by listening to an
internal voice telling you how/where to walk, nor by (inwardly)
vocalizing deliberations about how to walk.
You are able to "see" without accompanying words/language, you are
able to "hear" without accompanying words/language, and you are able
to perform non-trivial physical movements without accompanying
words/language. So what is language about / for then? It is a "tablet"
upon which our mind can write (and erase) and examine its thoughts. It
is as though external, like an actual tablet, as though another 'self'
with which "I" can have a linguistically mediated discussion. The
linguification process is one which transforms the content of
ideas/thoughts/concepts into strongly sequence (ie grammar) oriented
'discreta'. The grammarized expressions of mentation, through the
process of linguification, capture and depict discrete, less-rich
symbols _representing_ the rich, flowing, perhaps seemingly contiguous
'mind-stuff' that is mentation/its content. It may well be that each
of our minds is populated with actually different 'stuff' in there.
The beauty/"magic" of language is that it (mostly) overcomes any such
differences by more or less correctly transforming the stuff of the
mind into _culturally defined_ symbols. Words in languages have
meaning only in the same way that (fiat) money has (monetary) value,
and that is because both are used / defined in terms of _agreement_
among some collection of people / persons that those words or those
certificates / coins have that meaning / usage. Words are the coins of
the mind.
In a previous episode we saw discussion about the colour yellow and
bars in my barchart. It is rather unlikely that the qualia I
experience when I detect / "see" such a yellow barchart bar is
identical in every way to the qualia you experience when you detect /
"see" the exact same yellow barchart bar. By means of the 'magic' of
language we have learned to map the word "yellow" to reference
whatever qualia we each 'have' / 'see' / experience when we 'look at'
/ 'see' the physical object we have been calling 'the yellow barchart
bar'. The word "yellow" comes from the culture; elsewhere it is, for
instance, "gelb" (German). Calling that barchart bar "gelb" does not
alter your visual experience of it. Also notice that from time to time
one hears "I'm at a loss for words. Words can't describe it. I'm
speechless." This means that one cannot always generate language that
completely / adequately conveys the rich mentation one "has" /
experiences; words / language are a serialization and discretization
of something continuous and simultaneous (in our mind).
In languages like English, at least, it has to be transmitted in a
serialized frame. Receivers use their (language community) frame (such
as grammar) to de-serialize the symbols, possibly into recalled
mental experiences. (When you hear or see the word yellow you are able
to fetch from your memory a _rendition_ of the "yellow" qualia, if you
so choose. In fact, you can also choose any other qualia-memory and
re-experience it instead, but "you" would 'know' that this other
qualia wasn't "yellow".)
Linguification and metaphor-forming have some related processes, which
is why they are covered in this group.