agileDatabases · agileDatabase

Messages 1709 - 1738 of 2744
#1709 From: Scott Ambler <scottwambler@...>
Date: Thu May 24, 2007 7:11 pm
Subject: Database testing was RE: RE: Questioning Traditional Data Management
scottwambler
 
As with all forms of testing, you need to write tests
which add value.  Validating nullability, constraints,
and RI (typically supported by foreign keys) often
adds significant value because they are
implementations of business rules.  Constraints can
easily be dropped or reworked, therefore it makes
sense to test them.  Nullability is critical to test
for because a not null constraint can also easily be
dropped and sometimes "quasi-nulls" such as empty
strings should also be enforced.
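
A minimal sketch of such tests, assuming SQLite (through Python's sqlite3
module) as a stand-in DBMS. The customer/orders schema and the helper names
are hypothetical, chosen only to illustrate checking a not null constraint,
a "quasi-null" rule for empty strings, and RI through a foreign key:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical customer/orders schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL CHECK (length(trim(name)) > 0)
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id)
        );
    """)
    return conn


def is_rejected(conn, sql, params=()):
    """Return True if the database refuses the statement with an integrity error."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

If someone later drops the NOT NULL, the CHECK, or the foreign key, the
corresponding is_rejected() assertion in the test suite starts failing.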

Field length can be a bit questionable to validate
unless there is some interesting rule around length.
Perhaps all values in a column are at least two
characters (arguably a constraint).  Verifying that a
specific varchar column is at least X characters in
size might make sense to do as well in some
situations, although that seems like a trivial test to
me.  Hard to say.

Tests that validate that I can CRUD a business entity,
such as a Customer, are a bit more interesting.  They
arguably validate the table(s) structure supporting
that entity.
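
A CRUD test of this kind might look like the following sketch. It assumes
SQLite through Python's sqlite3 module; the one-table Customer entity and
the function name are hypothetical:

```python
import sqlite3


def crud_round_trip(conn):
    """Create, read, update and delete one customer; return what each read saw."""
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Ann",))
    cid = cur.lastrowid
    select = "SELECT name FROM customer WHERE id = ?"
    created = conn.execute(select, (cid,)).fetchone()
    conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Anne", cid))
    updated = conn.execute(select, (cid,)).fetchone()
    conn.execute("DELETE FROM customer WHERE id = ?", (cid,))
    deleted = conn.execute(select, (cid,)).fetchone()  # None once the row is gone
    return created, updated, deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
```

If a column is renamed or a table is dropped, the round trip fails, which is
how the test indirectly validates the structure behind the entity.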

A fundamental value of database testing is around
ensuring that applications update the data values
correctly and that functionality implemented in the
database works as expected.  Performance and security
access control are also important things to ensure.

- Scott

--- "Garris, Nicole" <Nicole.Garris@...> wrote:

> Has anyone created tests/regression tests for table
> structures? I.e.,
> data types, nullability, lengths, constraints,
> whether a primary key
> exists and what column(s) its defined on, etc.? (I'm
> not referring to
> stored procedures or other types of code that we
> choose to store within
> the database ...) How did you do it and did you
> consider it worthwhile?
>
>
<snip>

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1710 From: Curt Sampson <yahoo@...>
Date: Fri May 25, 2007 12:17 am
Subject: Re: Re: Database Eye for the Application Guy
cjstokyo
 
On Mon, 21 May 2007, cathyfarrell_ct wrote:

> Regarding Curt's question "Does a general understanding suffice in
> most cases? Or must one learn the ins and outs of the optimizer for
> the particular DBMS that one is using?", even within a particular
> DBMS, the factors affecting performance can vary from version to
> version. Check out Tom Kyte's Things You "Know" presentation in the
> archives of the New York Oracle Users Group (www.nyoug.org).

The presentation is available at

      http://www.nyoug.org/Presentations/2005/kyte_you_know.pdf

to save those of you who are interested the hassle of digging through
the site to find it.

And what did I learn from this presentation? The general knowledge is
critical, and specific "knowledge" can hurt more than help if it's
specific knowledge of something that was true only for another version
of the DB, if it was ever true at all.

Applying small unconnected bits of specific knowledge can hurt as much
as it can help. You need to build a model of how your DBMS and the
database it's running work, test that your model matches what's really
going on, and use the model to predict specific changes that will help
fix your performance problems.

cjs
--
Curt Sampson         <cjs@...>         +81 90 7737 2974
               http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw

#1711 From: "Jennifer Riefenberg" <jennifer@...>
Date: Thu May 24, 2007 5:16 pm
Subject: RE: RE: Questioning Traditional Data Management
dba401k
 
We have been working in Agile for the last 3+ years and do not test the
table structures.  We have not found it worthwhile: once a table is
"in production" these things do not change in the same way that code does,
and db changes are then refactorings.  I guess in some environments it could
be worthwhile; however, I am not sure where.  Stored procedures, etc., yes,
are tested, but the actual, physical structures, no.

Another reason is that some of our development db environments have NO
CONSTRAINTS/RI so that the developers can have their junit databases
refreshed quickly (constraints, triggers, etc. add a lot of overhead when
re-setting the database for the next test suite).  If we have problems with
the defaults/nullability/etc., it is in the code and is caught by our other
testing methods.  By the way, the lack of structural testing has not caused
any errors in the overall system at all.  Just my 2 cents :-)



Jennifer Riefenberg, DBA

Jennifer@...

   _____

From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com]
On Behalf Of Garris, Nicole
Sent: Thursday, May 24, 2007 10:57 AM
To: agileDatabases@yahoogroups.com
Subject: RE: [agileDatabases] RE: Questioning Traditional Data Management



Has anyone created tests/regression tests for table structures? I.e.,
data types, nullability, lengths, constraints, whether a primary key
exists and what column(s) its defined on, etc.? (I'm not referring to
stored procedures or other types of code that we choose to store within
the database ...) How did you do it and did you consider it worthwhile?

________________________________

From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com]
On Behalf Of Garris, Nicole
Sent: Thursday, May 24, 2007 7:19 AM
To: agileDatabases@yahoogroups.com
Subject: [agileDatabases] RE: Questioning Traditional Data Management

Someone please give me a specific example of how one would regression
test a table structure. Programs are code which can be executed.
Regression testing consists of executing the code and ensuring that
specified outputs result from specified inputs. As a DBA, you won't
catch me testing the DBMS code-that's Oracle's/Microsoft's/etc. job.

I inspect the physical database structures (tables, indexes,
constraints, etc.) to ensure I didn't make a mistake coding the DDL. But
they are best tested by running (testing) the programs. The program
tests catch far more types of errors than my coding errors (e.g., design
errors, requirements errors, etc.).

Sorry, but programs and data sources ARE different. Only one is
executable.

[Non-text portions of this message have been removed]

#1712 From: Andrew Gregovich <andrew_gregovich@...>
Date: Thu May 24, 2007 12:57 pm
Subject: Re: Questioning Traditional Data Management
andrew_grego...
 
For me, data governance means:
1) Setting standards/conventions in terms of data-modelling, formatting and such
(assuming that they are experts in those areas), and
2) Ensuring that data can be consolidated and presented to management in a
consistent and meaningful manner

Regarding 1), I agree with Scott that software developers should not be
constrained by the traditional waterfall and bureaucratic processes, but at some
stages in the lifecycle the data people should do a QA on the developers and
slap their wrists if developers do not adhere to the defined standards for no
particular reason (which happens quite often in reality). There are always
exceptions to rules and this should be resolved in a collaborative manner.

Regarding 2) I remember that Scott previously mentioned that he wasn't a great
fan of the "single version of the truth". Here I somewhat disagree, since there
is always some crucial data which must tally across the enterprise. The best
example is money - if you don't get your accounts to balance, it could cause
huge problems both internally and externally (e.g. in terms of legal
responsibilities).

IMHO, one of the biggest problems in deep hierarchical structures is dispersion
of knowledge across layers, of those small details which are not visible to your
manager. In politics-ridden organizations (basically every medium or large
company) I find that there is way too much energy spent on in-fighting, which
can only be avoided by presenting facts, which is essentially data. OK, here you
can argue that in order for data to become knowledge you have to interpret it
and different people may have different interpretations, but in the end if you
don't have clear and valid data, you have nothing to begin with. On the other
hand, the top management does not have the time required to understand the
differences why Dept A does it this way and Dept B does it that way and why
those 2 sources contradict each other. Thus, you (as well as the other people
who report to them) have to present them very concise and clear information and
only then they can actually act upon it effectively.

I suppose that this is more of the "knowledge-management" than just
"data-management" domain, but to me it's clear that a certain level of evolution
of the data management community is necessary for it to survive and thrive.

Andrew


----- Original Message ----
From: Scott Ambler <scottwambler@...>
To: ambysoft@yahoogroups.com; Agile Articles <agilearticles@yahoogroups.com>;
agiledatabases@yahoogroups.com
Sent: Wednesday, May 23, 2007 8:28:39 AM
Subject: [agileDatabases] Questioning Traditional Data Management

My May newsletter, posted at
http://www.ddj.com/dept/architect/199700857?cid=Ambysoft
, examines some of the common assumptions made by the
traditional data management community and proposes
agile alternatives.  Last September, DDJ ran a data
quality survey which discovered that the majority of
IT organizations recognized that they had data quality
problems but were struggling to address them
effectively. I believe that the primary reason for
this is because data management groups have based
their processes on assumptions which prove to be
questionable at best and downright false at worst.
These assumptions include:
1. It's expensive to evolve a database schema.
Reality: Database refactoring is straightforward.
2. You need to model the details up front.  Reality:
You should do a little bit of modeling at a high-level
then identify details on a JIT basis.
3. You need to write everything down. Reality: A
test-first approach works much better.
4. You need to take a data-driven approach. Reality:
Data is only one of many important issues, a
usage-driven approach seems to work far more
effectively.
5. Review and inspections are an effective way to
ensure quality.  Reality: Database regression testing
is much more effective.
6. They need to govern data. Reality: Someone needs
to, but you need to take a collaborative approach, not
a command-and-control approach.

Hope the newsletter proves to be thought provoking.
Please pass it along to any of your data friends.

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html




#1713 From: Curt Sampson <yahoo@...>
Date: Fri May 25, 2007 12:03 am
Subject: Re: RE: Questioning Traditional Data Management
cjstokyo
 
On Thu, 24 May 2007, Garris, Nicole wrote:

> Programs are code which can be executed.

And your DDL is not code that's executed?

> I inspect the physical database structures (tables, indexes,
> constraints, etc.) to ensure I didn't make a mistake coding the DDL.

Make a mistake doing _what_ the DDL? Given your previous implication
that DDL is not code, I find your choice of verb interesting.

As for inspecting the DDL for errors, you already have an automated tool
that can, much more rapidly and accurately than you, inspect the DDL
for many kinds of errors, especially syntax errors and certain kinds of
inconsistency.

With a little help, you can use that tool to inspect for even more types
of errors. Once you start down this route, you find yourself (well,
all right, I find myself) doing pretty much the same sort of thing as
programmers do with unit tests.
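
One way to sketch this, reading the "automated tool" as the DBMS itself
(an interpretation, not something stated above): load the DDL into a
scratch database and assert on what the catalog reports. The example
assumes SQLite through Python's sqlite3 module; the customer table and the
describe_table() helper are hypothetical:

```python
import sqlite3

# Hypothetical DDL under test.
DDL = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(35) NOT NULL,
    email VARCHAR(100)
);
"""


def describe_table(ddl, table):
    """Run the DDL in a scratch database and report each column's structure."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)  # syntax errors in the DDL surface here
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    conn.close()
    return {
        name: {"type": ctype, "not_null": bool(notnull), "pk": bool(pk)}
        for _cid, name, ctype, notnull, _default, pk in rows
    }
```

A test can then pin down data types, nullability and the primary key, for
example by asserting that describe_table(DDL, "customer")["name"]["not_null"]
is true. Other DBMSs expose the same facts through their own catalog views.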

> Sorry, but programs and data sources ARE different. Only one is
> executable.

Indeed. But, we're not talking about testing the data source; we're
talking about testing the specification for the data source's behaviour,
which, once you get past the relatively minor detail of which language
you're writing the specification in, is no different from
a specification for an application program's behaviour.

As for your later question of whether or not it's worth it: I do it
because when I put on my business owner's hat, it increases the quality
of my product, saves me money and makes me more competitive.

One way to appreciate how this could work for you might be to take an
extreme programming (XP) workshop along the lines of the ones run by
Industrial Logic or whomever. (A better way would be to spend a couple
of months working in a good XP project, but that opportunity can be more
difficult to find.)

One last thing to think about: as both an expert programmer and
an expert DBA, I find the two roles to be identical: they're both
just development of computer software. Someone who's "only" a Java
programmer, or "only" a DBA is a developer who's merely familiar
with only one small area of the science, art and craft of software
development, and possibly someone who's not willing to learn more about
software development than their own narrow area.

cjs
--
Curt Sampson         <cjs@...>         +81 90 7737 2974
               http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw

#1714 From: Cameron Laird <claird@...>
Date: Thu May 24, 2007 2:19 pm
Subject: Re: Questioning Traditional Data Management
Cameron_Laird
 
On Thu, May 24, 2007 at 06:59:10AM -0400, Scott Ambler wrote:
> 		 .
>
>
> >
> > When *I* argue in data-modelling sessions for
> > "writing
> > everything down", I have just a couple of specific
> > techniques
> > in mind:
> > A.  comments in executable artifacts (including
> >     those regression tests!).  I rarely have a
> >     "description VARCHAR(35) ..." without a
> >     nearby, "-- Note that specification <URL:
> >     http://something-or-other.host/rev13.html >
> >     guarantees that 35 is the appropriate length
> >     for 'description'."
>
> You could probably capture that as a test.
			 .
			 .
			 .
Indeed!  I think I'm making a different point, though;
perhaps you can help me best express it.

In the example at hand, description is a VARCHAR(x); all
reasonable observers agree on that.  This has a couple of
implications:
A.  the system as a whole needs to handle x
     correctly--that is, behave in specified
     ways for data of length 0, 1, ..., x - 1,
     x, x + 1, ...  It's entirely reasonable
     to codify this as an executable test(s).
B.  The choice of x deserves exogenous
     commentary.  It might be the result of
     a "political" process, or a calculation
     based on analysis, or ...  My "bottom
     line" is this:  in 2007, I'll need to say,
     "we currently are limited to 35 for
     description; changing that will cost only
     q engineer minutes, but it *will* be a
     cost.  Recall that on 10 October 2003,
     John made the decision that we should
     adopt 35 as a fixed limit" rather than,
     "well, it's 35 now, but no one remembers
     why."

     Those kinds of comments or annotations
     have saved me hours of thrashing around
     in some organizations.
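
Point A can be sketched as an executable boundary test. This assumes SQLite
through Python's sqlite3 module; because SQLite does not enforce
VARCHAR(n), the limit is expressed as a CHECK constraint here, and the item
table and accepts() helper are hypothetical. The comment on MAX_LEN is where
the point B annotation would go:

```python
import sqlite3

# The limit from the example. Why 35 was chosen (point B) belongs in a
# comment like this one, or in a link to the governing specification.
MAX_LEN = 35


def accepts(conn, description):
    """Return True if the database stores the value, False if it rejects it."""
    try:
        conn.execute("INSERT INTO item (description) VALUES (?)", (description,))
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE item (description VARCHAR({MAX_LEN}) "
    f"CHECK (length(description) <= {MAX_LEN}))"
)

# Probe lengths 0, 1, x - 1, x and x + 1.
results = {n: accepts(conn, "x" * n) for n in (0, 1, MAX_LEN - 1, MAX_LEN, MAX_LEN + 1)}
```

Only the x + 1 case should come back rejected; a test asserting that fails
as soon as the limit changes without the test (and its annotation) changing.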

#1715 From: "Todd Carrico" <todd.carrico@...>
Date: Thu May 24, 2007 3:15 pm
Subject: RE: looking for consulting services
tmcarrico
 
> > On Tue, 22 May 2007, Sonya Lowry wrote:
>
> > Due to our limited budget, however, we needed also to rely upon him to
> > translate that logical model to a physical model. His approach was to
> > do a direct translation that has put him at odds with the java
> > developers who must somehow map the ORM to the schema and achieve some
> > acceptable level of performance. Complaints about number of joins and
> > the complexity of the model are rampant now.
>
> It could be that your Java developers are undermining a perfectly fine
> relational model by attempting to replace it with an "object model."
> Developers not familiar with relational models tend to do this from time
> to time. Not everything has to be or even should be objects.
>
> My life has been a lot simpler and more efficient since I started just
> doing relational stuff directly instead of trying to shove things
> through complicated and inefficient object-relational translation
> frameworks.
>
> cjs
> --

This is why I am a true believer in logic at the DB level... Not all
logic, but not the absence of it either.

I find it very useful to think of the proc layer of the DB objects as
the ORM layer that most Java developers think about.  From the
developer's perspective the stored procedure is the data bucket.  This
works amazingly well in scenarios where the developer only needs to
populate their objects.  Updating the objects requires stored procedures
that unwind the "object" into the "relational".  Again SQL is optimized
for this type of work as well.

Restricting data access to procedures allows the DBA to alter data
storage underneath the procedures without requiring app developers to
make adjustments to "plumbing" code.  The procedures are in essence the
contract between the application and the data.  Good fences (er..
contracts) make good neighbors :)
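
A rough sketch of the contract idea. SQLite has no stored procedures, so
this example substitutes a view for the procedure layer; that substitution
is an assumption of the sketch, and the table, view and function names are
hypothetical. The application code reads only through the contract while
the storage underneath is changed:

```python
import sqlite3


def app_read_names(conn):
    """Application code: it knows the customer_api contract, not the tables."""
    return [row[0] for row in conn.execute("SELECT name FROM customer_api ORDER BY name")]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO customer (name) VALUES ('Ann Lee'), ('Bob Ray');
    CREATE VIEW customer_api AS SELECT id, name FROM customer;
""")
before = app_read_names(conn)

# The DBA changes the storage underneath and re-publishes the same contract.
conn.executescript("""
    DROP VIEW customer_api;
    CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer_v2 (id, first_name, last_name)
        SELECT id,
               substr(name, 1, instr(name, ' ') - 1),
               substr(name, instr(name, ' ') + 1)
        FROM customer;
    DROP TABLE customer;
    CREATE VIEW customer_api AS
        SELECT id, first_name || ' ' || last_name AS name FROM customer_v2;
""")
after = app_read_names(conn)
```

app_read_names() is untouched by the storage change; in a DBMS with stored
procedures the same role would be played by the procedure signatures.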

Todd Carrico | Technical Architect
www.match.com

#1716 From: "lisakatzenmeier" <lisa.katzenmeier@...>
Date: Thu May 24, 2007 8:18 pm
Subject: Looking for Mid-Level DBA
lisakatzenmeier
 
Mid-Level Oracle DBA

We are looking for a mid-level Oracle DBA. We need a DBA who will:

--Tune our application
--Help write complex SQL
--Design tables for new parts of the system
--Review everything we've done so far and make it better
--Maintain backup and recovery procedures
--Proactively identify and resolve database issues
--Be a valued member of an agile team

If that sounds like you, we'd like you to join our team.

We're ePlan Services Inc. and we're the leader in providing Internet-
based 401k plans to small companies. See more about us at
www.eplanservices.com. We have a world-class Java/Oracle development
team in DTC area of Denver, Colorado. We are committed to agile
software development.

Here are a few more things we're looking for:
--At least 2 - 3 years experience with Oracle with some production
DBA experience.
--Solid SQL and PL/SQL skills.
--Hands-on experience running high-availability databases.
--UNIX administration skills are a big plus but are not required.
--Java knowledge a plus.
--Strong written and oral communication skills.
--Demonstrated track record of successful system and project
implementations.
--Strong troubleshooting skills.
--Quick Learner.
--Great Attitude.
--US citizenship or Green Card is also a requirement for this
position.
--A Bachelor's Degree in a technical field.

Please forward your resume to: jobs@...
No phone calls please.

#1717 From: Curt Sampson <yahoo@...>
Date: Fri May 25, 2007 5:03 am
Subject: RE: RE: Questioning Traditional Data Management
cjstokyo
 
On Thu, 24 May 2007, Jennifer Riefenberg wrote:

> Another reason is that some of our development db environments have NO
> CONSTRAINTS/RI so that the developers can have their junit databases
> refreshed quickly (constraints, triggers, etc. add a lot of overhead
> when re-setting the database for the next test suite).

I've found that starting a new transaction before every unit test and
rolling it back when the unit test is complete works very efficiently
for me. Of course, then you do need to move tests that need to do
multiple transactions into a different part of the "test world," as it
were, but there are often a lot fewer of those, and they tend to take
longer to run, anyway.
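
A minimal sketch of the transaction-per-test approach, assuming SQLite
through Python's sqlite3 module. The rolled_back() helper, the customer
table and the fixture row are hypothetical:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn):
    """Run a test body inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def count(conn):
    return conn.execute("SELECT count(*) FROM customer").fetchone()[0]


# isolation_level=None stops the sqlite3 module from opening transactions
# on its own, so the explicit BEGIN above is the only one in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES ('fixture row')")

with rolled_back(conn) as c:
    c.execute("INSERT INTO customer (name) VALUES ('temp')")
    inside = count(c)  # the test sees its own change
after = count(conn)    # the shared fixture data is back to its starting state
```

Because nothing is committed, the fixture data never needs reloading
between tests. Tests that themselves commit cannot use this wrapper, which
matches the caveat above about multi-transaction tests.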

cjs
--
Curt Sampson         <cjs@...>         +81 90 7737 2974
               http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw

#1718 From: Sigurður Jonsson <sigjons@...>
Date: Fri May 25, 2007 8:00 am
Subject: RE: Questioning Traditional Data Management
sigjons2002
 
> IMHO, one of the biggest problems in deep hierarchical structures is
> dispersion of knowledge across layers, of those small details which are not
> visible to your manager. In politics-ridden organizations (basically every
> medium or large company) I find that there is way too much energy spent on
> in-fighting, which can only be avoided by presenting facts, which is
> essentially data. OK, here you can argue that in order for data to become
> knowledge you have to interpret it and different people may have different
> interpretations, but in the end if you don't have clear and valid data, you
> have nothing to begin with. On the other hand, the top management does not
> have the time required to understand the differences why Dept A does it this
> way and Dept B does it that way and why those 2 sources contradict each
> other. Thus, you (as well as the other people who report to them) have to
> present them very concise and clear information and only then they can
> actually act upon it effectively.

You also don't know whether people are interpreting the data differently or
whether the figures differ because of "bad" data in that situation.  So the
knowledge you gain could be false, built upon the same interpretation of
different data instead of different interpretations of the same data.


   _____

From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com]
On Behalf Of Andrew Gregovich
Sent: Thursday, May 24, 2007 12:57 PM
To: agileDatabases@yahoogroups.com
Subject: Re: [agileDatabases] Questioning Traditional Data Management



For me, data governance means:
1) Setting standards/conventions in terms of data-modelling, formatting and
such (assuming that they are experts in those areas), and
2) Ensuring that data can be consolidated and presented to management in a
consistent and meaningful manner

Regarding 1), I agree with Scott that software developers should not be
constrained by the traditional waterfall and bureaucratic processes, but at
some stages in the lifecycle the data people should do a QA on the
developers and slap their wrists if developers do not adhere to the defined
standards for no particular reason (which happens quite often in reality).
There are always exceptions to rules and this should be resolved in a
collaborative manner.

Regarding 2) I remember that Scott previously mentioned that he wasn't a
great fan of the "single version of the truth". Here I somewhat disagree,
since there is always some crucial data which must tally across the
enterprise. The best example is money - if you don't get your accounts to
balance, it's not only that it could cause huge problems both internally and
externally (i.e. in terms of legal responsibilities).

IMHO, one of the biggest problems in deep hierarchical structures is
dispersion of knowledge across layers, of those small details which are not
visible to your manager. In politics-ridden organizations (basically every
medium or large company) I find that there is way too much energy spent on
in-fighting, which can only be avoided by presenting facts, which is
essentially data. OK, here you can argue that in order for data to become
knowledge you have to interpret it and different people may have different
interpretations, but in the end if you don't have clear and valid data, you
have nothing to begin with. On the other hand, the top management does not
have the time required to understand the differences why Dept A does it this
way and Dept B does it that way and why those 2 sources contradict each
other. Thus, you (as well as the other people who report to them) have to
present them very concise and clear information and only then they can
actually act upon it
effectively.

I suppose that this is more of the "knowledge-management" than just
"data-management" domain, but to me it's clear that a certain level of
evolution of the data management community is necessary for it to survive
and thrive.

Andrew

----- Original Message ----
From: Scott Ambler <scottwambler@ <mailto:scottwambler%40yahoo.com>
yahoo.com>
To: ambysoft@yahoogroup <mailto:ambysoft%40yahoogroups.com> s.com; Agile
Articles <agilearticles@ <mailto:agilearticles%40yahoogroups.com>
yahoogroups.com>; agiledatabases@ <mailto:agiledatabases%40yahoogroups.com>
yahoogroups.com
Sent: Wednesday, May 23, 2007 8:28:39 AM
Subject: [agileDatabases] Questioning Traditional Data Management

My May newsletter, posted at

http://www.ddj. com/dept/ architect/ 199700857? cid=Ambysoft

, examines some of the common assumptions made by the

traditional data management community and proposes

agile alternatives. Last September, DDJ ran a data

quality survey which discovered that the majority of

IT organizations recognized that they had data quality

problems but were struggling to address them

effectively. I believe that the primary reason for

this is because data management groups have based

their processes on assumptions which prove to be

questionable at best and downright false at worst.

These assumptions include:

1. It's expensive to evolve a database schema.

Reality: Database refactoring is straightforward.

2. You need to model the details up front. Reality:

You should do a little bit of modeling at a high-level

then identify details on a JIT basis.

3. You need to write everything down. Reality: A

test-first approach works much better.

4. You need to take a data-driven approach. Reality:

Data is only one of many important issues, a

usage-driven approach seems to work far more

effectively.

5. Review and inspections are an effective way to

ensure quality. Reality: Database regression testing

is much more effective.

6. They need to govern data. Reality: Someone needs

to, but you need to take a collaborative approach, not

a command-and- control approach.

Hope the newsletter proves to be thought provoking.

Please pass it along to any of your data friends.

- Scott

Scott W. Ambler

Practice Leader Agile Development, IBM Methods Group

http://www-306. ibm.com/software /rational/ bios/ambler. html


[Non-text portions of this message have been removed]

#1719 From: chris@...
Date: Fri May 25, 2007 8:56 am
Subject: RE: RE: Questioning Traditional Data Management
chrisrimmer1970
 
While I've not created tests for table structures, I *have* written tests
for views.  I've found that often a view is the best way to express a
piece of logic which would otherwise show up as fragments of SQL in lots
of places.  Since it is encoding some logic, it needs testing.  I do this
by poking data into the underlying tables and then seeing if the expected
data shows up (or not) in the view.
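
As a rough illustration of that approach (a sketch, not code from this
thread), here is a view test using Python's built-in sqlite3 module and
an in-memory database. The `domain` table, the `active_domain` view and
the column names are hypothetical, chosen only to make the example
self-contained.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with one table and one view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE domain (name TEXT PRIMARY KEY, expired INTEGER NOT NULL);
        -- The view encodes the logic under test: only unexpired domains.
        CREATE VIEW active_domain AS
            SELECT name FROM domain WHERE expired = 0;
    """)
    return conn


def view_rows(conn):
    """Return the names currently visible through the view, sorted."""
    cursor = conn.execute("SELECT name FROM active_domain ORDER BY name")
    return [row[0] for row in cursor]


def test_view_shows_only_active_domains():
    conn = make_db()
    # Poke data into the underlying table ...
    conn.executemany(
        "INSERT INTO domain (name, expired) VALUES (?, ?)",
        [("a.example", 0), ("b.example", 1), ("c.example", 0)],
    )
    # ... then check what does (and does not) show up in the view.
    assert view_rows(conn) == ["a.example", "c.example"]


if __name__ == "__main__":
    test_view_shows_only_active_domains()
```

The same pattern should carry over to any DBMS and test framework: the
test only needs a way to load rows and a way to query the view.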

Chris

====
Chris Rimmer
Development Team Leader
Nominet
+44 (0) 1865 332334
http://chrs.me.uk
====



"Garris, Nicole" <Nicole.Garris@...>
Sent by: agileDatabases@yahoogroups.com
24/05/07 17:57
Please respond to: agileDatabases@yahoogroups.com
To: <agileDatabases@yahoogroups.com>
cc:
Subject: RE: [agileDatabases] RE: Questioning Traditional Data Management


Has anyone created tests/regression tests for table structures? I.e.,
data types, nullability, lengths, constraints, whether a primary key
exists and what column(s) it's defined on, etc.? (I'm not referring to
stored procedures or other types of code that we choose to store within
the database ...) How did you do it and did you consider it worthwhile?



________________________________

From: agileDatabases@yahoogroups.com
[mailto:agileDatabases@yahoogroups.com] On Behalf Of Garris, Nicole
Sent: Thursday, May 24, 2007 7:19 AM
To: agileDatabases@yahoogroups.com
Subject: [agileDatabases] RE: Questioning Traditional Data Management



Someone please give me a specific example of how one would regression
test a table structure. Programs are code which can be executed.
Regression testing consists of executing the code and ensuring that
specified outputs result from specified inputs. As a DBA, you won't
catch me testing the DBMS code; that's Oracle's/Microsoft's/etc. job.

I inspect the physical database structures (tables, indexes,
constraints, etc.) to ensure I didn't make a mistake coding the DDL. But
they are best tested by running (testing) the programs. The program
tests catch far more types of errors than my coding errors (e.g., design
errors, requirements errors, etc.).

Sorry, but programs and data sources ARE different. Only one is
executable.







#1720 From: Scott Ambler <scottwambler@...>
Date: Fri May 25, 2007 12:02 pm
Subject: Documentation was Re: Questioning Traditional Data Management
scottwambler
 
--- Cameron Laird <claird@...> wrote:

<snip>
> In the example at hand, description is a VARCHAR(x);
> all
> reasonable observers agree on that.  This has a
> couple of
> implications:
<snip>
> B.  The choice of x deserves exogenous
>     commentary.  It might be the result of
>     a "political" process, or a calculation
>     based on analysis, or ...  My "bottom
>     line" is this:  in 2007, I'll need to say,
>     "we currently are limited to 35 for
>     description; changing that will cost only
>     q engineer minutes, but it *will* be a
>     cost.  Recall that on 10 October 2003,
>     John made the decision that we should
>     adopt 35 as a fixed limit" rather than,
>     "well, it's 35 now, but no one remembers
>     why."
>
>     Those kinds of comments or annotations
>     have saved me hours of thrashing around
>     in some organizations.

Here's a different way to look at things:
Let's assume that you're in an environment where
you've been working in an agile manner for a while now.
  Granted, that's not your situation but let's assume
that you're working on a system that was developed in
a "fully agile" manner, whatever that would mean.  So:
1. At the time that X was chosen, that must have been
the requirement at the time.
2. There must have been a good reason to go with X.
3. There may have been a previous value, or previous
values, for X before.  There would have been good
reasons for that too, but that's now in the past.
4. Your stakeholders explicitly decided to fund the
development effort to implement X, likely due to point
#2.
5. There's a full test suite in place, so if we break
something we can find out easily.
6. We can easily refactor our UI, app code, db, ... as
appropriate.

Given this situation how much documentation concerning
the motivations for X do I really need?  Absolutely
none.  Even years later I can assume that X was the
best idea at the time because that's what was built
and that it was always easy to move away from X if it
was important enough to do so.

Yes, I think it's an assumption that you need to
document a lot of things.  To be fair the
documentation is being used as a band-aid over the
real problem, the overly bureaucratic and
communication-weak environment that we often see in
the traditional world, but the organization could
choose to deal with the actual problem instead of
throwing more paperwork at it.  But, first we need to
start questioning some of the traditional data
management "wisdom" that we've had foisted upon us all
of these years.

- Scott



Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1721 From: Scott Ambler <scottwambler@...>
Date: Fri May 25, 2007 12:08 pm
Subject: Looking beyond data was Re: Questioning Traditional Data Management
scottwambler
 
--- Cameron Laird <claird@...> wrote:

>
> In the example at hand, description is a VARCHAR(x);
> all
> reasonable observers agree on that.  This has a
> couple of
> implications:
> A.  the system as a whole needs to handle x
>     correctly--that is, behave in specified
>     ways for data of length 0, 1, ..., x - 1,
>     x, x + 1, ...  It's entirely reasonable
>     to codify this as an executable test(s).

Exactly.  This implication is that X isn't just a data
thing, it's also a UI thing, a functionality thing,
...
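
One way implication A might be codified as an executable test, sketched
in Python with the built-in sqlite3 module. The limit of 35 is borrowed
from Cameron's example; the `item` table and the CHECK constraint are
assumptions (SQLite does not enforce VARCHAR lengths by itself, so the
sketch spells the rule out as a CHECK).

```python
import sqlite3

MAX_LEN = 35  # the "x" from the example; treat it as a placeholder


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite ignores the length in VARCHAR(n), so enforce it with a CHECK.
    conn.execute(
        "CREATE TABLE item (description VARCHAR(35) "
        f"CHECK (length(description) <= {MAX_LEN}))"
    )
    return conn


def accepts(conn, text):
    """Return True if the database accepts the value, False if it rejects it."""
    try:
        conn.execute("INSERT INTO item (description) VALUES (?)", (text,))
        return True
    except sqlite3.IntegrityError:
        return False


def test_length_boundaries():
    conn = make_db()
    # Lengths 0, 1, x - 1 and x must be accepted ...
    for n in (0, 1, MAX_LEN - 1, MAX_LEN):
        assert accepts(conn, "x" * n), f"length {n} should be accepted"
    # ... and x + 1 must be rejected.
    assert not accepts(conn, "x" * (MAX_LEN + 1))


if __name__ == "__main__":
    test_length_boundaries()
```

This covers only the database side; equivalent boundary checks would
still be needed for the UI and application code.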

This is why it's critical to observe that data is only
one of many important issues that we need to deal
with, and that "data folks" need to work closely with
development teams.  I suspect that many people are
struggling with the concept of TDD for databases
because they view it as a separate effort from the
rest of the app.  That is yet another traditional assumption:
that specialists and groups of specialists are the
preferred way to organize IT departments (I couldn't
cover everything in my newsletter due to lack of
time).

When you're doing TDD you're writing tests that
validate the UI, the functionality, the DB, ... as
needed.  The people building the software need to be
prepared to address all of the issues, not just the
preferred issues of whatever their specialty is.

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1722 From: Scott Ambler <scottwambler@...>
Date: Fri May 25, 2007 12:18 pm
Subject: RE: RE: Questioning Traditional Data Management
scottwambler
 
--- Curt Sampson <yahoo@...> wrote:

> On Thu, 24 May 2007, Jennifer Riefenberg wrote:
>
> > Another reason is that some of our development db
> environments have NO
> > CONSTRAINTS/RI so that the developers can have
> their junit databases
> > refreshed quickly (constraints, triggers, etc. add
> a lot of overhead
> > when re-setting the database for the next test
> suite).
>
> I've found that starting a new transaction before
> every unit test and
> rolling it back when the unit test is complete works
> very efficiently
> for me. Of course, then you do need to move tests
> that need to do
> multiple transactions into a different part of the
> "test world," as it
> were, but there are often a lot fewer of those, and
> they tend to take
> longer to run, anyway.

And there's something to be said for smoking the test
database every so often and building it from scratch.
All it takes is one failed transaction that goes
undetected and you've got a problem.

See
http://www.agiledata.org/essays/databaseTesting.html#SettingUpDatabaseTests
for some thoughts.
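
The two ideas above (roll back after every test, and rebuild the
database from scratch now and then) could be sketched as follows with
Python's unittest and sqlite3 modules. The `customer` table and the
`build_schema` helper are hypothetical, and an in-memory SQLite database
stands in for whatever DBMS is really in use.

```python
import sqlite3
import unittest


def build_schema():
    """Rebuild the test database from scratch (here: a fresh in-memory DB)."""
    # isolation_level=None leaves transaction control to explicit BEGIN/ROLLBACK.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO customer (name) VALUES ('seed')")
    return conn


class CustomerTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.conn = build_schema()  # built once per run of this test class

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def setUp(self):
        self.conn.execute("BEGIN")  # start a transaction before each test

    def tearDown(self):
        self.conn.execute("ROLLBACK")  # undo whatever the test did

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

    def test_insert(self):
        self.conn.execute("INSERT INTO customer (name) VALUES ('Ann')")
        self.assertEqual(self.count(), 2)  # seed row plus the new one

    def test_delete(self):
        self.conn.execute("DELETE FROM customer")
        self.assertEqual(self.count(), 0)


def run_suite():
    """Run the tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Both tests see exactly one seed row, whichever runs first, because each
one is rolled back. As Curt notes, tests that themselves need several
transactions do not fit this pattern and would have to live elsewhere.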

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1723 From: Scott Ambler <scottwambler@...>
Date: Fri May 25, 2007 12:14 pm
Subject: RE: RE: Questioning Traditional Data Management
scottwambler
 
--- Jennifer Riefenberg <jennifer@...>
wrote:

> We have been working in Agile for the last 3+ years
> and do not test the
> table structures - no, it has not been found
> worthwhile as once the table is
> "in production" these things do not change in the
> same way that code does,
> db changes are then refactorings.  I guess in some
> environments it could be
> worthwhile, however, I am not sure where.  Stored
> procedures, etc., yes, are
> tested, but the actual, physical structures, no.

Do you have JUnit tests which validate the ability to
save entities into the db, read them, ...  ?  If so,
you're implicitly validating that the table structures
are in place.  Granted, the tests aren't just for the
table structures; that would be trivial.  They're
likely looking at a bigger picture that supports some
form of actual usage that's important to your end
users.
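
To make that concrete, here is a sketch of such a save/read/update/
delete test. It is written in Python with sqlite3 rather than JUnit only
to keep it self-contained; `Customer`, `CustomerStore` and the table
layout are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: Optional[int]
    name: str


class CustomerStore:
    """Hypothetical data-access class; the table layout is an assumption."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def create(self, name):
        cur = self.conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
        return Customer(cur.lastrowid, name)

    def read(self, customer_id):
        row = self.conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def update(self, customer):
        self.conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?",
            (customer.name, customer.id),
        )

    def delete(self, customer_id):
        self.conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))


def test_customer_crud():
    store = CustomerStore(sqlite3.connect(":memory:"))
    customer = store.create("Ann")
    assert store.read(customer.id) == customer      # create + read
    customer.name = "Anne"
    store.update(customer)
    assert store.read(customer.id).name == "Anne"   # update
    store.delete(customer.id)
    assert store.read(customer.id) is None          # delete


if __name__ == "__main__":
    test_customer_crud()
```

If the table or one of its columns were missing or misnamed, this test
would fail, which is the implicit structural check described above.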



> Another reason is that some of our development db
> environments have NO
> CONSTRAINTS/RI so that the developers can have their
> junit databases
> refreshed quickly (constraints, triggers, etc. add a
> lot of overhead when
> re-setting the database for the next test suite).

So, some of the things that you would want to test for
aren't being implemented in the DB.

> If we have problems with
> the defaults/nullability/etc. - it is in the code
> and is caught by our other
> testing methods.

So you're testing at the level that you're
implementing it at.

Here are some questions for you:
If you implemented nullability/defaults/... in the
database would you write a test for it?
Would you want to be able to run those tests in a test
suite, say similar to JUnit, that runs in the DB?
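
For illustration, a test of nullability and defaults implemented in the
database might look like the sketch below (Python with sqlite3). The
`account` table, its columns and the 'active' default are assumptions
made up for the example; the empty-string CHECK shows one way to reject
a "quasi-null".

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE account (
            id     INTEGER PRIMARY KEY,
            owner  TEXT NOT NULL CHECK (owner <> ''),
            status TEXT NOT NULL DEFAULT 'active'
        )""")
    return conn


def rejected(conn, sql, params=()):
    """Return True if the database refuses the statement as a rule violation."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


def test_nullability_and_defaults():
    conn = make_db()
    # NOT NULL: a missing owner must be refused.
    assert rejected(conn, "INSERT INTO account (owner) VALUES (?)", (None,))
    # Quasi-null: an empty string must be refused too.
    assert rejected(conn, "INSERT INTO account (owner) VALUES (?)", ("",))
    # Default: status is filled in when the insert omits it.
    conn.execute("INSERT INTO account (owner) VALUES (?)", ("Ann",))
    status = conn.execute(
        "SELECT status FROM account WHERE owner = ?", ("Ann",)
    ).fetchone()[0]
    assert status == "active"


if __name__ == "__main__":
    test_nullability_and_defaults()
```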

- Scott


Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html


#1724 From: Andrew Gregovich <andrew_gregovich@...>
Date: Fri May 25, 2007 11:46 am
Subject: Re: Questioning Traditional Data Management
andrew_grego...
 
> You also don't know if people are interpreting data differently or if the
> figures are different because of "bad" data in that situation.  So the
> knowledge you gain could be false, built upon same interpretation of
> different data instead of different interpretation of the same data.



As soon as there is contradictory information coming from 2 separate
departments, you can't even say there's any knowledge - you only know that there
is a problem. To counter this, you need to do post-implementation analysis and
changes which are usually huge in terms of cost and risk. The common way out of
this (ultimately preventable) situation is to build a common data-warehouse, but
their consolidation routines, although often complicated, may be able to only
mask the issues instead of fixing them.












#1725 From: "Garris, Nicole" <Nicole.Garris@...>
Date: Fri May 25, 2007 2:18 pm
Subject: RE: RE: Questioning Traditional Data Management
nicgarris
 
By that "automated tool that can .. inspect the DDL", do you mean the
DBMS's SQL engine/SQL processor?



Of course I inspect the results of the DDL using whatever facilities the
DBMS or other tools provide. The developer also verifies my work by
checking the results of the DDL and writing code against the database.



Here's what I don't do:

1. I don't write tests. (They would all say the same thing: verify the
column names, data types, nullability, constraints, keys, indexes  ...
against the specification.)

2. I don't run the DDL multiple times with different inputs and verify
the outputs for each case, because that wouldn't make sense: the input is
static, and the output should not vary!



Now, if I worked in an environment where we did database builds on a
regular basis, then I might write a verification for the build (verify
the column names, data types, nullability, constraints ...). This would
eliminate the human error from the verification plus make it a lot
faster to execute (however it would take up-front time to write it).
However, it still wouldn't be a test; it would be a verification
script, wouldn't it?
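
A verification of that kind could be sketched as follows, in Python
against SQLite. The `item` table and the `EXPECTED` specification are
hypothetical; `PRAGMA table_info` is SQLite's way of exposing column
metadata, and another DBMS would need its own catalog query instead.

```python
import sqlite3

# Hypothetical specification: column -> (declared type, NOT NULL?, in PK?)
EXPECTED = {
    "id": ("INTEGER", False, True),
    "description": ("VARCHAR(35)", True, False),
}


def build(conn):
    """Stand-in for the real database build (the DDL under verification)."""
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, "
        "description VARCHAR(35) NOT NULL)"
    )


def actual_columns(conn, table):
    """Read the column metadata that SQLite reports for a table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {
        name: (ctype, bool(notnull), pk > 0)
        for _cid, name, ctype, notnull, _default, pk in rows
    }


def schema_differences(conn, table, expected):
    """Return {column: (expected, actual)} for every column that differs."""
    actual = actual_columns(conn, table)
    return {
        col: (expected.get(col), actual.get(col))
        for col in expected.keys() | actual.keys()
        if expected.get(col) != actual.get(col)
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    # An empty dict means the build matches the specification.
    print(schema_differences(conn, "item", EXPECTED))
```

Run after every build, a script like this takes the human error out of
the inspection, whatever one chooses to call it.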



Unfortunately, I don't work in an agile environment. Ya gotta work with
what ya got.



________________________________

From: agileDatabases@yahoogroups.com
[mailto:agileDatabases@yahoogroups.com] On Behalf Of Curt Sampson
Sent: Thursday, May 24, 2007 5:04 PM
To: agileDatabases@yahoogroups.com
Subject: Re: [agileDatabases] RE: Questioning Traditional Data
Management



On Thu, 24 May 2007, Garris, Nicole wrote:

> Programs are code which can be executed.

And your DDL is not code that's executed?

> I inspect the physical database structures (tables, indexes,
> constraints, etc.) to ensure I didn't make a mistake coding the DDL.

Make a mistake doing _what_ the DDL? Given your previous implication
that DDL is not code, I find your choice of verb interesting.

As for inspecting the DDL for errors, you already have an automated tool
that can, much more rapidly and accurately than you, inspect the DDL
for many kinds of errors, especially syntax errors and certain kinds of
inconsistency.

With a little help, you can use that tool to inspect for even more types
of errors. Once you start down this route, you find yourself (well,
all right, I find myself) doing pretty much the same sort of thing as
programmers do with unit tests.

> Sorry, but programs and data sources ARE different. Only one is
> executable.

Indeed. But, we're not talking about testing the data source; we're
talking about testing the specification for the data source's behaviour,
which, once you get past the relatively minor details of what sort of
language in which you're writing the specification, is no different from
a specification for an application program's behaviour.

As for your later question of whether or not it's worth it: I do it
because when I put on my business owner's hat, it increases the quality
of my product, saves me money and makes me more competitive.

One way to appreciate how this could work for you might be to take an
extreme programming (XP) workshop along the lines of the ones run by
Industrial Logic or whomever. (A better way would be to spend a couple
of months working in a good XP project, but that opportunity can be more
difficult to find.)

One last thing to think about: as both an expert programmer and
an expert DBA, I find the two roles to be identical: they're both
just development of computer software. Someone who's "only" a Java
programmer, or "only" a DBA is a developer who's merely familiar
with only one small area of the science, art and craft of software
development, and possibly someone who's not willing to learn more about
software development than their own narrow area.

cjs
--
Curt Sampson <cjs@...> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw




#1726 From: Scott Ambler <scottwambler@...>
Date: Fri May 25, 2007 2:29 pm
Subject: Governance was RE: Questioning Traditional Data Management
scottwambler
 
Responding to several at once, sorry for any
confusion.

- Scott
--- Sigurður Jonsson <sigjons@...> wrote:

> IMHO, one of the biggest problems in deep
> hierarchical structures is
> dispersion of knowledge across layers, of those
> small details which are not
> visible to your manager. In politics-ridden
> organizations (basically every
> medium or large company) I find that there is way
> too much energy spent on
> in-fighting, which can only be avoided by presenting
> facts, which is essentially data.

Sounds like an assumption.  You can also choose to
collaborate effectively.  Perhaps the reason there
is so much politics is the organizational
structure that was chosen by your company?  Perhaps if
you choose a more organic structure the politics would
be reduced?

<snip>
> On the other hand, the top management does not
> have the time required to understand the differences
> why Dept A does it this
> way and Dept B does it that way and why those 2
> sources contradict each other.

Perhaps that's an indication that the "problem" isn't
as severe as some data professionals make it out to
be?  Sounds like what you're saying is that the ROI to
resolve the differences isn't great enough to justify
the effort.

> Thus, you (as well as the other people who
> report to them) have to
> present them very concise and clear information and
> only then they can actually act upon it effectively.

There's one heck of an assumption.  My experience is
that people are flexible; if they know that the
information presented to them isn't perfect they can
still act on it effectively.  In fact, there's a
wealth of information out there on why management
never has perfect information in a timely manner nor
do they need it.  Pick up pretty much any management
book and you'll be able to find something to this
effect.  By the time you wait for perfect information
your competitive advantage is pretty much lost.


>
> You also don't know if people are interpreting data
> differently or if the
> figures are different because of "bad" data in that
> situation.  So the
> knowledge you gain could be false, built upon same
> interpretation of
> different data instead of different interpretation
> of the same data.

That's why you need to talk with them and work closely
with them.  Management solely by the numbers has been
pretty much abandoned as a concept.

Also, it's human nature to interpret things
differently.  Different people have different
backgrounds and experiences.  They have different
expectations therefore they approach each situation
differently.  Even if you had perfect numbers
different people would still act on those numbers in
different manners.


--- Andrew Gregovich <andrew_gregovich@...>
wrote:

> As soon as there is contradictory information coming
> from 2 separate departments, you can't even say
> there's any knowledge - you only know that there is
> a problem.

This is a huge assumption.  Perhaps it's really an
advantage to look at things differently.  HSBC has a
wonderful ad campaign where they show two pictures with
different captions.  Then they show the same two
pictures with the captions reversed.  The point is
that there's a competitive advantage to recognizing
that people have different perspectives and therefore
a "one truth fits all" approach might not be such a
good idea in many situations.

> To counter this, you need to do
> post-implementation analysis and changes which are
> usually huge in terms of cost and risk.

Another assumption.  Refactoring can be very
straightforward if you choose to get good at it.  My
experience is that the organizations that find
refactoring to be costly and risky have assumed
it would be and then put a process in place which made
it so.  Self-fulfilling prophecies have a tendency of
working out.

Having said that, you probably want to do a bit of
thinking up front, see
www.agilemodeling.com/essays/amdd.htm , but nowhere
near as much as the professional modelers among us
would lead us to believe.  Doing high-level modeling
early on an agile project is a very common practice
(see my upcoming August DDJ column for the numbers on
this).

> The common
> way out of this (ultimately preventable) situation
> is to build a common data-warehouse, but their
> consolidation routines, although often complicated,
> may be able to only mask the issues instead of
> fixing them.

Exactly.  You need to fix problems at the source.  I
hear a lot of discussion about the need to do this
within the traditional data management community but
never any concrete strategies for doing so (at least
not strategies which haven't already failed for them
many times over).  We need to rethink our approach to
data management, but I really don't see the DM
community doing that.

>
>
> For me, data governance means:
> 1) Setting standards/conventions in terms of
> data-modelling, formatting and
> such (assuming that they are experts in those
> areas), and

At www.enterpriseunifiedprocess.com and
www.agiledata.org I describe collaborative strategies
for doing this.  The problem that I've seen again and
again is that organizations see governance as a
command-and-control type of thing.  They want to
enforce standards and regulations.  This is akin to
herding cats, you just can't do it.  A better way is
to take a collaborative approach where you motivate
and enable the proper behavior.  This is akin to
leading cats (grab a piece of raw fish and cats will
follow you wherever you want to go).

Last Fall I did a couple of surveys around data
management and data quality (see
www.ambysoft.com/surveys/).  One of the results was
that having data naming conventions in place was
correlated with higher quality data in production.  No
surprise.  What was interesting was that having data
naming conventions that developers wanted to follow
was correlated with greater levels of quality than
conventions that were enforced.  In short, a
collaborative approach seems to work better in
practice than a command-and-control approach.  Might
not be something that the data politicians want to
hear as it sort of brings into question the need for
the empires which they've built up over the years.


> 2) Ensuring that data can be consolidated and
> presented to management in a
> consistent and meaningful manner

To do that effectively you would actually need to
understand how people want to use the data in
practice.  Understanding just the data isn't
sufficient, understanding the potential usage of a
system and the supporting data might be.

>
> Regarding 1), I agree with Scott that software
> developers should not be
> constrained by the traditional waterfall and
> bureaucratic processes, but at
> some stages in the lifecycle the data people should
> do a QA on the
> developers and slap their wrists if developers do

Incredibly huge "command-and-control" style of
assumption.  It's more effective to work together
collaboratively, not to try to inspect and punish.
It's far more effective to train and mentor developers
in data techniques, Curt had some pretty good words to
that effect in another email.  But that unfortunately
would put the data political empire at risk, wouldn't
it?


> Regarding 2) I remember that Scott previously
> mentioned that he wasn't a
> great fan of the "single version of the truth".

My article on that subject is posted at
http://www.agiledata.org/essays/oneTruth.html

I think that I'm pretty clear that in some situations
it makes sense to strive for the one truth.  But you
need to be smart about it.  Flexibility is important,
because the "one truth" might not be exact.  And, the
"one truth" is a moving target anyway (those pesky
stakeholders keep changing their minds) so you're
still going to want a practical way to evolve your
database schemas.

My experience is that a spectacular amount of
bureaucracy is justified within organizations in order
to seek the one truth, and that this effort rarely
seems to provide any sort of payback when the actual
costs are taken into account. When you shine the "ROI
light" on many traditional data management activities
they rarely seem to make sense in practice.

> Here I somewhat disagree,
> since there is always some crucial data which must
> tally across the
> enterprise. The best example is money - if you don't
> get your accounts to
> balance, it's not only that it could cause huge
> problems both internally and
> externally (i.e. in terms of legal
> responsibilities).

So pick your battles wisely.  For the few data
elements where the "one truth" is critical, strive for
it.  I suspect that you'll discover it makes sense in a
lot fewer situations than what the traditional DM
folks will claim.

<snip>
> but to me it's clear that
> a certain level of
> evolution of the data management community is
> necessary for it to survive
> and thrive.

I couldn't have said it better myself.  Unfortunately,
as long as the DM community clings to its false
assumptions I suspect it will never be in a position
to do so.

- Scott


Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html


#1727 From: Scott Ambler <scottwambler@...>
Date: Sun May 27, 2007 10:21 pm
Subject: RE: RE: Questioning Traditional Data Management
scottwambler
 
--- "Garris, Nicole" <Nicole.Garris@...> wrote:

>
> Now, if I worked in an environment where we did
> database builds on a
> regular basis, then I might write a verification for
> the build (verify
> the column names, data types, nullability,
> constraints ...).

What real-world value do you think you'd get from
tests that verified column names and data types?

Validating nullability and constraints makes a lot of
sense to me because they reflect business rules.
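
As an illustration of constraint tests that check business rules, the
sketch below exercises a CHECK constraint and a foreign key (RI) using
Python and sqlite3. The `customer` and `orders` tables and the rule
"quantity must be positive" are assumptions invented for the example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless it is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            quantity    INTEGER NOT NULL CHECK (quantity > 0)
        );
        INSERT INTO customer (id, name) VALUES (1, 'Ann');
    """)
    return conn


def rejected(conn, sql):
    """Return True if the database refuses the statement as a rule violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


def test_business_rules():
    conn = make_db()
    # A valid order for an existing customer is accepted.
    conn.execute("INSERT INTO orders (customer_id, quantity) VALUES (1, 3)")
    # RI: an order for a customer that does not exist is refused.
    assert rejected(conn, "INSERT INTO orders (customer_id, quantity) VALUES (99, 1)")
    # CHECK: a non-positive quantity is refused.
    assert rejected(conn, "INSERT INTO orders (customer_id, quantity) VALUES (1, 0)")
    # RI: a customer who still has orders cannot be deleted.
    assert rejected(conn, "DELETE FROM customer WHERE id = 1")


if __name__ == "__main__":
    test_business_rules()
```

If someone drops or reworks one of these constraints, the matching
assertion fails, which is where the regression value comes from.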

-Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1728 From: "Garris, Nicole" <Nicole.Garris@...>
Date: Tue May 29, 2007 2:48 pm
Subject: RE: Questioning Traditional Data Management
nicgarris
 
Scott, you seem to be saying that testing to verify column names and
data types rarely makes sense.



This thread has been very interesting and helpful to me. My first,
gut-level reaction to the phrase "database regression testing" was that
it would add little, if any, value, and the cost of creating the tests
would greatly outweigh the value added. This is because I wrongly
assumed that "database regression testing" would test the results of the
DDL. Now that I understand better what you mean by the phrase, and in
what software development context it would be used, it makes a lot more
sense to me. What is meant is really something more like "database
constraint/data regression testing", correct?



________________________________

From: agileDatabases@yahoogroups.com
[mailto:agileDatabases@yahoogroups.com] On Behalf Of Scott Ambler
Sent: Sunday, May 27, 2007 3:22 PM
To: agileDatabases@yahoogroups.com
Subject: RE: [agileDatabases] RE: Questioning Traditional Data
Management




--- "Garris, Nicole" <Nicole.Garris@...> wrote:

>
> Now, if I worked in an environment where we did
> database builds on a
> regular basis, then I might write a verification for
> the build (verify
> the column names, data types, nullability,
> constraints ...).

What real-world value do you think you'd get from
tests that verified column names and data types?

Validating nullability and constraints makes a lot of
sense to me because they reflect business rules.

-Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html







#1729 From: "Jennifer Riefenberg" <jennifer@...>
Date: Tue May 29, 2007 3:33 pm
Subject: RE: RE: Questioning Traditional Data Management
dba401k
 
Some of the JUnit tests actually put data into the database and many do not
- many of the JUnit tests use mock objects to avoid the database.  We do not seem
to get enough testing of the data actually getting stored in the database -
I have not been entirely convincing to the team in getting them to add these
components to their individual JUnit tests - plus we have run into Hibernate
"idiosyncrasies" that have really messed with our data. We do rely on
Hibernate to do things correctly for data. The good news is that we have
normally caught any data problems later on down the road with other testing,
however, we have had some make it through to production, which we all agree
is not a good thing!  Another aspect of our situation is that we have a very
complex data model (in part a large legacy and in part, a complex business)
and we refresh each and every build/check-in to pre-populate the database
with the "seed" data needed for any level of testing.  We do have a clean
start each time, then.

Our other levels of testing (FitNesse, Canoo, Watir, etc.) do include
database constraints and RI - we just live with the longer refresh times
for our full-test-suite builds, which run throughout the day, but not each
and every check-in.  We have found that this is how we can best manage our
resources.

I am not opposed to testing in the db, but have not found the right tools, I
guess, to make ddl statements worth testing.  The db itself does a pretty
good job of finding datatype/size/ri errors that may occur, so I do rely on
it.  Something "like" a junit sort of testing tool would be a good thing to
consider in many situations, but I just haven't seen anything that really
works in our situation, yet.



As someone else pointed out, though, testing views can be done and we do
test them - just like db code.



Jennifer Riefenberg, DBA

Jennifer@...

   _____

From: agileDatabases@yahoogroups.com [mailto:agileDatabases@yahoogroups.com]
On Behalf Of Scott Ambler
Sent: Friday, May 25, 2007 6:15 AM
To: agileDatabases@yahoogroups.com
Subject: RE: [agileDatabases] RE: Questioning Traditional Data Management




--- Jennifer Riefenberg <jennifer@...>
wrote:

> We have been working in Agile for the last 3+ years
> and do not test the
> table structures - no, it has not been found
> worthwhile as once the table is
> "in production" these things do not change in the
> same way that code does,
> db changes are then refactorings. I guess in some
> environments it could be
> worthwhile, however, I am not sure where. Stored
> procedures, etc., yes, are
> tested, but the actual, physical structures, no.

Do you have JUnit tests which validate the ability to
save entities into the db, read them, ... ? If so,
you're implicitly validating that the table structures
are in place. Granted, the tests aren't just for the
table structures; that would be trivial. They're
likely looking at a bigger picture that supports some
form of actual usage that's important to your end
users.
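
A minimal sketch of such a round-trip test follows. It assumes Python's
built-in sqlite3 module, an in-memory database, and a hypothetical
customer table; none of these names come from the thread, and a
Hibernate/JUnit shop would express the same idea through its DAOs. If a
column were renamed or dropped, the SQL would fail, which is the
implicit structural check described above.

```python
import sqlite3


def make_db():
    # Hypothetical schema, created in memory so every run starts clean.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " email TEXT)"
    )
    return conn


def crud_round_trip(conn):
    # Create
    cur = conn.execute(
        "INSERT INTO customer (name, email) VALUES (?, ?)",
        ("Ada", "ada@example.com"),
    )
    cid = cur.lastrowid
    # Read
    created = conn.execute(
        "SELECT name, email FROM customer WHERE id = ?", (cid,)
    ).fetchone()
    # Update
    conn.execute(
        "UPDATE customer SET email = ? WHERE id = ?",
        ("ada@example.org", cid),
    )
    updated = conn.execute(
        "SELECT name, email FROM customer WHERE id = ?", (cid,)
    ).fetchone()
    # Delete
    conn.execute("DELETE FROM customer WHERE id = ?", (cid,))
    remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    return created, updated, remaining
```

The test only passes if every column it touches exists with a
compatible type, so no separate "does the column exist" test is needed.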

> Another reason is that some of our development db
> environments have NO
> CONSTRAINTS/RI so that the developers can have their
> junit databases
> refreshed quickly (constraints, triggers, etc. add a
> lot of overhead when
> re-setting the database for the next test suite).

So, some of the things that you would want to test for
aren't being implemented in the DB.

> If we have problems with
> the defaults/nullability/etc. - it is in the code
> and is caught by our other
> testing methods.

So you're testing at the level that you're
implementing it at.

Here are some questions for you:
If you implemented nullability/defaults/... in the
database would you write a test for it?
Would you want to be able to run those tests in a test
suite, say similar to JUnit, that runs in the DB?
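
As a rough illustration of what such tests might look like, here is a
sketch using Python's sqlite3 module against an in-memory database. The
account table, its columns, and the helper names are hypothetical. It
checks that a NOT NULL column rejects NULL, that a CHECK constraint
rejects the empty-string "quasi-null", and that a DEFAULT is applied.

```python
import sqlite3


def make_db():
    # Hypothetical table with nullability, a quasi-null check, and a default.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL CHECK (owner <> ''),"
        " status TEXT NOT NULL DEFAULT 'open')"
    )
    return conn


def insert_is_rejected(conn, owner):
    # True when the database itself refuses the row.
    try:
        conn.execute("INSERT INTO account (owner) VALUES (?)", (owner,))
    except sqlite3.IntegrityError:
        return True
    return False


def default_status(conn):
    # Insert without a status and report what the database filled in.
    cur = conn.execute("INSERT INTO account (owner) VALUES (?)", ("Ada",))
    row = conn.execute(
        "SELECT status FROM account WHERE id = ?", (cur.lastrowid,)
    ).fetchone()
    return row[0]
```

If someone later drops the NOT NULL or the CHECK, insert_is_rejected
starts returning False and the regression suite flags it.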

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html


#1730 From: "Bill Lewis" <datamodel@...>
Date: Wed May 30, 2007 1:43 am
Subject: Reponses to “Questioning Traditional Data Management"
blewis02
 
I feel compelled to respond to several assertions in Scott W.
Ambler's "Questioning Traditional Data Management", in Dr. Dobb's
Portal, May 22 2007. In this column, Scott, as is typical, when
expounding upon "data professionals" (a stereotype, of the non-UML
kind), gets a few things quite wrong and one or two things on the
mark. Below are quotes from Scott (SA), and my responses (BL).

SA: The primary reason for struggling to effectively address data
quality problems is because data management groups have based their
processes on assumptions which prove to be questionable at best and
downright false at worst.

BL: "Data management groups" do not have responsibility for
originating or managing data content -- business users do. Data
professionals, in fact, usually encounter significant resistance when
recommending the implementation of data integrity in the
database, "for performance reasons", especially.

SA: When data professionals first hear about refactoring they often
profess that it's a great idea for small databases, which it is, but
that it isn't realistic for "large databases" due to the sheer volume.

BL: My experience has been that data professionals, to a fault, bend
over backwards to accommodate data model change requests from
developers, yet encounter resistance to their own refactoring
recommendations due to the "unacceptable" impact these changes would
have on software that was developed based on the current model. "Oh,
no, you can't change that table/column name/datatype/length, we'd
have to change <enter a number> of <enter a type of program code>,
and then test it all again".

SA: Detailed up-front modeling actually proves to be incredibly risky in
practice because people become committed to their original design and
are either unwilling or unable to change strategies later on.

BL: I agree about problems created by doing "detailed" modeling
immediately (Big Data Up Front). One thing that data supports quite
well is abstraction, and models should start as abstract
(or "conceptual") early on.  In my experience, the "people" who
become most committed to a detailed design are usually the developers
who write code that's immutably dependent on a "frozen" design.

SA: We see extra columns, tables, and views which actually detract
from the quality of the design, and existing columns and tables being
used for purposes other than originally intended.

BL: Who makes the decisions to use columns and tables differently
from how they were designed? Who determines how they are "used"?
Software and business people, of course—they're the users of the
database.

SA: A high-level model enables you to act on what you know by
identifying a likely direction to go in, yet puts you in a position
where your design can evolve over time based on your improving
understanding of the domain.

BL: Absolutely, wholeheartedly agreed. This is indeed an area that
deserves significant focus. Entities and relationships (e.g., primary
and foreign keys) need to be identified and agreed to early on -- in
fact, before a line of code is written/generated. This foundation of
functional dependency and order of precedence is what the team needs
to get right early on, because that's what's really difficult to
change later. Details of individual attributes can be added, moved
and/or changed as the application evolves.

SA: There is a consistent belief among data professionals that the
information model pervades the overall system, which is completely
true. But then again security issues also pervade the overall system,
as does usage, as does functionality, as does... you get the point.
Taking a "data-driven" approach is a preference, but from the state
of data quality within the industry it doesn't appear to be a very
good one. Most modern methodologies tend to promote a usage-driven
approach.

BL: Once again, it's completely unfair to attribute the overall state
of data quality to data professionals, or to a "data-driven
approach". A data-driven approach IS preferable because data indeed
does pervade any significant business system. Software exists to
maintain and expose data. Everything else—security, usage,
functionality—is dependent on the data. No data? Then no need for
security, nothing to be used (screens with blank fields?), no need
for any functions that use the data.

SA: The plethora of data quality books and papers rarely mention
database testing let alone discuss it coherently. This is a huge
blind spot within the data community.

BL: Surely you've heard of data profiling. Lots has been written
about it; several companies specializing in it have been started,
merged, sold, etc. -- quite a market. But again, the data content is
not originated by data professionals...although they should be
actively involved in, if not responsible for, generating and managing
test data.

SA: Considering the typical relationship that data management groups
have with development teams I believe that there is very little
chance that they will succeed at data governance.

BL: Unfortunately I would agree. It's also unfortunate that most of
the evangelizing about data governance has had to originate within
the data management community rather than the business community,
where it really belongs. Case in point: DBAs don't govern the general
ledger.

SA: Lack-lustre performance of the traditional data community during
the past three decades...As the agile community has clearly shown
over the past few years these assumptions don't seem to hold water in
practice.

BL: Broad generalizations such as these are futile to try to disprove -- or
to prove, for that matter. I'd strongly suggest that "Data-driven"
and "Agile" approaches to application engineering are by no means
mutually exclusive -- in fact, taking data seriously can significantly
increase development agility. Watch for my article in the July issue
of The Data Administration Newsletter (www.tdan.com) for more details.

Bill Lewis
Sr. IT Specialist
Global Business Services
IBM Corporation
lewisw@...

#1731 From: Scott Ambler <scottwambler@...>
Date: Wed May 30, 2007 11:10 am
Subject: Re: RE: Questioning Traditional Data Management
scottwambler
 
--- "Garris, Nicole" <Nicole.Garris@...> wrote:

> Scott, you seem to be saying that testing to verify
> column names and
> data types rarely makes sense.

It depends on the situation.  Having a test to
validate a column name or data type seems really
trivial to me when:
1. A test that saves an entity in effect validates the
names/types of all columns that it's writing to.
2. Naming conventions could be validated via some sort
of static code analysis tool.

However, if it makes sense in your situation to write
such a test then do so.


>
> This thread has been very interesting and helpful to
> me. My first,
> gut-level reaction to the phrase "database
> regression testing" was that
> it would add little, if any, value, and the cost of
> creating the tests
> would greatly outweigh the value added. This is
> because I wrongly
> assumed that "database regression testing" would
> test the results of the
> DDL.

Exactly.  There's value in testing some of the more
interesting aspects of the DDL, such as constraint
definitions and triggers, but not the static structure
of the tables.
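
A small sketch of testing those "more interesting" parts, assuming
Python's sqlite3 module and an in-memory database. The customer, orders,
and order_audit tables and the trigger name are hypothetical. Note that
SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other
databases enforce them by default.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite-specific: RI is off unless this pragma is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            total REAL NOT NULL CHECK (total >= 0)
        );
        CREATE TABLE order_audit (
            order_id INTEGER, old_total REAL, new_total REAL
        );
        CREATE TRIGGER trg_order_total AFTER UPDATE OF total ON orders
        BEGIN
            INSERT INTO order_audit VALUES (OLD.id, OLD.total, NEW.total);
        END;
        """
    )
    return conn


def violates(conn, sql, params=()):
    # True when the statement is rejected by a constraint.
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


def audit_after_update(conn):
    # Exercise the trigger: change a total and read back the audit row.
    cid = conn.execute("INSERT INTO customer (name) VALUES ('Ada')").lastrowid
    oid = conn.execute(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)", (cid, 10.0)
    ).lastrowid
    conn.execute("UPDATE orders SET total = ? WHERE id = ?", (12.5, oid))
    return conn.execute(
        "SELECT old_total, new_total FROM order_audit WHERE order_id = ?",
        (oid,),
    ).fetchall()
```

Each check targets behaviour (an orphan row is refused, a negative
total is refused, the trigger writes an audit row) rather than the
static shape of the tables.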

> Now that I understand better what you mean by
> the phrase, and in
> what software development context it would be used,
> it makes a lot more
> sense to me. What is meant is really something more
> like "database
> constraint/data regression testing", correct?

There's more to it than testing constraints.  My brief
list at
http://www.agiledata.org/essays/databaseTesting.html#Figure1
shows that.  My next newsletter will go into DB
testing in more detail seeing as it's not as clear as
I thought.

I think that one of the reasons why you're confused is
that this is a new idea within the data community.
There's been an underlying assumption within that
community for several decades that testing is
something that others do (i.e. QA folks).  This
reflects their penchant for over specialization and a
serial approach towards development.

As a result of the virtual absence of testing
discussion within the data management community we've
seen quality in production databases very clearly
suffer.

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1732 From: "Garris, Nicole" <Nicole.Garris@...>
Date: Wed May 30, 2007 2:20 pm
Subject: RE: Questioning Traditional Data Management
nicgarris
 
My response to the partial post which I have included below:

I can't speak for the "data management community". Speaking for myself:
(1) Testing is an important part of the software development process.
I've never assumed it can't be applied to the database. (2) I personally
think the serial approach to development is flawed and I've adjusted my
DBA work processes accordingly. I always tell the developers I work
with, let's sit down and design a first draft database and then go from
there. I assume (and I know) that there will be changes. Frankly, I've
been amazed at how few changes there are.

Overspecialized? That's probably not for me to judge. But again isn't
this a rather broad generalization? No, I'm not a developer. They hired
me into this position exactly because I'm a "data management"
professional and not a developer. That doesn't make me "one of the bad
guys". It just makes me one of "them".

--------------------------------------------------------------

Nicole wrote:


> Now that I understand better what you mean by
> the phrase, and in
> what software development context it would be used,
> it makes a lot more
> sense to me. What is meant is really something more
> like "database
> constraint/data regression testing", correct?





Scott wrote:

There's more to it than testing constraints. My brief
list at
http://www.agiledata.org/essays/databaseTesting.html#Figure1
shows that. My next newsletter will go into DB
testing in more detail seeing as it's not as clear as
I thought.

I think that one of the reasons why you're confused is
that this is a new idea within the data community.
There's been an underlying assumption within that
community for several decades that testing is
something that others do (i.e. QA folks). This
reflects their penchant for over specialization and a
serial approach towards development.

As a result of the virtual absence of testing
discussion within the data management community we've
seen quality in production databases very clearly
suffer.

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html







#1733 From: "Will Sargent" <will_sargent@...>
Date: Wed May 30, 2007 7:06 pm
Subject: Re: RE: Questioning Traditional Data Management
will_sargent
 
I'd really like to see some examples, or (even better) a working regression
test suite.

Will.

On 5/29/07, Garris, Nicole <Nicole.Garris@...> wrote:
>
>   Scott, you seem to be saying that testing to verify column names and
> data types rarely makes sense.
>
> This thread has been very interesting and helpful to me. My first,
> gut-level reaction to the phrase "database regression testing" was that
> it would add little, if any, value, and the cost of creating the tests
> would greatly outweigh the value added. This is because I wrongly
> assumed that "database regression testing" would test the results of the
> DDL. Now that I understand better what you mean by the phrase, and in
> what software development context it would be used, it makes a lot more
> sense to me. What is meant is really something more like "database
> constraint/data regression testing", correct?
>
> ________________________________
>
> From: agileDatabases@yahoogroups.com
> [mailto:agileDatabases@yahoogroups.com]
> On Behalf Of Scott Ambler
> Sent: Sunday, May 27, 2007 3:22 PM
> To: agileDatabases@yahoogroups.com
> Subject: RE: [agileDatabases] RE: Questioning Traditional Data
> Management
>
> --- "Garris, Nicole" <Nicole.Garris@...> wrote:
>
> >
> > Now, if I worked in an environment where we did
> > database builds on a
> > regular basis, then I might write a verification for
> > the build (verify
> > the column names, data types, nullability,
> > constraints ...).
>
> What real-world value do you think you'd get from
> tests that verified column names and data types?
>
> Validating nullability and constraints makes a lot of
> sense to me because they reflect business rules.
>
> -Scott
>
> Scott W. Ambler
> Practice Leader Agile Development, IBM Methods Group
> http://www-306.ibm.com/software/rational/bios/ambler.html

#1734 From: "Will Sargent" <will_sargent@...>
Date: Wed May 30, 2007 6:56 pm
Subject: Fwd: Experience with Unitils?
will_sargent
 
Hi there,

I'm looking at implementing a database testing framework.  I work in a
Spring / Hibernate / Oracle shop, and most of our code uses a DAO framework
to interact with Hibernate.

I've added a few unit tests using DbUnit, but I'm not really very impressed
with it.  DbUnit has to be told to ignore version numbers, creation dates,
needs to be passed in the schema name, and doesn't understand Flashback
tables aren't real.  And I'm not thrilled about keeping XML files all over
the place (there's probably some good way to organize them, but I haven't
found it yet).

I've read about something called Unitils, and it looks promising.  At least
it looks more in line with what I'd like to be doing, but I'm leery of
trying something that only now seems to be popping its head over the
horizon.  Anyone got any experience with it?

http://unitils.sourceforge.net/summary.html

Will.



#1735 From: "Simon Jones" <simon@...>
Date: Wed May 30, 2007 7:33 pm
Subject: Re: Questioning Traditional Data Management
simontcb
 
I agree that a little early modelling, a bit of 3nf etc. actually
leads to relatively little change, only addition.

But then again I think Scott has always recommended a little design
up-front? Unless I missed something.

But as for over-specialization, that is definitely a problem, although
not one confined to the data arena. Over-specialisation is a problem
throughout. It would be better if we were all a bit more
'jack-of-all' IMHO.

Simon

--- In agileDatabases@yahoogroups.com, "Garris, Nicole"
<Nicole.Garris@...> wrote:
>
> My response to the partial post which I have included below:
>
> I can't speak for the "data management community". Speaking for
> myself:
> (1) Testing is an important part of the software development
> process.
> I've never assumed it can't be applied to the database. (2) I
> personally
> think the serial approach to development is flawed and I've
> adjusted my
> DBA work processes accordingly. I always tell the developers I work
> with, let's sit down and design a first draft database and then go
> from
> there. I assume (and I know) that there will be changes. Frankly,
> I've
> been amazed at how few changes there are.
>
> Overspecialized? That's probably not for me to judge. But again
> isn't
> this a rather broad generalization? No, I'm not a developer. They
> hired
> me into this position exactly because I'm a "data management"
> professional and not a developer. That doesn't make me "one of the
> bad
> guys". It just makes me one of "them".
>
> --------------------------------------------------------------
>
> Nicole wrote:
>
>
> > Now that I understand better what you mean by
> > the phrase, and in
> > what software development context it would be used,
> > it makes a lot more
> > sense to me. What is meant is really something more
> > like "database
> > constraint/data regression testing", correct?
>
>
>
>
>
> Scott wrote:
>
> There's more to it than testing constraints. My brief
> list at
> http://www.agiledata.org/essays/databaseTesting.html#Figure1
> shows that. My next newsletter will go into DB
> testing in more detail seeing as it's not as clear as
> I thought.
>
> I think that one of the reasons why you're confused is
> that this is a new idea within the data community.
> There's been an underlying assumption within that
> community for several decades that testing is
> something that others do (i.e. QA folks). This
> reflects their penchant for over specialization and a
> serial approach towards development.
>
> As a result of the virtual absence of testing
> discussion within the data management community we've
> seen quality in production databases very clearly
> suffer.
>
> - Scott
>
> Scott W. Ambler
> Practice Leader Agile Development, IBM Methods Group
> http://www-306.ibm.com/software/rational/bios/ambler.html

#1736 From: Scott Ambler <scottwambler@...>
Date: Thu May 31, 2007 9:34 pm
Subject: Re: Re: Questioning Traditional Data Management
scottwambler
 
--- Simon Jones <simon@...> wrote:

> I agree that a little early modelling, a bit of 3nf
> etc. actually
> leads to relatively little change, only addition.
>
> But then again I think Scott has always recommended
> a little design
> up-front? Unless I missed something.

Definitely.  See www.agilemodeling.com/essays/amdd.htm


A little bit of design up front enables you to think
the bigger issues through and thereby avoid some
common, and often serious, mistakes.  This is actually
a common thing for agilists to do.  In the August
issue of DDJ my column will summarize the findings of
the 2007 Agile Adoption Survey where I asked about the
practices of agile developers.  Seems we're doing a
bit more modeling than what some people would lead you
to believe.  However, nowhere near as much as the
traditionalists will tell you that you need.

- Scott

Scott W. Ambler
Practice Leader Agile Development, IBM Methods Group
http://www-306.ibm.com/software/rational/bios/ambler.html



#1737 From: Scott Ambler <scottwambler@...>
Date: Thu May 31, 2007 10:24 pm
Subject: Re: Reponses to “Questioning Traditional Data Management"
scottwambler
 
--- Bill Lewis <datamodel@...> wrote:
<snip>
>
> SA: The primary reason for struggling to effectively
> address data
> quality problems is because data management groups
> have based their
> processes on assumptions which prove to be
> questionable at best and
> downright false at worst.
>
> BL: "Data management groups" do not have
> responsibility for
> originating or managing data content -- business
> users do.

I didn't realize that data management groups weren't
responsible for managing data.  ;-)


> Data
> professionals, in fact, usually encounter
> significant resistance when
> recommending the implementation of data integrity in
> the database, "for performance reasons", especially.


As I've written about extensively in Agile Database
Techniques and online there are many options for
implementing data integrity.  Very often the database
proves to be the best option for doing it, but not
always.


>
> SA: When data professionals first hear about
> refactoring they often
> profess that it's a great idea for small databases,
> which it is, but
> that it isn't realistic for "large databases" due to
> the sheer volume.
>
> BL: My experience has been that data professionals,
> to a fault, bend
> over backwards to accommodate data model change
> requests from developers,

I hear that claim a lot from data professionals, but
in practice I discover that it's not the case.  Taking
an agile data approach the agile DBA is "embedded" in
the team and is an active member of that team.  So
when someone needs to change the database schema they
work with that person, or someone else with those
skills, and make the change right then and there.  The
change is made on the order of minutes.  Is that the
level of service that you're talking about? If so,
great, if not, then perhaps there's more that you can
do.

In the data surveys that we did through DDJ last year
(see www.ambysoft.com/surveys/) we found that 2/3 of
respondents indicated that the development teams will
choose to go around the data group within their org.
Of those that do, 75% indicated that the reason why
was because the data group was either too slow, too
difficult to work with, or provided too little value
to the development team.  Apparently development teams
are concerned with the level of actual service that
they're getting from data groups.

> yet encounter resistance to their own
> refactoring
> recommendations due to the "unacceptable" impact
> these changes would
> have on software that was developed based on the
> current model.

Yes, many non-agile teams struggle to rework their
code, and frankly that's typical in my experience too.
I don't know of any agile teams with this problem.


> "Oh,
> no, you can't change that table/column
> name/datatype/length, we'd
> have to change <enter a number> of <enter a type of
> program code>,
> and then test it all again".

Agilists test their code constantly.  Retesting is an
absolutely trivial thing for us to do.

Non-agile teams, on the other hand,....

<snip>
> BL: I agreed about problems created by doing
> "detailed" modeling
> immediately (Big Data Up Front). One thing that data
> supports quite
> well is abstraction, and models should start as
> abstract
> (or "conceptual") early on.  In my experience, the
> "people" who
> become most committed to a detailed design are
> usually the developers
> who write code that's immutably dependent on a
> "frozen" design.

Once again, definitely a problem on traditional teams.
Not a problem within the agile community.  The
original article was written from the point of view of
agile development, not traditional development.  I
guess that wasn't clear in the article.


>
> SA: We see extra columns, tables, and views which
> actually detract
> from the quality of the design, and existing columns
> and tables being
> used for purposes other than originally intended.
>
> BL: Who makes the decisions to use columns and
> tables differently
> from how they were designed?

If the data management folks were actually managing
the databases effectively then I guess they're
responsible for the state of the databases.  If
someone else is doing this then the data management
folks have lost control and are clearly ineffectual.
Either way the DM folks don't seem to be up to speed.


> Who determines how they
> are "used"?
> Software and business people, of course—they're the
> users of the database.

Someone needs to have a viable and coherent strategy
for evolving the database over time.  This is why you
need to be good at refactoring, testing, redeployment,
...  If the databases are messed up then clearly such
a strategy isn't in place.

<snip>
> BL: Absolutely, wholeheartedly agreed. This is
> indeed an area that
> deserves significant focus. Entities and
> relationships (e.g., primary
> and foreign keys) need to be identified and agreed
> to early on -- in
> fact, before a line of code is written/generated.
> This foundation of
> functional dependency and order of precedence is
> what the team needs
> to get right early on, because that's what's really
> difficult to
> change later.

Actually, it's fairly easy to change later.  See
Refactoring Databases
(www.ambysoft.com/books/refactoringDatabases.html) or
Process of DB Refactoring
(http://www.agiledata.org/essays/databaseRefactoring.html).
It's an assumption that it's difficult to evolve
database schemas.


> Details of individual attributes can
> be added, moved
> and/or changed as the application evolves.

Tables can be split, renamed, ... very easily too.
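
One way a table rename can be kept low-risk is to leave a view under
the old name so existing readers keep working during a transition
period. The sketch below assumes Python's sqlite3 module; the cust and
customer names and the helper functions are hypothetical, and the
identifiers are interpolated directly into SQL, so they must come from
trusted code, not user input. SQLite views are read-only, so writers
through the old name would need INSTEAD OF triggers or a code change;
other databases differ.

```python
import sqlite3


def make_db():
    # Hypothetical legacy table with a terse name.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cust (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO cust (name) VALUES ('Ada')")
    conn.commit()
    return conn


def rename_with_view(conn, old, new):
    # Rename the table, then publish a view under the old name so that
    # existing SELECT statements continue to work.
    conn.execute(f"ALTER TABLE {old} RENAME TO {new}")
    conn.execute(f"CREATE VIEW {old} AS SELECT * FROM {new}")
    conn.commit()


def rows_via(conn, name):
    # Read every row through the given table or view name.
    return conn.execute(f"SELECT id, name FROM {name} ORDER BY id").fetchall()
```

A regression test then compares rows_via(conn, "cust") before and after
the refactoring, and against rows_via(conn, "customer"), to confirm that
no reader notices the change.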


<snip>
>
> BL: Once again, it's completely unfair to attribute
> the overall state
> of data quality to data professionals,

Then what is it that Data Management people should be
held accountable for?  Seems to me from the title that
if they want to claim to be data managers that they
should be, well... , managing the data.

I think that the fundamental difference that we see
between agilists and traditionalists is that the
agilists have stepped up and accepted responsibility
for quality.  We've adopted quality techniques such as
refactoring, TDD, pairing, ... because we've
discovered that they work incredibly well for us in
practice.  Now we're inviting the data community to
step up and do the same thing.  Perhaps the state of
data quality in production databases is in the state
that it's in because our expectations of the data
community have been so low for too long.  It's time to
raise the bar.

> or to a
> "data-driven
> approach". A data-driven approach IS preferable
> because data indeed
> does pervade any significant business system.

Security issues also pervade any significant business
system.  Shouldn't we take a security-driven approach
by that logic?

Usability issues also pervade any significant business
system.  Should we take a usability-driven approach by
that logic?

And so on?

> Software exists to
> maintain and expose data. Everything else—security,
> usage,
> functionality—is dependent on the data. No data?
> Then no need for
> security, nothing to be used (screens with blank
> fields?), no need
> for any functions that use the data.

No security?  Pretty soon the data isn't trustable.

No UI?  Can't get to the data.

No usability?  Doesn't really matter if the data is
there because it's not consumable.

No network?  Good luck connecting to the data sources.

Data is only one of many issues.


> BL: Surely you've heard of data profiling. Lots has
> been written
> about it; several companies specializing in it have
> been started,
> merged, sold, etc. -- quite a market. But again, the
> data content is
> not originated by data professionals...although they
> should be
> actively involved in, if not responsible for,
> generating and managing
> test data.
>

Yes, but that's not really testing.  That's more along
the lines of reviewing/inspecting the database
content.  As I indicated in a previous post, my next
newsletter will be on database testing because it's
clearly a foreign concept to many data professionals.

<snip>
> BL: Unfortunately I would agree. It's also
> unfortunate that most of
> the evangelizing about data governance has had to
> originate within
> the data management community rather than the
> business community,
> where it really belongs. Case in point: DBAs don't
> govern the general
> ledger.

IT Governance in general needs to come from the
business community.  Sadly, that's going to be a
struggle.

>
> SA: Lack-lustre performance of the traditional data
> community during
> the past three decades...As the agile community has
> clearly shown
> over the past few years these assumptions don't seem
> to hold water in
> practice.
>
> BL: Broad generalizations such as these are futile to
> try to disprove -- or
> to prove, for that matter.

Actually, we've done a few surveys via DDJ and it's
reasonably clear that the data management community is
struggling in practice.  If you believe TDWI's
assertion that data quality problems are a $600 billion
a year issue for US organizations, that might be
another sign that the data community has some room for
improvement.


> I'd strongly suggest that
> "Data-driven"
> and "Agile" approaches to application engineering
> are by no means
> mutually exclusive -- in fact, taking data seriously
> can significantly
> increase development agility. Watch for my article
> in the July issue
> of The Data Administration Newsletter (www.tdan.com)
> for more details.

At www.agiledata.org and this list we've been pretty
clear about that for years.

- Scott




#1738 From: "sunil_gur" <sunil_gur@...>
Date: Fri Jun 1, 2007 11:50 am
Subject: Large Database
sunil_gur
 
Hi,

I am new to the group. I need a little help. We have been given a
project whereby a large number of records (millions) are to be stored
in an MS-SQL database. In this regard, I want to know how we should
design the database so that the performance of the system when
retrieving and creating records does not suffer.

What is an efficient way of designing a database? Please help.
