Re: How to store 50 Terabytes per day?

  • From: "jaromir nemec" <jaromir@xxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 9 Sep 2004 21:52:51 +0200

Hi Tom,

> ETL would have to consist of inserts only once, no updates or deletes EVER

> Current-image data would have to be implemented as either views or
> materialized views, if desired...
I completely agree with this strategy, but in my opinion there are two
problems to be solved:



a) dealing with the rolling window - e.g. the data are held for 7 years, but
the "current image" should contain the complete data.  This is not a trivial
task: for example, with materialized views, dropping an old partition should
have no influence on the materialized view (maybe not so difficult), and a
complete refresh of the materialized view must remain possible (maybe more
difficult).
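
To make point a) concrete, here is a minimal Python sketch (not Oracle code;
the partition names and row values are invented for illustration) of why a
complete refresh is the harder part: once the oldest partition is dropped,
any key whose latest - and therefore current - row lived only in that
partition silently vanishes from a naively recomputed current image.

```python
# Hypothetical model: each "partition" holds one month of insert-only rows
# of the form (key, timestamp, value); the current image is the latest row
# per key over all retained partitions.
partitions = {
    "2004-07": [("cust1", "2004-07-01", "v1")],
    "2004-08": [("cust2", "2004-08-15", "v1")],
    "2004-09": [("cust2", "2004-09-03", "v2")],
}

def current_image(parts):
    """Complete refresh: recompute the latest row per key from scratch."""
    latest = {}
    for rows in parts.values():
        for key, ts, val in rows:
            if key not in latest or ts > latest[key][0]:
                latest[key] = (ts, val)
    return {k: v for k, (_, v) in latest.items()}

print(current_image(partitions))  # cust1 and cust2 are both present

# Rolling window: drop the oldest partition...
del partitions["2004-07"]

# ...and a complete refresh now loses cust1, whose current row lived in
# the dropped partition - the "current image" is no longer complete.
print(current_image(partitions))
```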



b) validation and cleaning make "insert only once" more complicated - I mean
enforcing business rules beyond primary and foreign keys.  Those checks could
be moved into the current-image views, but that is surely not the desired
solution (the validation would be repeated for each query).
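
The cost difference behind point b) can be sketched in a few lines of Python
(the business rule and row shapes are invented for illustration): validating
once at load time does the work exactly once, while pushing the same rule
into the current-image view re-evaluates it on every query.

```python
import re

rows = [("A1", "ok"), ("bad id", "x"), ("B2", "ok")]

# A hypothetical business rule beyond primary/foreign keys:
# keys must match LETTER+DIGIT.
def is_valid(row):
    return re.fullmatch(r"[A-Z]\d", row[0]) is not None

# Option 1: validate once at load time - only clean rows are inserted,
# so every later query reads them for free.
loaded = [r for r in rows if is_valid(r)]

# Option 2: load everything and filter inside the "current image" view -
# the predicate is re-evaluated for each and every query against the view.
def current_image_view(all_rows):
    return [r for r in all_rows if is_valid(r)]

assert loaded == current_image_view(rows)  # same result, very different cost
```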



regards

Jaromir D.B. Nemec
http://www.db-nemec.com

----- Original Message ----- 
From: <tim@xxxxxxxxxxxxx>
To: <oracle-l@xxxxxxxxxxxxx>
Sent: Wednesday, September 08, 2004 6:40 PM
Subject: Re: How to store 50 Terabytes per day?


A couple things are certain:

  - ETL would have to consist of inserts only once, no updates or deletes
    EVER.  This goes back to the design of the data model and the proper
    understanding of time-variant data.  It's not hard, but so many people
    with experience in the OLTP world can only think in terms of
    current-image data.
  - Current-image data would have to be implemented as either views or
    materialized views, if desired...
  - Along with the principle of insert only, partitioning by RANGE on a time
    column will be crucial, with subpartitioning on a frequently-queried
    other column being an important option to consider.  However, due to the
    volume of data, one might well consider hourly partitions, which would
    consume almost the limit of 65,535 partitions by itself if data retention
    is 7 years or so, making subpartitioning not feasible...
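
The partition arithmetic behind that last point is easy to check (65,535 =
2^16 - 1 was the per-table partition limit in the Oracle releases of that
era):

```python
# Hourly partitions over a 7-year retention window, ignoring leap days:
hours_per_year = 365 * 24            # 8,760
partitions_needed = 7 * hours_per_year
partition_limit = 2**16 - 1          # 65,535

print(partitions_needed)                     # 61320
print(partition_limit - partitions_needed)   # 4215 partitions of headroom
```

With only ~4,200 partitions of headroom, even two subpartitions per hourly
partition would blow past the limit.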

Boy, would I love to get my hands on that database!  :-)

-Tim
--
To unsubscribe - mailto:oracle-l-request@xxxxxxxxxxxxx&subject=unsubscribe
To search the archives - //www.freelists.org/archives/oracle-l/


