Re: No to SQL? Anti-database movement gains steam

  • From: Nuno Souto <dbvision@xxxxxxxxxxxx>
  • To: Matthew Zito <mzito@xxxxxxxxxxx>
  • Date: Fri, 10 Jul 2009 19:05:47 +1000

Matthew Zito wrote, on my timestamp of 7/07/2009 1:51 AM:



> My point was simply that calling them incompetent is a dangerous path.
> It's the old, "Not Invented Here" syndrome - i.e., the way we do things
> has worked for us, so someone who does something different must clearly
> be incompetent.

Or much more simply: so completely outside of general purpose IT as to be totally and completely irrelevant other than as an odd curiosity. Just like a Ferrari is.


> traditional enterprise IT.  I agree largely with that statement, and I
> assume that you're no longer calling these developers with different
> needs and requirements "incompetent" and "inexperienced".

A lot of them, I am. For a number of very good reasons. Most have NEVER EVER even attempted to write correct SQL or design a simple database. All they do is cobble together some pasted code from other applications, spread it over as many systems as they can to make it perform minimally acceptably, and then claim it's the only way to achieve top performance. Total bollocks.



> However, I believe that it's important to consider ways in which new
> technologies can be leveraged to add efficiency, performance, etc.  For
> example, if you look at some of my banking customers, while they have a
> ton of traditional J2EE+Oracle+EMC storage infrastructure - a lot of
> them also use proprietary C++ + Memcache + custom built non-SQL data
> stores for things like algorithmic trading.  For them, their needs are
> specific enough, and the upside large enough, that it's worth looking at
> other options. So clearly even traditional Enterprise IT has areas where
> standard relational data stores aren't an appropriate decision.

Sorry, I don't follow this one. They have needs that are specific enough that it's worth looking outside the square, and that is proof that enterprise IT - which was never specific - needs to do the same? My apologies, but that's a non sequitur.
Still, not a major point, so don't fret over it.




> - large degrees of data independence
> - very high concurrent query levels
> - high levels of throughput
> - very strong sensitivity to latency
> - a need to scale linearly
>
> Simply don't work well with traditional relational databases, and hence
> you have these non-traditional data stores as alternative options for
> these types of workloads.

Good. And that is precisely where they should stay: in the realm of the very specific and vertical markets they come from.

Or are we supposed to believe that Joe Average in the shopping centre corner store - or indeed just about any commercial venture outside of the web-specific market (believe it or not, they are the majority of IT users) - also needs 15PB data stores with sub-microsecond query times over 100K clients?

I think not...



> It was started as a way to do full-text search for user inboxes, and is
> being extended to support more and more operational data at Facebook.
> Some notes from their configuration:
> - Approximately 600+ cores as of late '08
> - Approximately 120TB of disk space
> - 25TB of indexes
> - 4B worker threads
> - Average ~12ms response time for a search
> - Software level features like automatic partitioning, distributed local
> and remote replication, insert/append without read, automated data file
> collapse and aggregation,
> Now certainly, you can build a >100TB Oracle instance, but the cost and
> the complexity would be challenging.  In addition, presumably they only
> see this data store growing, and how do you deal with a 200, 300, 400TB
> Oracle instance?   Google, for example, in 2006 had approximately 1.2PB
> of data in their structured data store.  Heaven knows what it is now.

Exactly. Like I said: specific, vertical markets that have no influence whatsoever on how general purpose IT is carried out.

And I doubt half those numbers are valid. It is one thing to add up total disk capacity; it is quite another to call it the active data store for queries. The two couldn't be more different.

But let me emphasize one particular area of your points above, which is very close to me:

"> - Average ~12ms response time for a search
> - Software level features like automatic partitioning, distributed local
> and remote replication, insert/append without read, automated data file
> collapse and aggregation,
"

I am a strong believer that as a search-enabling technology for very large data stores, indexes are way under-powered. Extensive, automatic partitioning is the way of the future. Oracle 11g has made incredible strides in that direction and it is my belief it will continue to do so. No need to change the relational model: just improve it.
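To make that concrete, here is the sort of thing I mean. A minimal sketch only, with made-up table and column names, not anything from our systems: 11g interval partitioning, where Oracle creates new range partitions by itself as the data arrives.

   -- Hypothetical fact table: Oracle 11g creates a new monthly
   -- partition automatically the moment an insert lands beyond the
   -- current high value. No manual ADD PARTITION, ever.
   CREATE TABLE sales_fact (
     sale_id   NUMBER,
     sale_date DATE NOT NULL,
     cust_id   NUMBER,
     amount    NUMBER(12,2)
   )
   PARTITION BY RANGE (sale_date)
     INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
     (PARTITION p_initial VALUES LESS THAN (DATE '2009-01-01'));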

I wrote in my blog a few years ago about how I thought we could address this very large data store problem: the "No Moore" series of posts. I won't repeat it here; it's still there for anyone to check.

But in a nutshell: Moore's law is history. We cannot continue to use "brute force" to search and process very large data stores. You know: the "personal Petabyte" and other such notions.

I have proven to my own satisfaction, with our own DW, that extensive partitioning is indeed one way of achieving fast searches of very large data sets without the need for huge indexing and its associated maintenance nightmare.
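For illustration only - using the made-up table above, not our actual DW schema - the optimizer prunes a query like this straight down to the partitions covering the predicate, so it never touches the rest of the table and there is no monster B-tree to maintain on every load:

   -- With sales_fact partitioned by month, only the two relevant
   -- partitions get scanned; the execution plan shows a
   -- PARTITION RANGE ITERATOR instead of a giant index probe.
   SELECT cust_id, SUM(amount)
   FROM   sales_fact
   WHERE  sale_date >= DATE '2009-05-01'
   AND    sale_date <  DATE '2009-07-01'
   GROUP  BY cust_id;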

Only in that sense do I find it interesting to follow some of the new 
developments.

The rest of Facebook and its specifics, quite frankly, is irrelevant to general purpose IT.


> To use the gmail/facebook/my ad startup example, collapsing data means
> you lose data.  In the case of the advertising startup, they
> realistically can only collapse user persistence data they haven't seen
> for a very long time.  Real-time analytics is critical for making ad
> display decisions, ad placement optimization, spend analytics, etc.
> Aggregate data is death for some workloads.

The same problem applies to search engine marketing, for example. And yet, I've seen them address that problem with extensive pre-"crunching" of the data, then collapsing the results into an RDBMS. In fact, for two years I worked at the company that handled Google traffic, doing just that. Guess what we used?

Hint:
Oracle, Perl, C, Linux on pizza boxes.

And it coped easily with Google traffic volumes back then. Still does.

So, it can be done. It however requires folks who know what data management and storage is all about, not just "trendy" buzzwording.
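To give a flavour of that pre-"crunching" - again a sketch with invented table names, not what we actually ran - the raw feed gets collapsed offline and only the aggregates land in the RDBMS for querying:

   -- Hypothetical collapse step: raw click events are aggregated
   -- into a daily summary; the raw feed never needs to live in
   -- the database at full volume.
   INSERT INTO ad_click_summary (click_day, ad_id, clicks, spend)
   SELECT TRUNC(click_time), ad_id, COUNT(*), SUM(cost)
   FROM   raw_clicks_staging
   GROUP  BY TRUNC(click_time), ad_id;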


> What you may not realize is that those stats include the cost of the
> DBAs, as they get accounted along with the development organization.

The "cost of DBAs" is grossly exaggerated and has been for years now.
We run an entire 75B$ organization with 3 DBAs, with SQL Server and Oracle data stores. NO ONE can convince me that our cost is a significant factor in our overall IT costing. To do so is to basically lie. In fact, knowing exactly how much we spend in IT and what our DBAs cost, I can confidently guarantee to you that the so-called "excessive DBA cost" is complete boulderdash.


> It's all about core competency.  If you're a property management
> company, it makes zero sense to build your own email system and search
> index.  It has nothing to do with your business.

Bingo.  And that goes for the vast majority of IT users out there.


> With all due respect, you can hardly hold up one example where a project
> was (what sounds to be) poorly managed from start to finish and tar an
> entire option.

That project, and others like it, I have seen repeated ad nauseam in IT over the last few years. Would you like me to provide heaps of examples? I can...


> The mistake they made was that manufacturing management *is* a core
> competency for them, given their business.  Trying to map a traditional
> solution to their model created something that was half off-the-shelf,
> half written from scratch, and all a mess.

That, I am sorry, reinforces my point: they should have looked at the solution they had, which was satisfactory, and found ways of running it faster/more efficiently. Instead they went for "modern" solutions with absolutely no fit whatsoever to their business.


> I don't know - a lot of great stuff came out of the Web 1.0 "tech
> wreck":
> - Linux
> - Commodity compute
> - Distributed clusters
> - Grid Computing
> - MySQL/PostgreSQL
> - Open Source
> - Web-based applications
> - Content Delivery Networks
> - Datacenter Automation/Configuration Management
>
> These are all things that either became powerhouses in their own right,
> or fueled the next gen of technology.

Sure. But don't forget that not a single one of those is applicable only to a vertical market. Which is what those non-SQL solutions are.


> To be honest, I hear the same hype from traditional Enterprise IT, and
> even from Oracle itself.  Let's sample the main link on Oracle.com
> today:


Absolutely!   Oracle is not above hype by any means!

But I have yet to see proof that ANY modern J2EE or otherwise custom-designed system using non-general-purpose IT technologies can be maintained easily. It is so complex to do that they even invented a new term to mask the need for re-writing: "refactoring". Which is itself the biggest hype I've seen.


> Again, not to keep hammering this home, it's about your core competency.
> If your organization's core competency is IT in one way or another, then
> it might make sense to build something rather than buy it.

Bingo. Now: exactly how many companies have IT as a core competency, compared to the whole market of IT users? Do I need to continue?

> These days, almost everyone uses Linux somewhere in their
> infrastructure.  Many people still use Solaris.  They each serve a
> purpose.  But this was something that was "new" and "hyped" and turned
> out to actually be pretty darn good.

Because it is GENERAL PURPOSE.  NOT a vertical market or very specific.
THAT is WHY they were successful.

It's not "fraud", it's just "hype", something that is rampant in
technology, and the world in general.  It would be nice if reporters
were a little more skeptical.

I call it fraud.  But it's OK to disagree there.
;)


--
Cheers
Nuno Souto
dbvision@xxxxxxxxxxxx
--
//www.freelists.org/webpage/oracle-l

