Re: No to SQL? Anti-database movement gains steam

  • From: Nuno Souto <dbvision@xxxxxxxxxxxx>
  • To: Matthew Zito <mzito@xxxxxxxxxxx>
  • Date: Fri, 10 Jul 2009 19:05:47 +1000

Matthew Zito wrote, on my timestamp of 7/07/2009 1:51 AM:



> My point was simply that calling them incompetent is a dangerous path.
> It's the old, "Not Invented Here" syndrome - i.e., the way we do things
> has worked for us, so someone who does something different must clearly
> be incompetent.

Or much more simply: so completely outside of general purpose IT as to be totally and completely irrelevant other than as an odd curiosity. Just like a Ferrari is.


> traditional enterprise IT.  I agree largely with that statement, and I
> assume that you're no longer calling these developers with different
> needs and requirements "incompetent" and "inexperienced".

A lot of them, I am. For a number of very good reasons. Most have NEVER EVER even attempted to write correct SQL or design a simple database. All they do is cobble together some pasted code from other applications, spread it over as many systems as they can to make it perform minimally acceptably, and then claim it's the only way to achieve top performance. Total bollocks.



> However, I believe that it's important to consider ways in which new
> technologies can be leveraged to add efficiency, performance, etc.  For
> example, if you look at some of my banking customers, while they have a
> ton of traditional J2EE+Oracle+EMC storage infrastructure - a lot of
> them also use proprietary C++ + Memcache + custom built non-SQL data
> stores for things like algorithmic trading.  For them, their needs are
> specific enough, and the upside large enough, that it's worth looking at
> other options. So clearly even traditional Enterprise IT has areas where
> standard relational data stores aren't an appropriate decision.

Sorry, I don't follow this one. They have needs that are specific enough that it's worth looking outside the square, and that is proof that enterprise IT - which was never specific - needs to do the same? My apologies, but that's a non sequitur.
Still, not a major point, so don't fret over it.




> - large degrees of data independence
> - very high concurrent query levels
> - high levels of throughput
> - very strong sensitivity to latency
> - a need to scale linearly
>
> Simply don't work well with traditional relational databases, and hence
> you have these non-traditional data stores as alternative options for
> these types of workloads.

Good. And that is precisely where they should stay: in the realm of the very specific and vertical markets they come from.

Or are we supposed to believe that Joe Average in the shopping centre corner store - or indeed just about any commercial venture outside of the web-specific market (believe it or not, they are the majority of IT users) - also needs 15PB data stores with sub-microsecond query times over 100K clients?

I think not...



> It was started as a way to do full-text search for user inboxes, and is
> being extended to support more and more operational data at Facebook.
> Some notes from their configuration:
> - Approximately 600+ cores as of late '08
> - Approximately 120TB of disk space
> - 25TB of indexes
> - 4B worker threads
> - Average ~12ms response time for a search
> - Software level features like automatic partitioning, distributed local
> and remote replication, insert/append without read, automated data file
> collapse and aggregation,
> Now certainly, you can build a >100TB Oracle instance, but the cost and
> the complexity would be challenging.  In addition, presumably they only
> see this data store growing, and how do you deal with a 200, 300, 400TB
> Oracle instance?   Google, for example, in 2006 had approximately 1.2PB
> of data in their structured data store.  Heaven knows what it is now.

Exactly. Like I said: specific, vertical markets that have no influence whatsoever on how general purpose IT is carried out.

And I doubt half those numbers are valid. It is one thing to add up total disk capacity; it is quite another to call it the active data store for queries. The two couldn't be more different.

But let me emphasize one particular area of your points above, which is very close to me:

"> - Average ~12ms response time for a search
> - Software level features like automatic partitioning, distributed local
> and remote replication, insert/append without read, automated data file
> collapse and aggregation,
"

I am a strong believer that as a search-enabling technology for very large data stores, indexes are way under-powered. Extensive, automatic partitioning is the way of the future. Oracle 11g has made incredible strides in that direction and it is my belief it will continue to do so. No need to change the relational model: just improve it.
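To make that concrete, here is the sort of thing I mean. A minimal sketch only, with made-up table and column names, not anything from our systems: 11g interval partitioning, where Oracle creates new range partitions by itself as the data arrives.

   -- Hypothetical fact table: Oracle 11g creates a new monthly
   -- partition automatically the moment an insert lands beyond the
   -- current high value. No manual ADD PARTITION, ever.
   CREATE TABLE sales_fact (
     sale_id   NUMBER,
     sale_date DATE NOT NULL,
     cust_id   NUMBER,
     amount    NUMBER(12,2)
   )
   PARTITION BY RANGE (sale_date)
     INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
     (PARTITION p_initial VALUES LESS THAN (DATE '2009-01-01'));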

I wrote in my blog a few years ago about how I thought we could address this very large data store problem: the "No Moore" series of posts. I won't repeat it here; it's still there for anyone to check.

But in a nutshell: Moore's law is history. We cannot continue to use "brute force" to search and process very large data stores. You know: the "personal Petabyte" and other such notions.

I have proven to my own satisfaction, with our own DW, that extensive partitioning is indeed one way of achieving fast searches of very large data sets without the need for huge indexing and its associated maintenance nightmare.
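For illustration only - using the made-up table above, not our actual DW schema - the optimizer prunes a query like this straight down to the partitions covering the predicate, so it never touches the rest of the table and there is no monster B-tree to maintain on every load:

   -- With sales_fact partitioned by month, only the two relevant
   -- partitions get scanned; the execution plan shows a
   -- PARTITION RANGE ITERATOR instead of a giant index probe.
   SELECT cust_id, SUM(amount)
   FROM   sales_fact
   WHERE  sale_date >= DATE '2009-05-01'
   AND    sale_date <  DATE '2009-07-01'
   GROUP  BY cust_id;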

Only in that sense do I find it interesting to follow some of the new 
developments.

The rest of Facebook and its specifics, quite frankly, is irrelevant to general purpose IT.


> To use the gmail/facebook/my ad startup example, collapsing data means
> you lose data.  In the case of the advertising startup, they
> realistically can only collapse user persistence data they haven't seen
> for a very long time.  Real-time analytics is critical for making ad
> display decisions, ad placement optimization, spend analytics, etc.
> Aggregate data is death for some workloads.

The same problem applies to search engine marketing, for example. And yet, I've seen them address that problem with extensive pre-"crunching" of the data, then collapsing the results into an RDBMS. In fact, for two years I worked at the company that handled Google traffic, doing just that. Guess what we used?

Hint:
Oracle, Perl, C, Linux on pizza boxes.

And it coped easily with Google traffic volumes back then. Still does.

So, it can be done. It however requires folks who know what data management and storage is all about, not just "trendy" buzzwording.
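To give a flavour of that pre-"crunching" - again a sketch with invented table names, not what we actually ran - the raw feed gets collapsed offline and only the aggregates land in the RDBMS for querying:

   -- Hypothetical collapse step: raw click events are aggregated
   -- into a daily summary; the raw feed never needs to live in
   -- the database at full volume.
   INSERT INTO ad_click_summary (click_day, ad_id, clicks, spend)
   SELECT TRUNC(click_time), ad_id, COUNT(*), SUM(cost)
   FROM   raw_clicks_staging
   GROUP  BY TRUNC(click_time), ad_id;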


> What you may not realize is that those stats include the cost of the
> DBAs, as they get accounted along with the development organization.

The "cost of DBAs" is grossly exaggerated and has been for years now.
We run an entire 75B$ organization with 3 DBAs, with SQL Server and Oracle data stores. NO ONE can convince me that our cost is a significant factor in our overall IT costing. To do so is to basically lie. In fact, knowing exactly how much we spend in IT and what our DBAs cost, I can confidently guarantee to you that the so-called "excessive DBA cost" is complete boulderdash.


> It's all about core competency.  If you're a property management
> company, it makes zero sense to build your own email system and search
> index.  It has nothing to do with your business.

Bingo.  And that goes for the vast majority of IT users out there.


> With all due respect, you can hardly hold up one example where a project
> was (what sounds to be) poorly managed from start to finish and tar an
> entire option.

That project, and others like it, I have seen repeated ad nauseam in IT over the last few years. Would you like me to provide heaps of examples? I can...


> The mistake they made was that manufacturing management *is* a core
> competency for them, given their business.  Trying to map a traditional
> solution to their model created something that was half off-the-shelf,
> half written from scratch, and all a mess.

That, I am sorry, reinforces my point: they should have looked at the solution they had, which was satisfactory, and found ways of running it faster/more efficiently. Instead they went for "modern" solutions with absolutely no fit whatsoever to their business.


> I don't know - a lot of great stuff came out of the Web 1.0 "tech
> wreck":
> - Linux
> - Commodity compute
> - Distributed clusters
> - Grid Computing
> - MySQL/PostgreSQL
> - Open Source
> - Web-based applications
> - Content Delivery Networks
> - Datacenter Automation/Configuration Management
>
> These are all things that either became powerhouses in their own right,
> or fueled the next gen of technology.

Sure. But don't forget that not a single one of those is applicable only to a vertical market. Which is what those non-SQL solutions are.


> To be honest, I hear the same hype from traditional Enterprise IT, and
> even from Oracle itself.  Let's sample the main link on Oracle.com
> today:


Absolutely!   Oracle is not above hype by any means!

But I have yet to see proof that ANY modern J2EE or otherwise custom-designed system using non-general-purpose IT technologies can be maintained easily. It is so complex to do that they even invented a new term to mask the need for re-writing: "refactoring". Which is itself the biggest hype I've seen.


> Again, not to keep hammering this home, it's about your core competency.
> If your organization's core competency is IT in one way or another, then
> it might make sense to build something rather than buy it.

Bingo. Now: exactly how many companies have IT as a core competency, compared to the whole market of IT users? Do I need to continue?

> These days, almost everyone uses Linux somewhere in their
> infrastructure.  Many people still use Solaris.  They each serve a
> purpose.  But this was something that was "new" and "hyped" and turned
> out to actually be pretty darn good.

Because it is GENERAL PURPOSE.  NOT a vertical market or very specific.
THAT is WHY they were successful.

It's not "fraud", it's just "hype", something that is rampant in
technology, and the world in general.  It would be nice if reporters
were a little more skeptical.

I call it fraud.  But it's OK to disagree there.
;)


--
Cheers
Nuno Souto
dbvision@xxxxxxxxxxxx
--
//www.freelists.org/webpage/oracle-l

