Is SAP HANA about the “in-memory database”?

Disclaimer: this post is not meant to be easily digestible, so please stay with me through the text and let’s have a discussion after that.

What is SAP HANA?

When in May 2010 I first heard Hasso Plattner, Chairman of the Supervisory Board of SAP, talking about the in-memory revolution they were planning with the SAP HANA product, I scratched my head. I had already been working with SAP NetWeaver BW Accelerator (BWA) for 4 years, and it was obvious that HANA was a continuation of the same technology. But what made me curious was this: out of the three major principles underpinning the technology – massively parallel processing (MPP), columnar data storage, and in-memory data storage – why had SAP chosen the last one as the flagship feature of the new product? It was not clear to me at the time. I decided it must be because there were products already strongly identified with columnar data storage (like Sybase IQ or Vertica) and with analytic MPP processing (like Teradata or HP Neoview), while in-memory databases, like TimesTen, Altibase or solidDB, were not that well known to a broader audience.

For the last couple of years we have seen SAP’s effort to reclaim the “innovative” adjective next to the company name. So using “in-memory” – an existing, but not that widely known, technology – seemed to be a good match for “innovation”. As we saw during the last year, HANA was indeed used successfully by SAP marketing to generate lots of “game-changing”, “revolutionary”, “deliciously disruptive” buzz. This buzz was picked up by many. So it was quite interesting to read the contradictory statement made by the analyst Dennis Gaughan at Gartner Symposium (source):

… Gaughan said none of the four vendors [IBM, Microsoft, Oracle, SAP] are “re-imagining” IT, as per the theme of the Gartner conference.

“You won’t find innovation in their product portfolio,” he said. “You might find it if you try and talk to the research parts of these organisations.”…

Indeed for those of us with a broader and deeper technical view, the question remained open: “What makes SAP HANA the innovative product among many existing in-memory database management systems?” I do not think this question has been fully answered by SAP so far. Let me share my understanding and thoughts here.

Firstly, in my opinion it is not the technology, but the ultimate promise, that is visionary: running transactional and analytic systems on a single platform with a single store of data. Data warehousing, as we know it, was born from the need to remove analytic workload from transactional systems. In addition, transactional data structures were transformed into analysis-optimized ones (like star schemas or OLAP cubes), along with data enrichment. Then ETL systems came into place to remove the data-transformation workload from data warehousing systems. Now SAP promises to bring everything back into one system (see graph below) – making separate ETL and EDW systems (and much of the related skills and expertise) obsolete. This would be a huge change, yet from my discussions with SAP customers it was not clear whether they had gotten it. Many of them want the SAP HANA database simply to run their ERP faster. Again – that is not what is revolutionary about the SAP vision to be delivered by the HANA platform.

OLTP and OLAP systems today require not only separate computing resources, but also different data structures optimized for specific query profiles. SAP’s promise is that once transactional (e.g. ERP) and analytic (e.g. BW) systems are running on a single HANA platform, they will be using a single copy of the data. All additional data modifications required, for example, by the analytic part of the system – data cleansing, transformation, enrichment – will be done on the fly during each query execution [VitalBI: I bet there is going to be some kind of result caching, even if some guys in SAP marketing disagree]. In-memory data storage together with in-database calculations, append-only tables, and multi-core processing are all features that are going to help SAP deliver on the “single business platform” promise.

What is different compared to other in-memory database management systems is SAP’s ambition to bring in-memory technology to the next level: the enterprise. That means not only specific and limited use cases, but mixed-workload, large-scale, high-volume scenarios.

Secondly, there is not enough information about the innovation in the technology being developed by SAP. You will not find many white papers from SAP describing what is under the hood of the new database. Just storing data in RAM and treating it as faster storage is nothing new. Sybase ASE – the database acquired by SAP last year – has an “in-memory database” option. SAP HANA certainly has to offer something better.

My discussion with Franz Faerber, SAP HANA chief architect, at the SAP Influencer Summit last summer helped me get a somewhat deeper view into the technology, beyond the obvious. In a nutshell, the two major drivers behind SAP HANA technology were:

  1. “RAM is slow” (And you thought “in-memory” was about storing data in RAM??)
  2. “CPU clock frequency reaches its growth barrier”

In SAP HANA everything is about performance, which is a prerequisite for real-time data processing. Even if RAM is faster than ‘spindle’ hard drives, CPUs still waste cycles while waiting for data from RAM. Therefore the optimization goal is to reduce idle cycles by making sure that there is as much useful data in the CPU caches as possible. The HANA database has to be coded using CPU-cache-aware algorithms processing CPU-cache-optimized data structures. Back in 2006 Jim Gray from Microsoft discussed this principle in his famous presentation “RAM Locality is King”.
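To make the cache-locality point concrete, here is a toy sketch (purely illustrative Python – not HANA code, and the table layout is my own invented example) of scanning a single attribute in a row-interleaved layout versus a contiguous columnar layout. In a native engine the columnar variant wins because every cache line fetched from RAM is full of useful values, instead of dragging unrelated attributes along:

```python
import array

# Hypothetical 3-attribute table with 100,000 records: (id, price, qty).
N = 100_000

# Row-oriented layout: attributes interleaved per record:
# [id0, price0, qty0, id1, price1, qty1, ...]
rows = array.array("d", range(3 * N))

# Column-oriented layout: all prices stored contiguously.
prices = array.array("d", (rows[3 * i + 1] for i in range(N)))

def sum_prices_row_layout():
    # Strided access: each price fetched pulls neighboring id/qty
    # bytes into the cache, wasting most of every cache line.
    return sum(rows[3 * i + 1] for i in range(N))

def sum_prices_column_layout():
    # Sequential access over a dense array: every fetched cache
    # line contains only values the scan actually needs.
    return sum(prices)

assert sum_prices_row_layout() == sum_prices_column_layout()
```

Python itself hides the memory hierarchy, so the timing difference here is muted; in C or C++ over large arrays the contiguous scan is dramatically faster, which is exactly the effect cache-aware engine code is chasing.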

Most of the data in the SAP HANA database is stored in a columnar and compressed format. This data still has to be converted to records during processing, so it is important that this step happens as late as possible – something called late materialization. Ideally, operations should be able to run directly on the compressed data, without any need to decompress it first.
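Here is a minimal sketch of both ideas, using dictionary encoding – one common columnar compression scheme; I am not claiming it is the only one HANA uses. The predicate runs directly on the small integer codes, and full records are assembled only for the rows that qualify (the column names and data are invented for illustration):

```python
# Toy sketch, not HANA internals: dictionary-encode a string column,
# filter on the compressed codes, and materialize rows late.

def dictionary_encode(values):
    dictionary = sorted(set(values))            # code -> value
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]        # compressed column
    return dictionary, codes

countries = ["DE", "US", "DE", "FR", "US", "DE"]
revenue   = [100,  250,  80,   120,  300,  90]

dictionary, codes = dictionary_encode(countries)

# The predicate "country = 'DE'" becomes one integer comparison per
# row, executed on the compressed column without decompressing it.
target = dictionary.index("DE")
matching_positions = [i for i, c in enumerate(codes) if c == target]

# Late materialization: build full records only for the final hits.
result = [(countries[i], revenue[i]) for i in matching_positions]
print(result)  # [('DE', 100), ('DE', 80), ('DE', 90)]
```

Besides saving memory, the integer codes are far more cache-friendly than variable-length strings, which ties this directly back to the "RAM is slow" driver above.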

As mentioned in the previous paragraphs, in HANA everything is about performance, so as clock-speed growth slows down, the search for performance turns to multi-core CPU processing. It is the worst-kept secret on the market that about a dozen developers from Intel spent months in SAP’s offices coding the core of SAP’s in-memory technology to use all possible features of the Intel Xeon architecture: Hyper-Threading, Intel Turbo Boost, Threading Building Blocks. That is why the SAP HANA database can achieve its top performance only when running on bare metal on Intel Xeon CPUs, and not on other platforms or in virtualized environments.

Last, but not least: the SAP HANA database is in fact a hybrid database: RAM is used as the primary data store, but SSDs or spindle drives are still used for data persistence, for example in case of power loss. I have seen customers surprised when facing SAP HANA hardware with external storage besides lots of RAM.
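The principle behind that hybrid design can be sketched in a few lines: serve all reads and writes from memory, but append every change to a log on disk so the in-memory state can be rebuilt after a crash. This is a generic write-ahead-logging toy (my own simplification with invented names – real databases, HANA included, use binary log formats, savepoints and group commit, not a text line per write):

```python
import os
import tempfile

class DurableKV:
    """In-memory key-value store backed by an append-only log on disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):            # recovery: replay the log
            with open(log_path) as log:
                for line in log:
                    key, value = line.rstrip("\n").split("\t", 1)
                    self.data[key] = value      # later entries win

    def put(self, key, value):
        with open(self.log_path, "a") as log:   # persist first...
            log.write(f"{key}\t{value}\n")
            log.flush()
            os.fsync(log.fileno())              # survive power loss
        self.data[key] = value                  # ...then update RAM

    def get(self, key):
        return self.data[key]                   # reads never touch disk

# Demo: writes survive a simulated restart.
path = os.path.join(tempfile.mkdtemp(), "demo.log")
store = DurableKV(path)
store.put("customer:1", "ACME")
restored = DurableKV(path)                      # "power-cycle" the store
assert restored.get("customer:1") == "ACME"
```

The point customers miss when they see the disks in a HANA appliance is exactly this: the disks are not where queries read from, they are what makes the in-memory state recoverable.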


At SAP’s invitation I am going to attend the SAP Influencer Summit during December 13-14, and I am looking forward to it as a chance to go a layer deeper into what makes SAP’s in-memory technology truly a step forward compared to others, and how they are going to overcome some remaining technology barriers.

Filed under HANA, SAP

9 responses to “Is SAP HANA about the ‘in-memory database’?”

  1. i would go with Intel before trying Gartner, but that’s just IMHO.

  2. Small correction: Intel’s engineering team has been based full-time in WDF since 2005 doing the cpu-level optimizations for SAP’s in-memory technology. Truly deep co-innovation partnership at the instruction-set level.

  3. The marketing and push behind “in-memory technologies” is pretty inexplicable considering (1) BWA never had such a push before, (2) the technology itself (TREX) has been in SAP since early 2000s and (3) there are other competing technologies that are largely more compelling when it comes to big data analytics. I’ve said this before and I’ll say it again, when it comes to technology innovation in software nothing can compete with FOSS. It amazes me still that people look at some enterprise software and think of how revolutionary it is (today).

    “Secondly, there is not enough information about the innovation in the technology being developed by SAP. You will not find many white papers from SAP describing what is under the hood of the new database.”
    Actually there is, you just never hear about it because people in the technology business world rarely have an attraction for such things. The Hasso Plattner Institute is rarely brought up in conversation, but really serves as the hub of innovation for SAP. As both you and I know from Christensen’s book, large corporations have an extremely hard time adopting disruptive innovation.

    Wanna get under the hood? Here ya go:
    http://epic.hpi.uni-potsdam.de/Home/Publications
    http://vimeo.com/user3992599/videos
    (I’ve downloaded many of the PDF main white papers that came out of HPI and I can share them with you if you want)

    “That’s why its top performance SAP HANA database can achieve only when running bare metal on Intel Xeon CPUs, and not on other platforms or in the virtualized environment.”
    Yup! People don’t realize this. Intel is an extremely innovative company too! I’m glad SAP partners with them. Effectively we now eliminate I/O-bound activity and rely on CPU-bound activity.

    I was going to write a post almost exactly like this. Nice one! I actually just recorded a webinar (which I’m waiting to get published out) that highlights the same things.

    p.s. from what I understand, Franz actually wrote most of BWA/TREX….genius 🙂

  4. I fully agree, many of the technologies used in HANA are not innovative, as they existed before. But the approach is the main innovation. So far, advanced analytics was an add-on to existing ERP (or other) systems within the enterprise. Whether you purchased Vertica or Teradata, in most cases you were creating another copy of the data, and an organization for developing, supporting, testing it, etc….

    SAP’s attempt to leverage a set of best-of-breed available hardware and software is really innovative. They aim to have only one version of most of the data, real-time analytics, and to invalidate most of the books about data warehousing (the need for historical data loads or aggregations). The current approach with near-real-time replication is the first step, as it still requires data movement. But it is more transparent, and in the long run the data will not be copied, but analyzed as it is stored in the sources. Conceptually this is very different from what competitors are doing and what has been done recently (different, dedicated, optimized appliances for OLTP and OLAP).

    I can understand that the SAP approach, if successful, can be a true game changer, especially from a CIO point of view… The future will show.

  5. Frank Renkes

    If one is interested in learning a bit more about the co-innovation between SAP and Intel around HANA: http://tinyurl.com/blqnu2v

  6. Friendly feedback, please grammar check your post 🙂 There are some glaring grammatical boo boo’s 🙂

    • Pushkar, thank you for the feedback. Indeed I have a tendency to write long sentences, with many subthreads in them, like in many examples in this post, because it is so difficult to compress lots of information into the short form of blogs, and then – on top of that – to have as well some flow of reasoning, which readers could follow and from which they could understand my way of thinking, as this is what I am trying to convey. ;-D

  7. Pingback: Podcast – Debating the Value of SAP HANA

  8. Vitaliy, I made similar points that the most important innovation for SAP would be to run transactional systems on SAP HANA (as far as HANA is concerned). And I have great doubt that customers would migrate to SAP HANA, particularly for anything that requires systems to be auditable. I am not aware of electronic storage in commercial production which can guarantee persistence?

    From the SAP road-map at the SAP UK User Group conference, SAP have indicated they would build a business workflow/BPM tool on top of SAP HANA, and we would be able to define and quickly build workflow-based applications supported by the HANA database. I think that could be a big thing in the cloud.

    If SAP actually delivers what it has indicated, I would think that from a business point of view the hype has been OK.
