Disclaimer: this post is not meant to be easily digestible, so please stay with me through the text and let’s have a discussion after that.
What is SAP HANA?
When in May 2010 I first heard Hasso Plattner, Chairman of the Board at SAP, talking about the in-memory revolution they were planning with the SAP HANA product, I scratched my head. I had already been working with SAP NetWeaver BW Accelerator (BWA) for 4 years, and it was obvious that HANA was the continuation of the same technology. But what made me curious was why, out of the three major principles underpinning the technology (massively parallel processing (MPP), columnar data store, and in-memory data store), SAP had chosen the last one as the flagship feature for the new product. It was not clear to me at that time. I decided that it must be because there were products already strongly identified with columnar data representation (like Sybase IQ or Vertica) and with analytic MPP processing (like Teradata or HP Neoview), while in-memory databases, like TimesTen, Altibase or solidDB, were not that well known to a broader audience.
For the last couple of years we’ve seen SAP’s effort to reclaim the “innovative” adjective next to the company name. So using “in-memory”, an existing but not that widely known technology, seemed to be a good match for “innovation”. As we saw during the last year, HANA was indeed used successfully by SAP marketing to generate lots of “game-changing”, “revolutionary”, “deliciously disruptive” buzz. This buzz was picked up by many. So it was quite interesting to read the contradictory statement made by the analyst Dennis Gaughan at Gartner Symposium (source):
… Gaughan said none of the four vendors [IBM, Microsoft, Oracle, SAP] are “re-imagining” IT, as per the theme of the Gartner conference.
“You won’t find innovation in their product portfolio,” he said. “You might find it if you try and talk to the research parts of these organisations.”…
Indeed for those of us with a broader and deeper technical view, the question remained open: “What makes SAP HANA the innovative product among many existing in-memory database management systems?” I do not think this question has been fully answered by SAP so far. Let me share my understanding and thoughts here.
Firstly, in my opinion it is not so much the technology as the ultimate promise that is visionary: running transactional and analytic systems on a single platform with a single store of data. Data warehousing as we know it was born from the need to remove analytic workload from transactional systems. In addition, transactional data structures were transformed into analysis-optimized ones (like star schemas or OLAP cubes), along with data enrichment. Then ETL systems came into place to remove the data transformation workload from data warehousing systems. Now SAP promises to bring everything back onto one system (see graph below), making separate ETL and EDW systems (and much of the related skills and expertise) obsolete. This would be a huge change, yet from my discussions with SAP customers it was not clear whether they had gotten it. Many of them want the SAP HANA database just for the sake of running ERP faster. Again, that is not what is revolutionary about the SAP vision to be delivered thanks to the HANA platform.
OLTP and OLAP systems today require not only separate computing resources, but also different data structures optimized for specific query profiles. SAP’s promise is that once transactional (e.g. ERP) and analytic (e.g. BW) systems are running on a single HANA platform, they will be using a single copy of the data. All additional data modifications required, for example, by the analytics part of the system, like data cleansing, transformation and enrichment, will be done on the fly during each execution of a query [VitalBI: I bet there is going to be some kind of results caching, even if some guys in SAP marketing disagree]. In-memory data storage together with in-database calculations, append-only tables, and multi-core processing are the features that are going to help SAP achieve the “single business platform” promise.
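To make the “single copy, transform on the fly” idea a bit more tangible, here is a minimal sketch in Python. It is my own illustration, not SAP’s implementation: the names `AppendOnlyTable` and `revenue_by_region`, and the order-to-region mapping used as “enrichment”, are all assumptions made up for this example. The transactional side only ever appends row versions, and the analytic query derives its enriched view from that same data at query time, with no separate ETL copy.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyTable:
    """Toy append-only table: rows are never changed in place (illustrative only)."""
    rows: list = field(default_factory=list)  # (key, amount, deleted) versions

    def insert(self, key, amount):
        self.rows.append((key, amount, False))

    def update(self, key, amount):
        # An update is just a newer version appended at the end.
        self.rows.append((key, amount, False))

    def delete(self, key):
        # A delete is a tombstone version.
        self.rows.append((key, None, True))

    def current(self):
        """Latest visible version of each key, derived by scanning the versions."""
        latest = {}
        for key, amount, deleted in self.rows:
            if deleted:
                latest.pop(key, None)
            else:
                latest[key] = amount
        return latest


def revenue_by_region(table, region_of):
    """Analytic query on the transactional data, enriched at query time."""
    totals = {}
    for key, amount in table.current().items():
        region = region_of.get(key, "UNKNOWN")  # hypothetical enrichment step
        totals[region] = totals.get(region, 0) + amount
    return totals


orders = AppendOnlyTable()
orders.insert("o1", 100)
orders.insert("o2", 50)
orders.update("o1", 120)   # appended, not overwritten
orders.delete("o2")        # appended tombstone
orders.insert("o3", 30)
print(revenue_by_region(orders, {"o1": "EMEA", "o3": "APJ"}))
```

A real system would obviously need indexes, version cleanup and probably some caching of results, which is exactly the part I am curious about.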
What is different compared to other in-memory database management systems is SAP’s ambition to bring in-memory technology to the next level: Enterprise. That means not only specific and limited use cases, but mixed-workload, large-scale, high-volume scenarios.
Secondly, there is not enough information about the innovation in the technology being developed by SAP. You will not find many white papers from SAP describing what is under the hood of the new database. Just storing data in RAM, and treating it as faster storage, is nothing new. Sybase ASE, the database acquired by SAP last year, has an “In-memory database” option. SAP HANA certainly has to offer something better.
My discussion with Franz Faerber, SAP HANA chief architect, at the SAP Influencer Summit last summer helped me get a somewhat deeper view into the technology, beyond the obvious things. In a nutshell, the two major drivers behind SAP HANA technology were:
- “RAM is slow” (And you thought “in-memory” is about storing data in RAM??)
- “CPU clock frequency reaches its growth barrier”
In SAP HANA everything is about performance, which is a prerequisite for real-time data processing. Even if RAM is faster than ‘spindle’ hard drives, CPUs still waste cycles while waiting for data from RAM. Therefore the optimization goal is to reduce the idle cycles by making sure that there is as much useful data in the CPU caches as possible. The HANA database has to be coded using CPU-cache-aware algorithms that process CPU-cache-optimized data structures. Well, back in 2006 Jim Gray from Microsoft discussed this principle in his famous presentation “RAM Locality is King”.
Most of the data in SAP HANA databases is stored in a columnar, compressed format. This data still has to be converted to records during processing, so it is important that this step happens as late as possible, something called late materialization. Ideally, operations on the data should be able to run directly on the compressed data, without the need to decompress it.
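Here is a small Python sketch of what operating on compressed columns and late materialization can look like. I am assuming dictionary encoding as the compression scheme purely for illustration, and `ColumnTable`, `select_positions` and `materialize` are names I invented; nothing here is taken from the HANA code base. The filter compares small integer codes instead of the original values, and records are only assembled for the rows that survive the filter.

```python
from bisect import bisect_left


def dict_encode(values):
    """Dictionary-encode one column: sorted distinct values plus integer codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


class ColumnTable:
    """Toy column store; each column is kept as (dictionary, codes)."""

    def __init__(self, columns):
        self.columns = {name: dict_encode(vals) for name, vals in columns.items()}

    def select_positions(self, column, value):
        """Filter on the compressed column: only integer codes are compared."""
        dictionary, codes = self.columns[column]
        code = bisect_left(dictionary, value)  # one lookup in the sorted dictionary
        if code == len(dictionary) or dictionary[code] != value:
            return []  # value does not occur, no need to scan the codes
        return [pos for pos, c in enumerate(codes) if c == code]

    def materialize(self, positions, names):
        """Late materialization: decode values only for the qualifying rows."""
        rows = []
        for pos in positions:
            row = []
            for name in names:
                dictionary, codes = self.columns[name]
                row.append(dictionary[codes[pos]])
            rows.append(tuple(row))
        return rows


sales = ColumnTable({
    "country": ["DE", "US", "DE", "FR"],
    "product": ["A", "B", "C", "A"],
    "amount": [10, 20, 30, 40],
})
hits = sales.select_positions("country", "DE")
print(sales.materialize(hits, ["product", "amount"]))
```

Note that the `country` column is never decoded at all in this query; only `product` and `amount` are, and only for the matching positions.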
As mentioned in the previous paragraphs, in HANA everything is about performance, so when clock speed growth slows down, the search for performance moves to multi-core CPU processing. It is the worst-kept secret on the market that about a dozen developers from Intel spent months in SAP’s offices coding the core of SAP’s in-memory technology to use all possible features of the Intel Xeon architecture: Hyper-Threading, Intel Turbo Boost, Threading Building Blocks. That is why the SAP HANA database can achieve its top performance only when running on bare metal on Intel Xeon CPUs, and not on other platforms or in a virtualized environment.
Last, but not least: the SAP HANA database is in fact a hybrid database. RAM is used as the primary data store, but SSDs or spindle drives are still used for data persistence, for example in case of a power loss. I saw some customers being surprised when they faced SAP HANA hardware with external storage next to lots of RAM.
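The general pattern behind such a hybrid design can be sketched in a few lines of Python. This is a generic redo-log illustration under my own assumptions, not a description of HANA’s actual persistence layer (which I have not seen documented in detail); the class name `DurableStore` and the JSON-lines log format are invented for the example. Reads are served from RAM, while every write is first forced to a log file on disk so that the in-memory state can be rebuilt after a restart.

```python
import json
import os


class DurableStore:
    """Toy in-memory key-value store with an on-disk log (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}      # primary copy lives in RAM
        self._replay()      # rebuild RAM state from disk after a restart
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # Persist first, then apply in memory.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # ask the OS to push the write to the device
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)     # reads never touch the disk

    def close(self):
        self._log.close()
```

Creating a second `DurableStore` on the same log path after closing the first one brings back the same contents, which is the whole point of the external storage those customers were surprised to see.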
At SAP’s invitation I am going to attend the SAP Influencer Summit on December 13-14, and I am looking forward to it as a chance to get a layer deeper into what makes SAP’s in-memory technology truly a step forward compared to others, and how they are going to overcome some remaining technology barriers.