


BI Beat: Clint Vosloo on Optimizing Your Data Archiving for BI Performance

by Dave Hannon

February 6, 2014


In this kickoff to the BI Beat podcast series, SAP Mentor Clint Vosloo of EV Technologies describes the optimal data archiving strategy for BI users today and why it makes a good stepping stone for the jump to SAP HANA.

To view the demo that Clint references in the podcast, click here.

Dave Hannon: Hello and welcome to BI Beat, an SAPinsider podcast series exploring how you can optimize your business intelligence environments. I’m Dave Hannon with SAPinsider.  Joining me to kick off this series is Clint Vosloo.  Clint is Managing Partner of EV Technologies, APJ. He’s an SAP mentor and has been involved in the BI space since 1997 when he worked on his first installation of Sybase IQ. Clint is also certified in Sybase IQ and SAP HANA. Welcome, Clint, thank you for joining us!

Clint Vosloo: Thanks so much for having me, and you’ve made me realize how old I am, throwing that number up there! 

Dave: Exactly, yes! Well Clint, you’ve been a longtime proponent of using Sybase IQ as an analytical database, it sounds like it was love at first sight, really. Let’s talk a little bit about what makes that the right choice in your opinion for starters.

Clint: So, I mean, you know, in the SAP space recently it’s been a very HANA-strong message and the concept of using a columnar database for analytics has really come to the forefront. Now, I was fortunate enough to be in a company, as you said back in ’97, that used IQ, which used columnar database technology for analytics. And you know, once you’ve got your head around the columnar store and the way it stores information, it just really changes the perception of what you can do with the data, and how you can crunch the numbers. So, you know, the non-technical-minded people hear column and row and they get a bit freaked out by it, but databases originally were designed for high-transactional inserts. Behind your transactional system, the database just needs to insert a high volume of transactions, TPS (transactions per second) as they call it, but they weren’t really designed to get data out.

So, where the columnar store comes in, whether it’s on disk, like SAP IQ, I think it’s called now, or HANA, it really lets you extract information out and, to be quite honest, beats the pants off the row-based database. So, for me if you’re running a BI system or analytical system, whatever you want to call it, and you want to crunch large sets of data and also bring disparate data sets together and really, you know, create the enterprise data warehouse, merging all those data sources, you have to look at a columnar store. I’m very passionate about it, but it’s not a matter of if you should; you have to, to be quite honest. 

Dave: Ok, ok. I know folks love to throw big numbers around to describe capacity, can you put it in terms I can understand? For example, how many episodes of “True Blood” could IQ load and analyze? 

Clint: So you want me, sure, I mean there’s ah-I’ve caught you reading up on the Guinness Book of Records, so SAP IQ set a Guinness record for loading data and I’ve actually just pulled up the slide here so I don’t get this wrong, but it pretty much loaded 34.3 terabytes of data per hour. So as you said in terms of “True Blood”, that’s 76 years of “True Blood” episodes per week, if that makes any sense.
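To put the two figures side by side, here is a small arithmetic sketch. The 34.3 terabytes per hour and the 76 years per week come from the conversation; the decimal units (1 TB = 1,000 GB), the reading of "76 years" as continuous viewing time, and the function names are assumptions made for illustration only.

```python
HOURS_PER_WEEK = 24 * 7
HOURS_PER_YEAR = 24 * 365  # assumption: ignore leap days


def weekly_terabytes(tb_per_hour):
    """Volume loaded in one week at a sustained hourly load rate."""
    return tb_per_hour * HOURS_PER_WEEK


def implied_gb_per_viewing_hour(tb_per_hour, years_of_video):
    """Data size per hour of video implied by the two quoted figures.

    Assumes 1 TB = 1,000 GB and that years_of_video means continuous playback.
    """
    viewing_hours = years_of_video * HOURS_PER_YEAR
    return weekly_terabytes(tb_per_hour) * 1000 / viewing_hours


if __name__ == "__main__":
    print(f"Loaded per week: {weekly_terabytes(34.3):,.1f} TB")
    print(f"Implied video size: {implied_gb_per_viewing_hour(34.3, 76):.1f} GB per hour")
```

Under those assumptions the weekly volume comes to several thousand terabytes, and the comparison works out to somewhere between 8 and 9 GB per hour of video.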

Dave: That makes a lot of sense to me, yeah.

Clint: I’m not a vampire person, but according to my wife she was quite impressed when she heard that number. But the reality is you know, from a SAP IQ point of view the technology’s been around, it’s been proven, and I think what’s very exciting in the SAP space is merging the in-memory offering of HANA as well as having the disk solution of SAP IQ, and by all accounts teams are sort of cross-pollinating IP across emerging technologies so it’s really gonna be a win-win for the end user, in my opinion. 

Dave: Ok, ok. The sort of broader trends at work here are, you know, the growth of databases and big data. For customers running say, BW specifically, where do they typically start to see performance issues and what form do those performance issues take?  

Clint: So database bloat’s another one of those big quotes, you know maybe when you go to a big data presentation you’ll get you know, the size of the data is doubling every two years compared to the last 60 years or, you know, people love throwing those numbers around. But the reality is that there is a trend that’s happening and people don’t actually know what to do with this data. 

Big data strategy gets thrown around loosely these days, but people seem to be slightly confused in exactly what that means.  If you look at the BW instance, and you look at the SAP landscape, now from my side I’ve come from the Sybase BusinessObjects world so I’m not a classic  SAP person, so to speak, but you know, under BW they’ve typically got, you know whether it’s DB2, Oracle, Microsoft, those are all going back to your row-based databases. 

Now when you’ve got a row-based database, and BW being an analytical tool wanting to sort of analyze lots of data, you know it depends on the hardware and I know this is a very wild statement but when you get to the 1, 2, 3 terabyte instance of size, typically row databases you know, as I like to say, fall off the cliff. They can’t deal with the performance. So, what’s been happening in a lot of worlds is that the storage from an archive that—your history of data has to get archived in your BW instance—and what that means typically is it ends up sort of not in a read space—it’s been archived on the disk so it’s gone. 

Now, what’s been around for probably 8-10 years is the concept of near-line storage within BW. And there was a partner company called PBS out of Germany that wrote the near-line storage component for Sybase IQ at the time. And what that did is actually allowed BW, the analytical engine, to archive all the data for you, into Sybase IQ, but the key point is that it really made it readable, so it was never data that you couldn’t have access to. As of, going to get my dates wrong, because I think it was about 12 months ago, I think it was version 7.3 of BW, and the BW people would know the service packs and all that stuff but near-line storage to Sybase IQ became free and part of the BW offering. Now myself and a fellow SAP mentor, Ethan Jewett, have done a few podcasts on this so if you just Google “BW Near-Line Storage” and my name you’ll find some videos of us setting this up. And it’s a really really really compelling offer for customers, because what you have is, for example, let’s use years as an example, you’ve got 2013 which wants to be your hot data so what you can do is you can put that, and I’m going to use Microsoft in this instance, under your BW instance, your Microsoft SQL Server database. And then you can archive all your other data off into Sybase IQ. 

The benefits you get off that is automatically you, say you’ve put 6 years of data into Sybase IQ, the benefit from the—immediate performance benefit is most of your data is going to come off your hot data which is going to be sitting in your SQL Server instance and that’s only going to suddenly have from 7 years of data, it’s only going to have 1 year’s worth of data so automatically you’re going to get performance benefits.

On the flip side, you’re going to have 6 years of the data now sitting in the columnar store, which has been optimized for raw performance. And, you know, during the video when Ethan and I did some tests with that, out of sort of unstructured, raw data at the DSO level in Sybase IQ we were getting better performance compared to 1 year’s worth of data within SQL Server at an InfoCube level. So it’s really really compelling. 

And, moving down the line, you know, my take is if you are an SAP customer, you will be running HANA at some stage, it’s just a matter of when, not if, yet again, a lot of when. But if you implement the solution and you have got a large environment, for me it puts you in a very much—from a BW perspective, again—a very nice HANA-ready state.  Because what it does, it’ll ring-fence your hot data, so in an example I used your 2013 data, which you can then put in an in-memory appliance, so you have the benefit of columnar as well as in-memory, and then you have all your historical data which is still accessible, in columnar database on disk space but it’s still blisteringly fast. 

And the nice thing, and this is to me why it’s such a beautiful solution, the end user’s oblivious to where the data is. Whether they are using BOBJ on top or other Business Warehouse tools, they’re just coming through the same point in the data and it is moved automatically by BW between memory and disk, and they’re oblivious and they’re still getting great performance. So, for me, that’s a great solution, something I’ve been presenting on quite a bit around the world, and it’s been received very very well. And it works great.

Dave: So it’s sort of almost a nice transition, for a company that’s thinking about going to HANA or planning to go to HANA at some point, this was sort of a transition strategy. 

Clint: Exactly, and I think what’s been quite interesting, with a lot of early adopters of HANA, the conversations I’ve been having, you know, leading into this year is that, ‘Ok, so we’ve got the 1 terabyte instance, it’s all great, but now what? We can’t keep just adding onto that, we need to come up with a strategy.’ This is coming in a more conservative way I guess, is to say get that archiving strategy in place so when you have to go for funding, you know, you’re going in-memory for a year, not as sort of a…you can almost ring-fence that number. And then typically what happens in time is the end users will say, ‘Alright, I want 2 years in-memory, 3 years in-memory’, and you can slowly grow your in-memory database, but then that sets your speed, you’re not forced to just keep buying, I think that’s a great solution for customers. 

Dave: Ok, ok. How about those that may be running BW Accelerator, is it also a good solution for them?

Clint: It is, yeah, I mean it is, cause you know the thing is the acceler…I, you know, to be quite honest I’m not too sure how the Accelerator-to-HANA conversion works, I’m definitely not in sales, I’m a technical person, so talk to your AE about that, I know there’s some equation but for sure, I mean, from the Accelerator point of view it is all memory-bound, in blades, so it all depends how much you have in-memory, so if it’s some data that you still wanna have access to but you don’t want to have it in, or you run out of space in your BWA, then absolutely, then you can move it across. 

And the key as well, you know in the BW world there’s something called multi-providers, so you can have a query coming on both IQ and SQL Server in the first instance, or IQ and HANA, and merge together to the end user, and they’re oblivious, which is to me, the beauty of it.
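The idea of one query spanning the hot database and the near-line archive can be sketched roughly as below. This is only an illustration of the routing concept, not how BW multi-providers are implemented; the `Store` and `UnionProvider` classes, the year-based split, and the in-memory row lists are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    year: int
    amount: float


class Store:
    """Hypothetical stand-in for one database (hot store or near-line archive)."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = list(rows)
        self.reads = 0  # counts how often this store is actually queried

    def query(self, first_year, last_year):
        self.reads += 1
        return [r for r in self.rows if first_year <= r.year <= last_year]


class UnionProvider:
    """Single entry point that hides where each year of data lives."""

    def __init__(self, hot, cold, hot_from_year):
        self.hot = hot
        self.cold = cold
        self.hot_from_year = hot_from_year  # years >= this value are "hot"

    def total(self, first_year, last_year):
        rows = []
        # Only touch the hot store if the requested range reaches into it.
        if last_year >= self.hot_from_year:
            rows += self.hot.query(max(first_year, self.hot_from_year), last_year)
        # Only touch the archive if the range reaches back before the split.
        if first_year < self.hot_from_year:
            rows += self.cold.query(first_year, min(last_year, self.hot_from_year - 1))
        return sum(r.amount for r in rows)
```

A report asking only for the current year never reads the archive, while a multi-year report gets one merged answer without the caller knowing two stores were involved.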

Dave: Sure, sure, ok.  So, for those companies who do take the SAP HANA plunge, how simple is it to set up Smart Data Access? 

Clint: Smart Data Access is another, you know, the way I see the world is, in the SAP space you have classic SAP customers running ERP and BW, and you’ve got your agnostics running an enterprise data warehouse. So the near-line storage setup, which is your, you know, your archive for BW customers, it’s really simple. As I said, somewhere on YouTube there’s a 40-minute video of us setting it up, and we actually spent, Ethan and I spent most of the time getting the two servers to connect via Amazon. Once we had them talking to each other it was really really simple, which is great. 

Now Smart Data Access is something that, as I’m coming from the more enterprise data warehouse world, I’m really really excited about. It’s super simple to set up, and it works very, very well. So if you look at an enterprise data warehouse customer, let’s take the retail example, I love that because I’ve worked a lot in that space, you could have a situation where all your point-of-sale data is streaming into HANA. Now, something like IQ wouldn’t deal with that very well, it doesn’t like the high-insert rate, it’s not designed for that. But HANA can handle that. So, you can have the ability to put in all your POS data in real-time during the day, from retail into HANA, but then have the ability to archive that into something like IQ, on disk-based storage where it’s cheaper. If anyone’s worked in the retail space, they know the data is wide and massive, so your space runs out quickly. Now Smart Data Access is, I suppose, what you can call federation, so what it’s gonna do is it’s gonna merge your HANA data sets and your IQ data sets together. 

Now, with SP7 which has just come out, there’s some huge enhancements in Smart Data Access where you can actually update IQ from HANA, I haven’t done it yet but it’s on my very long to-do list for the year and I’ve said it in the database space, I don’t think anybody else can really compete because if you can have that blend of in-memory, looking at the enterprise data warehouse customer, and disk, and you’re gonna get the affordability of disk and you know, keep that up for 7, 10 years and have the ability to analyze that, that once again, and this is an important point, is have your end users coming through a single entry point. Because in the past, what we’ve always faced is if you want to get that blistering in-memory speed, ok, well then you better use this report, or you better use that report, and that becomes clunky, because what you end up doing is moving data around, you’re reconciling and it gets confusing, and your end users don’t really know where to go. In my world, and you know in the BI analytics space, if your end user’s not using your product, then you know, then you’re not really a success. So, you need to, or we need to as sort of technologists, make it as simple as possible for our end user. So I’m a huge fan of Smart Data Access, it’s one of the things I always demo as well, and I think it’s a great solution for this, but it’s hot and cold, for your enterprise data warehouse customer. 

Dave: Sure, ok. Lastly, I was going to ask if you have any warnings or caveats when it comes to moving, replicating, and reconciling data in disparate database engines?

Clint: That’s a 2-hour podcast alone. The reality is, the devil’s in the data. So, I would always, if people are doing the database migration, really take time and effort to plan for it because if you do migrate and you don’t actually cross the Ts and dot the Is, it’ll always come back to bite you. Having said that, from a HANA point of view and an ERP and Business Warehouse it’s a bit different because from their side I think they’ve got the RDS, the rapid deployment solutions, you know what tables you’re dealing with, so you know what the structures are like, you know what the queues are like, so it’s a bit more of a ring-fence solution. If you’re looking at the enterprise data warehouse model which is where I’ve played for most of my life, you don’t really understand what data’s out there, what compatibility issues are out there. So as part of a migration project it’s very, very key to do this, it’s one of the big things but often people don’t, you know, put funding and budget into it.  But if done properly it can work really well.

You know, part of the demos I always do is, you know, with PowerDesigner, another part of the Sybase tool set that came across that’s now part of the SAP stable, it’s a great modeling tool, you can take a SQL Server database, reverse engineer it from PowerDesigner, flip the database management system into Sybase IQ or SAP IQ or into HANA, and then deploy those tables straight away. So, there are easy ways of doing it, there are smart ways of doing it, but just reach out to people and get advice, that’s my key thing. The only, you know, the only piece of the puzzle missing in terms of reconciling data in disparate database engines is what I would love to see happen from the Smart Data Access point of view, is if almost this archiving that happens in the near-line storage solution could happen automatically. 

So what I mean by that is you could have a table, say, “Sales” in HANA, and automatically based on a time dimension of 2 weeks, that can sort of trickle feed that into IQ, if that can come down the line and SAP hopefully are working on that, if they can hear me or they listen to me, I really think they are going to be very, very difficult to beat because, with that movement of data again, you know what you’re doing is you need to, you know you’re leaving the POS, point-of-sale point of view, all retailers are 24/7 or pretty much 24/7 these days, with online, it’s, it always gets quite tricky as to find when is the end of day, or when do you move the data. You’ve got to sort of almost get a point in time, then define, move the data across, then you’ve got to reconcile that you’re ok with what you moved before you sort of delete it from your source.  So if SAP can come up pretty much like they’ve done with near-line storage of a way to archive, you know, use it, use SLT or use Data Services to trickle feed that into IQ, then that’s going to be very interesting and something I hope to see in the near future. 
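The sequence described here (fix a point in time, move the data, reconcile, then delete from the source) can be sketched as follows. This is a minimal illustration that uses two in-memory SQLite databases as stand-ins for the hot store and the archive; it does not use HANA, IQ, SLT, or Data Services, and the `sales` table layout, the `make_store` helper, and the `archive_older_than` function are hypothetical.

```python
import sqlite3


def make_store():
    """Create an in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount_cents INTEGER)"
    )
    return conn


def archive_older_than(hot, cold, cutoff):
    """Move rows with sale_date before cutoff from hot to cold; return rows moved."""
    # 1. Fix a point in time: only rows strictly before the cutoff are in scope.
    rows = hot.execute(
        "SELECT id, sale_date, amount_cents FROM sales WHERE sale_date < ?", (cutoff,)
    ).fetchall()
    if not rows:
        return 0
    ids = [r[0] for r in rows]
    marks = ",".join("?" * len(ids))

    # 2. Copy the selected rows into the archive (not yet committed).
    cold.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

    # 3. Reconcile: row count and total must match what was read from the source.
    count, total = cold.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount_cents), 0) FROM sales WHERE id IN ({marks})",
        ids,
    ).fetchone()
    if count != len(rows) or total != sum(r[2] for r in rows):
        cold.rollback()
        raise RuntimeError("reconciliation failed; source rows left untouched")
    cold.commit()

    # 4. Only after a successful reconcile, delete exactly the moved rows.
    hot.execute(f"DELETE FROM sales WHERE id IN ({marks})", ids)
    hot.commit()
    return len(rows)
```

Deleting by the exact ids that were copied, rather than by date again, means rows that arrive in the source while the move is running are left alone, which matters for a retailer that never really has an end of day.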

Dave: Ok, great.  For those folks who are interested in watching the demo that Clint referenced, we’ll get a link to it up on our page here on this podcast page.  Ok, Clint Vosloo, Managing Director of EV Technologies and SAP mentor, thank you very much for joining us today on the BI Beat!

Clint: Thanks so much Dave.               
