This past year, I did an exhaustive analysis of potential candidates to replace an aging HP EVA storage infrastructure. After narrowing the choices down based on several factors, the one with the best VMware integration, along with mainframe support, was the EMC Symmetrix VMAX.
One of the best things about choosing VMAX, in my mind, was PowerPath. Whether PowerPath provides real benefits can be argued, but most people I have talked to in the real world swear that it is brilliant. Let's face it, it HAS to be brilliant to justify the cost per socket. Before tallying up all my sockets and asking someone to write a check, I needed to do my own due diligence. There aren't many comprehensive PowerPath VE vs. Round Robin papers out there, so I needed to create my own.
My assumption was that I'd see a slight performance edge with PowerPath VE, but not enough to justify the cost. Part of this prejudice comes from hearing the other storage folks out there say there's no need for vendor-specific SATPs/PSPs since VMware NMP is so good these days. Here's hoping there's no massive check to write! By the way, if you prefer to skip the beautiful full-color screenshots, go ahead and scroll down to the scorecard for the results.
Tale of the Tape
My test setup was as follows:
| Test Setup for PowerPath vs. Round Robin |
| --- |
| 2 – HP DL380 G6 dual-socket servers |
| 2 – HP-branded QLogic 4 Gbps HBAs per server |
| 2 – FC connections per host to a Cisco 9148, then direct to the VMAX |
| VMware ESXi 5 loaded on both servers |
| All tests run on 15K FC disks – no other activity on the array or hosts |
Let’s Get It On!
(I'm sure there's a royalty I will have to pay for saying that)
Host 1 has PowerPath VE 5.7 b173, and host 2 has Round Robin with the defaults. Each HBA has paths to 2 directors on 2 engines. I used IOmeter from a Windows 2008 VM with fairly standard test configurations. Results are from esxtop captures at 2-second intervals.
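For anyone who wants to reproduce the Round Robin side of this, here's roughly how I'd verify the path layout and policy on host 2 and pull the esxtop data. This is a sketch only: the naa ID is a placeholder for the actual VMAX LUN, and the sample count and output filename are just examples, not the exact commands from my runs.

```
# List the paths for the test LUN - with 2 HBAs each seeing 2 directors,
# every HBA-to-director path should show up here (placeholder naa ID)
esxcli storage core path list --device=naa.xxxxxxxxxxxxxxxx

# Confirm which Path Selection Policy NMP is using for the LUN
esxcli storage nmp device list --device=naa.xxxxxxxxxxxxxxxx

# Set the device to Round Robin if it isn't already
esxcli storage nmp device set --device=naa.xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR

# Batch-mode esxtop capture at 2-second intervals (150 samples as an
# example); the resulting CSV gets crunched afterwards in perfmon/Excel
esxtop -b -d 2 -n 150 > rr_capture.csv
```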
The first test I ran was 4k 100% read, 0% random. All of these are with 32 outstanding IOs, unless otherwise specified.
Here is Round Robin
And PowerPath VE
The first thing I noticed was that Round Robin looks exactly like I expected it to. Not that that means anything. I do realize this test could have been faster on RR with the IOPS setting dropped to 1, and maybe I'll do that in Round 2. As for Round 1, with more than twice the IOPS, PowerPath is earning its license fee here for sure.
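If I do re-run Round Robin with the IO operation limit dropped to 1 in Round 2, the change would look something like this (again, the naa ID is a placeholder and I haven't validated these runs yet):

```
# Drop the Round Robin IO operation limit from the default of 1000 down
# to 1, so the path rotates after every IO (placeholder naa ID)
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=1

# Confirm the new setting took effect
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxxxxxxxxxxxxxx
```

Out of the box, Round Robin sends 1,000 IOs down a path before switching, which is why dropping the limit to 1 is the usual tuning suggestion.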
How about writes? Here’s 4k 100% write 0% random.
Once again, PowerPath VE shows nearly 2x the IOPS and data transfer rates. I'm starting to see a pattern emerge. 😉
How about larger blocks? 32K 100% read 0% random.
PowerPath is really pulling ahead here with over 2x the IOPS yet again.
32K 100% write 0% random
Wow! PowerPath is killing it on writes! Maybe PP has some super-secret password to unlock some extra oomph from VMAX’s cache. 😉
Nevertheless, it’s obvious that PP is beating up on the default Round Robin here, so let’s throw something tougher at them.
Here's 4K 50% read, 25% random, with 4 outstanding IOs.
The gap between the contenders closes a bit with this workload, at only a 24% improvement for PP. But as we all know, IOPS doesn't tell the entire story. What about latency?
4k 100% write 0% random
Write latency is 138% higher with Round Robin! In other words, the average write latency under Round Robin came in at roughly 2.4 times the PowerPath figure. That's a pretty big gap. Is it meaningful? That depends on your workload, I guess.
Scorecard after Round 1
So far, PowerPath looks like a necessity for folks running EMC arrays. I'm not sure how it would fare on other arrays, but it really shines on the VMAX. In some of my tests, the IOPS with PowerPath were three times greater than with the standard Round Robin configuration! I do believe the gap will shrink if I drop the IOPS setting to 1, but I doubt it will shrink to anywhere near even. We will see.
In addition to the throughput and latency testing, I also ran some failover tests. I'm going to save those for a later round; I don't want this post to get too long.