Acclinet Blog

The latest in IT

Tukwila Itanium First Revenue Shipments

Posted by: admin

Tagged in: Untagged 

According to a blog entry at Intel, the newest version of the Itanium processor is shipping to vendors for resale. This means we should probably expect a system launch from HP sometime soon. More details on the chip will follow.



POWER7: Massive Multi-threading comes to Power

Posted by: dan

Tagged in: Untagged 

IBM's upcoming POWER7 processor will be impressive. It brings a much larger number of cores and threads while maintaining a clock rate of about 4GHz, higher than anyone else's, though not as high as POWER6's 5GHz.

It will ship in a few different packages that vary the number of chips per MCM (multi-chip module): a blade version with a single chip, a dual-chip MCM for the majority of the Power Servers, and a quad-chip MCM for the POWER7 IH node, which is currently intended only for the Blue Waters supercomputer. It's tempting to think that the quad-chip modules will make it into the high-end Power 595, but the need for liquid cooling at any reasonable clock rate will probably prevent this. This will certainly confuse people reading the performance stats for these processors, because the number of cores and threads is not the only thing that varies among the three versions. To confuse things further, it looks like IBM will offer the single-chip and dual-chip packages not only with 8-core chips but also with 6-core and 4-core chips. The dual-chip "rejects" will most likely only be used in entry-level systems.

A single chip contains a maximum of 8 cores, each supporting 4 SMT threads, for a total of 32 threads per chip. Most systems will contain dual-chip MCMs, which offer 64 threads per socket. Assuming that IBM offers them in dual-socket books like the POWER6 chips, this will allow for 2048 threads in the top-end Power 595 system, quite a jump from the current maximum of 128. Perhaps more important to most, the Power 520 entry-level systems will now have a maximum of 64 threads versus the current 8, assuming they are not too expensive to offer in a low-end machine. Don't let the list price fool ya: you will most likely need to pay a lot more to turn everything on, and then there is OS licensing, etc.
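
The thread arithmetic above can be checked with a few lines of Python. This is only a sketch: the 32 dual-chip sockets for a fully loaded Power 595 are an assumption inferred from the 2048-thread figure (2048 / 64), and the single socket for a Power 520 is inferred from its 64-thread maximum; neither count comes from IBM.

```python
# Back-of-the-envelope POWER7 thread counts from the per-chip figures above.
CORES_PER_CHIP = 8   # maximum cores per POWER7 chip
SMT_THREADS = 4      # SMT threads per core

def threads_per_chip(cores=CORES_PER_CHIP, smt=SMT_THREADS):
    return cores * smt

def threads_per_socket(chips_per_mcm, cores=CORES_PER_CHIP):
    # One socket holds one MCM; every chip on it contributes its own threads.
    return chips_per_mcm * threads_per_chip(cores)

def threads_per_system(sockets, chips_per_mcm=2, cores=CORES_PER_CHIP):
    return sockets * threads_per_socket(chips_per_mcm, cores)

print(threads_per_chip())        # 32
print(threads_per_socket(2))     # 64
print(threads_per_system(32))    # 2048, assuming 32 dual-chip sockets (Power 595)
print(threads_per_system(1))     # 64, assuming 1 dual-chip socket (Power 520)
```

Passing `cores=6` or `cores=4` shows how the reduced-core chips shrink every total proportionally.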

POWER7 is the first chip to use eDRAM (embedded DRAM), which allows for a greater cache size with fewer transistors and lower power than SRAM (typical processor cache). It is a bit slower, but it seems to be worth it, especially because the density allowed IBM to bring the cache onto the chip, which counteracts the speed penalty. POWER6 had its L3 cache in separate chips on the MCM. POWER7 will have 32MB of L3 cache per chip. For most people, this will translate to 64MB of L3 cache per socket, by far the largest cache seen so far in any server. The design also has a 32KB L1 instruction cache, a 32KB L1 data cache, and a 256KB L2 cache per core. If IBM had used SRAM instead of eDRAM for the L3 cache, 32MB would have required a 2.7 billion transistor design (assuming it could be built at all), more than double the actual 1.2 billion for the complete POWER7.
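
Where does the transistor gap come from? A rough sketch, assuming a textbook 6-transistor SRAM cell and a 1-transistor (plus capacitor) eDRAM cell and ignoring tags, ECC, sense amplifiers, and other peripheral circuitry. The raw cell arithmetic alone accounts for roughly 1.3 billion extra transistors; whatever separates that from the 2.7 billion figure would presumably be the overhead this sketch leaves out.

```python
# Rough transistor count for a 32MB L3 array built from different cell types.
# Assumptions: 6T SRAM cell, 1T eDRAM cell, no array overhead of any kind.
L3_BYTES = 32 * 1024 * 1024
BITS = L3_BYTES * 8

SRAM_T_PER_BIT = 6
EDRAM_T_PER_BIT = 1

sram = BITS * SRAM_T_PER_BIT
edram = BITS * EDRAM_T_PER_BIT

print(f"bits:           {BITS:,}")
print(f"6T SRAM cells:  {sram / 1e9:.2f} billion transistors")
print(f"1T eDRAM cells: {edram / 1e9:.2f} billion transistors")
print(f"difference:     {(sram - edram) / 1e9:.2f} billion transistors")
```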

There are now 12 execution units per core including the following:

  • 2 fixed-point units
  • 2 load/store units
  • 4 double-precision floating-point units
  • 1 vector unit (supporting VSX)
  • 1 decimal floating-point unit
  • 1 branch unit
  • 1 condition register unit

The pipelines have also been revamped to deal with the new execution units and increased thread count.

Out-of-order execution, a feature used in previous POWER chips but missing from POWER6, is being put back into POWER7. This is not only a performance boost but may also allow some people stuck on POWER5 for software reasons to migrate to POWER7 systems.

The processors are directly connected, as they were with POWER6. In fact, POWER7 will actually be available as an upgrade for the Power 570 and Power 595 systems (with a few exceptions).

As for memory, POWER7 has two dual-channel DDR3 memory controllers per chip. IBM has stated that these controllers can sustain 100GB/s of bandwidth per chip. Across the 64 chips of a fully loaded top-end system, this would translate to an aggregate sustained rate of 6.4TB/s (8TB/s theoretical peak) of memory bandwidth.
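
The aggregate figures follow from simple multiplication. In this sketch the 64-chip count is inferred from 6.4TB/s divided by 100GB/s (and matches 32 dual-chip sockets), and the 125GB/s per-chip peak is only what the quoted 8TB/s implies, not a number IBM has given. Decimal units (1TB/s = 1000GB/s) are assumed.

```python
# Aggregate memory bandwidth for a fully loaded system, decimal units.
SUSTAINED_GBPS_PER_CHIP = 100   # IBM's stated sustained figure per chip
PEAK_TBPS_SYSTEM = 8            # theoretical peak quoted for the whole system
CHIPS = 64                      # assumption: 32 dual-chip sockets

sustained_tbps = SUSTAINED_GBPS_PER_CHIP * CHIPS / 1000
peak_gbps_per_chip = PEAK_TBPS_SYSTEM * 1000 / CHIPS

print(f"aggregate sustained:   {sustained_tbps} TB/s")      # 6.4 TB/s
print(f"implied peak per chip: {peak_gbps_per_chip} GB/s")  # 125.0 GB/s
```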

These numbers aren't confirmed yet but they appear to be off the charts compared to previous generation systems from anyone. Unfortunately, it is also likely their total price tag is too.


IBM POWER7 System Announcement Imminent

Posted by: admin

Tagged in: Untagged 

According to recent reports, the POWER7 launch is coming up soon: this coming Monday, February 8, 2010. IBM will probably roll the lineup out over time, so don't expect to get a high-end system right away.

IBM has stated that the Power 570 and Power 595 will support upgrading POWER6 hardware to POWER7 (with a few exceptions). It is unlikely that the systems will support mixing processor generations though. Such mixing is only available on the Sun/Oracle mid-range and high-end systems such as the M4000, M5000, M8000 and M9000.

There also seem to be no plans for such upgrades on the low end. I will detail the POWER7 chip and systems in several posts to come, so stay tuned.



Software Matters in Appliances

Posted by: dan

Tagged in: Untagged 

The iSCSI provider has been rewritten for the Solaris family, and performance has changed considerably. Detailed results can be seen in this blog entry, iSCSI before and after. Some of the gain actually comes from a change in processors and more effective interprocessor communication, but the software was previously the bottleneck. Here is a quick summary:

OLD:

Sun Storage 7410 (Barcelona, 2008 Q4 software)
311 MByte/s
37,056 IOPS (512-byte)

NEW:

Sun Storage 7410 (Istanbul, 2009 Q3+ software)
2.7 GByte/s
318,099 IOPS (512-byte)
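
To put "considerably" in numbers, a short sketch computing the improvement factor for each metric. It assumes 1 GByte/s = 1000 MByte/s, since the summary doesn't say which convention it uses; either way both metrics come out at roughly 8.6x to 8.7x.

```python
# Improvement factors for the Sun Storage 7410 iSCSI figures above.
# Assumption: 2.7 GByte/s is taken as 2,700 MByte/s (decimal units).
old = {"throughput_mbs": 311, "iops_512b": 37_056}
new = {"throughput_mbs": 2_700, "iops_512b": 318_099}

for metric in old:
    print(f"{metric}: {new[metric] / old[metric]:.1f}x")
```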



Sun and Oracle Acquisition

Posted by: dan

Tagged in: Untagged 

Sun Lives!

Sun Microsystems is now a wholly owned subsidiary of Oracle Corporation. All of the current hardware is still being sold, and older hardware continues to receive the support it had from Sun. More importantly, the acquisition ensures Sun's viability. Some people had doubts because of Sun's money troubles, but Oracle has put an end to that.

SPARC is actually getting a bigger investment and so is Solaris. Virtually every software product, even those that overlap with Oracle's past offerings, is being continued for the foreseeable future.

Good news all around for those who use Sun and the industry as a whole.



IBM Blue Waters Monstrous Bandwidth

Posted by: dan

Tagged in: Untagged 

The Blue Waters supercomputer being built by IBM is a game changer, especially in the area of bandwidth. It has over 2 million threads, 2PB of memory with 8PB/s of aggregate memory bandwidth, and over 32,000 second-generation (PCIe 2.0) x16 slots with 640TB/s of aggregate bandwidth.

I based this information on an article written by Timothy Prickett Morgan at The Register, plus a little math about how the bandwidth in the switch/hub chip is allocated. I'm sure there will be more to say when the system is actually installed in 2011.

Communication Bandwidth

Link | per switch/hub chip | per drawer | per supernode | per Blue Waters (512 supernodes)
Host connection into each POWER7 MCM | 192 GB/s | | |
To the seven other local nodes (MCMs) in the drawer | 336 GB/s | | |
Between nodes in a four-drawer supernode | 240 GB/s | 1920 GB/s | |
To remote nodes | 320 GB/s | 2560 GB/s | 10240 GB/s (10TB/s) |
Total external inter-node (not including PCIe cards) | | 4480 GB/s | |
General-purpose I/O (PCIe) | 40 GB/s | 320 GB/s | 1280 GB/s | 655360 GB/s (640TB/s)

Empty cells in the table are values that shouldn't be aggregated.
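
The aggregated columns are plain multiplication down the hierarchy: one hub chip per MCM, so 8 per drawer, then 4 drawers per supernode and 512 supernodes. A minimal Python sketch of that roll-up; the dictionary labels are illustrative, and since the code multiplies everything, deciding which totals are meaningful (the empty cells) is still a judgment call.

```python
# Roll per-hub-chip link bandwidth (GB/s) up to drawer, supernode, and system.
HUBS_PER_DRAWER = 8          # one hub chip per MCM, 8 MCMs per drawer
DRAWERS_PER_SUPERNODE = 4
SUPERNODES = 512

per_hub = {
    "host (into the MCM)": 192,
    "local (7 other MCMs in drawer)": 336,
    "supernode (other drawers)": 240,
    "remote (other supernodes)": 320,
    "PCIe I/O": 40,
}

def per_drawer(gbps):
    return gbps * HUBS_PER_DRAWER

def per_supernode(gbps):
    return per_drawer(gbps) * DRAWERS_PER_SUPERNODE

def per_system(gbps):
    return per_supernode(gbps) * SUPERNODES

# Traffic that leaves the drawer: links to other drawers plus remote links.
external_per_hub = per_hub["supernode (other drawers)"] + per_hub["remote (other supernodes)"]

print("external inter-node per drawer:", per_drawer(external_per_hub), "GB/s")
print("remote per supernode:", per_supernode(per_hub["remote (other supernodes)"]), "GB/s")
print("PCIe per system:", per_system(per_hub["PCIe I/O"]), "GB/s")
```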

Thread Count
4 SMT threads/core
8 cores/chip (32 threads)
4 chips/MCM (128 threads)
8 MCMs/drawer (1024 threads)
4 drawers/supernode (4096 threads)
512 supernodes/Blue Waters (2,097,152 threads)

Memory Bandwidth
128GB/s per chip
512GB/s per MCM
4TB/s per drawer
16TB/s per supernode
8PB/s for all of Blue Waters (512 supernodes)

DIMM Count and Memory Size

Level | Capacity | DIMMs
per DIMM | 8GB | 1
per MCM | 128GB | 16
per drawer | 1TB | 128
per supernode | 4TB | 512
per Blue Waters | 2PB | 262,144

PCIe
16 x16 PCIe2 per drawer (64 per supernode, 32,768 total)
1 x8 PCIe2 per drawer (4 per supernode, 2,048 total)
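
All of the lists above come from the same building blocks, so they can be generated from a handful of per-unit counts. A sketch that does exactly that, assuming binary units for memory (1TB = 1024GB), which is what makes the 8PB/s and 2PB totals come out even; the helper names are illustrative. Levels smaller than a chip, MCM, or drawer report 0 for resources attached at that level.

```python
# Roll the Blue Waters building blocks up from one core to the whole machine.
LEVELS = [              # (level, units of the previous level it contains)
    ("core", 1),
    ("chip", 8),        # cores per chip
    ("MCM", 4),         # chips per MCM
    ("drawer", 8),      # MCMs per drawer
    ("supernode", 4),   # drawers per supernode
    ("system", 512),    # supernodes in Blue Waters
]

THREADS_PER_CORE = 4
MEM_BW_GBPS_PER_CHIP = 128
DIMMS_PER_MCM = 16
GB_PER_DIMM = 8
X16_SLOTS_PER_DRAWER = 16
X8_SLOTS_PER_DRAWER = 1

def cores_in_each_level():
    """Map each level name to the number of cores it contains."""
    cores, result = 1, {}
    for name, factor in LEVELS:
        cores *= factor
        result[name] = cores
    return result

def rollup():
    cores = cores_in_each_level()
    summary = {}
    for name, n_cores in cores.items():
        # Integer division: a core contains 0 whole chips, MCMs, or drawers.
        chips = n_cores // cores["chip"]
        mcms = n_cores // cores["MCM"]
        drawers = n_cores // cores["drawer"]
        summary[name] = {
            "threads": n_cores * THREADS_PER_CORE,
            "mem_bw_gbps": chips * MEM_BW_GBPS_PER_CHIP,
            "dimms": mcms * DIMMS_PER_MCM,
            "memory_gb": mcms * DIMMS_PER_MCM * GB_PER_DIMM,
            "x16_slots": drawers * X16_SLOTS_PER_DRAWER,
            "x8_slots": drawers * X8_SLOTS_PER_DRAWER,
        }
    return summary

for level, stats in rollup().items():
    print(level, stats)
```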


UltraSPARC T3 support in OpenSolaris

Posted by: admin

Tagged in: sun microsystems

OpenSolaris has gained support for the next SPARC processor, which is likely to be named UltraSPARC T3. This processor extends the T-series to even more threads. Unfortunately, what Sun has mentioned about the processor so far is contradictory, so reports vary wildly.

There will be at least 128 and possibly 256 threads per processor, along with multiprocessor support for at least 4 and possibly 8 processors. At a minimum, the top-end system will support 512 threads in 4U, double the current T5440's 256. If there are 256 threads per processor and 8 processors, it will allow for 2,048 threads in a single mid-range SMP system. Although other systems with this kind of thread count are coming, they will have price tags in the millions, with size and power bills to match.
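
Since the reports disagree, it helps to lay out every combination of the rumored numbers. A small sketch; the 128/256 threads per processor and 4/8 processors per system are only the rumored ranges above, and the T5440's 256 threads serve as the baseline.

```python
from itertools import product

# Rumored UltraSPARC T3 configurations: threads per processor x processors per system.
THREADS_PER_CPU = (128, 256)
CPUS_PER_SYSTEM = (4, 8)
T5440_THREADS = 256   # current top-end T-series system, for comparison

for threads, cpus in product(THREADS_PER_CPU, CPUS_PER_SYSTEM):
    total = threads * cpus
    print(f"{threads} threads x {cpus} CPUs = {total:>5} threads "
          f"({total // T5440_THREADS}x a T5440)")
```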

However it comes out, it will be a wonderful consolidation box, because it has enough threads of its own to absorb systems that already use a decent number of threads. Many systems, especially SPARC ones, have used many threads for years and thus didn't lend themselves to consolidation on previous boxes, except perhaps the T5440.

The chips will use at least four DDR3 controllers per chip allowing for huge DIMM counts and bandwidth.  The processors are likely to ship at much higher clock speeds than previous T-series chips.  Sun's target seems to be 2.1GHz. 

I will post more information on these systems as it becomes available.


De-duplication for Speed

Posted by: admin

Tagged in: sun microsystems

OpenSolaris has recently incorporated block-level de-duplication into ZFS. This allows for space savings when many duplicates of the same file, or similar files, exist in the same storage pool. This works even if the copies are in different filesystems, because ZFS filesystems draw from a shared storage pool and de-duplication applies across the whole pool.

A big advantage of this: if you try to write a block that already exists, the pool only needs to add a reference to the existing block instead of storing it again. This basically eliminates the disk writes needed to store data that matches data already stored. For those who don't know, disk writes are often a huge performance bottleneck. De-duplication also helps read speeds by eliminating redundant data blocks in the read cache, allowing you to make the most of your cache.
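
The write-path idea can be shown with a toy block store: hash each block, store only blocks whose hash hasn't been seen, and otherwise just bump a reference count. This is a sketch of the concept, not ZFS's implementation; the class and method names are hypothetical, and the only real parallel is that ZFS also keys its dedup table on a block checksum (SHA-256 by default).

```python
import hashlib

class DedupPool:
    """Toy block store illustrating block-level de-duplication (not ZFS code)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}          # checksum -> block data (what actually hits "disk")
        self.refcount = {}        # checksum -> number of references to that block
        self.physical_writes = 0

    def write(self, data: bytes) -> list:
        """Split data into blocks and return the checksums that reference them."""
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                # New content: the only case that costs a physical write.
                self.blocks[key] = block
                self.physical_writes += 1
            # Duplicate or not, the filesystem just records another reference.
            self.refcount[key] = self.refcount.get(key, 0) + 1
            refs.append(key)
        return refs

    def read(self, refs: list) -> bytes:
        return b"".join(self.blocks[k] for k in refs)

pool = DedupPool()
a = pool.write(b"A" * 8192)   # two identical 4K blocks in one "file"
b = pool.write(b"A" * 8192)   # the same content again, e.g. in another filesystem
print(pool.physical_writes, len(pool.blocks), sum(pool.refcount.values()))
```

Four logical blocks were written, but only one physical write happened; the other three became references.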

Along with the ability to use SSDs as read and write caches, you can achieve huge performance gains. Don't think you can afford cache drives for hundreds of systems? Well, consider consolidating storage across several systems using NFS, iSCSI, Fibre Channel, or even InfiniBand. ZFS has direct support for all of them and more. With the overlapping blocks from multiple systems in cache much more often, you may find even better performance than you get from local disks.


RAID Failures: RAID Gets Drive-Hungry

Posted by: admin

Tagged in: Untagged 

There are various forms of RAID, often referred to as levels. This, however, is misleading because it gives the impression that higher "levels" are better. Some may argue that higher levels offer more reliability but, truth be told, they are just different ways of allowing multiple devices (usually drives) to appear as one.

Most people who deal with a lot of drives are familiar with RAID5, which is described as a striped set with distributed parity. All the technical stuff aside, that means you can lose any single disk and the rest can rebuild the data that was on it. All you give up is one drive's worth of space. When a drive fails, you just replace it and tell the device or software doing all this work for you to rebuild the information, which may happen just by physically replacing the drive. You can even use a hot spare so the rebuild happens completely automatically (recommended).
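
The reason any single disk can be rebuilt is XOR parity: the parity block is the XOR of the data blocks in a stripe, so XOR-ing all the surviving blocks reproduces the missing one. A minimal sketch of one stripe on a hypothetical 4-drive set; real RAID5 rotates which drive holds the parity from stripe to stripe, which this single-stripe example doesn't show.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID5 set: 3 data blocks + 1 parity block.
data = [b"disk0...", b"disk1!!!", b"disk2???"]
parity = xor_blocks(data)
stripe = data + [parity]

# Drive 1 fails: XOR of every surviving block (data and parity) rebuilds it.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)
print(rebuilt)   # b'disk1!!!'
```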

What's all this about? Well, drives fail fairly often, and a group of drives sees failures even more often. Recreating the data on a drive takes time, and the bigger the drive, the longer it takes. If you are actively using these disks, it takes even longer. All the while, you are stressing the disks to their maximum, increasing their chance of failure. To make things worse, a single-parity setup like RAID5 will lose all your data if another drive fails during the rebuild. So what to do?
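
To see why longer rebuilds hurt, here is a rough model of the chance that another drive fails before a rebuild finishes. It assumes independent failures at a constant rate (exponential lifetimes) and ignores the extra stress of the rebuild itself, so it understates the risk; the 8-drive set and 3% annual failure rate are hypothetical values, not measurements.

```python
import math

HOURS_PER_YEAR = 24 * 365

def p_failure_during_rebuild(surviving_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one more drive fails before the rebuild finishes.

    Assumes independent failures at a constant rate (exponential lifetimes).
    """
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    return 1.0 - math.exp(-surviving_drives * rate_per_hour * rebuild_hours)

# Hypothetical 8-drive RAID5 set (7 survivors) with a 3% annual failure rate.
for hours in (12, 24, 48, 96):
    p = p_failure_during_rebuild(7, 0.03, hours)
    print(f"rebuild {hours:>3} h: {p:.3%} chance of losing the array")
```

The risk grows almost linearly with rebuild time, and bigger drives or busy arrays stretch that time.
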

Enter RAID6, a striped set with dual distributed parity. This means you can lose any single disk during a drive rebuild and still have no problem, at the cost of only one extra drive's worth of space compared to RAID5. Of course, it raises the question: when, if not now, will we need a fix that allows even more failures?

Enter raidz3. Yup, you guessed it: it allows a single drive to fail while you are rebuilding two others. Lucky for those of you who are sick of the repetition, that's the end of it ... for now. Where do you get raidz3? It is currently available as part of ZFS in OpenSolaris and in the Sun Storage 7000 Series, and it is coming to Solaris 10 soon. There are also raidz and raidz2, which are variations of RAID5 and RAID6, respectively. I'm thinking the next addition to raidz technology (raidzx?) will allow for a variable number of failures. If you feel the need for more now, you could set up a 5-way (or wider) mirror with ZFS. Of course, that would consume at least 4 out of every 5 of your drives, allowing you to use at most 20% of total raw capacity.

The point of these double and triple parity protection schemes is to keep your volume/pool from failing during a rebuild, but to do that you will need hot spares. One for every tolerable failure is probably a good place to start. If not, you'll need at least one fewer than that (e.g., 2 spares for raidz3). Otherwise you'll be missing the point, because the extra parity drives won't be much better, if at all. If you don't plan on regularly swapping out drives and plan on using your array for a significant amount of time, you will need even more spares. If you have someone onsite constantly monitoring your systems and replacing failed drives immediately, you may be able to get away with fewer spares or even none, but I wouldn't bet my data on that. Or someone else's, for that matter.

As you can see, the parity schemes have an increasing demand for drives. Raidz3 cannot be done with fewer than 4 drives, and by what I am suggesting you need at least 7. Using 7, however, would only give you one drive's worth of space, which isn't a particularly attractive option, so I'd suggest more. Since ZFS offers it, and these pools often hold a huge amount of capacity, you should probably, for performance's sake, use at least 1 read cache drive and 2 mirrored drives for a write cache. I'll talk more about ZFS cache drives another time, but I mention them now because they add 3 more drives to a minimum recommended configuration: ten drive slots filled to provide one drive's worth of data. Of course, drive usage actually improves with drive count for these schemes. For example, an X4500 has 48 internal drive slots which, if filled, could improve the previously mentioned 1-in-10 slot configuration to 39 out of 48. This would allow you to use approximately 81% of the total raw capacity, much better than 2-way (standard) mirroring, which is always 50%. Be careful how far you stretch this, though, or not only will you always have drives rebuilding, but they may not finish in time and the storage pool will fail.
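
The slot counts above reduce to one formula: usable fraction = (slots - parity - spares - cache drives) / slots. A sketch, assuming a single raidz vdev, one hot spare per tolerable failure, one read cache drive, and a mirrored pair of write cache drives, as suggested above; the function names are hypothetical.

```python
def usable_fraction(slots, parity, spares=None, read_cache=1, write_cache=2):
    """Fraction of drive slots holding data in a single-vdev parity layout.

    Defaults follow the rule of thumb above: one hot spare per tolerable
    failure, one read cache drive, and a mirrored pair of write cache drives.
    """
    if spares is None:
        spares = parity
    data = slots - parity - spares - read_cache - write_cache
    if data < 1:
        raise ValueError("not enough slots for this layout")
    return data / slots

def mirror_fraction(ways):
    """An n-way mirror stores one copy's worth of data on n drives."""
    return 1 / ways

print(f"raidz3, 10 slots: {usable_fraction(10, 3):.0%}")   # 10%
print(f"raidz3, 48 slots: {usable_fraction(48, 3):.0%}")   # 81%
print(f"2-way mirror:     {mirror_fraction(2):.0%}")       # 50%
print(f"5-way mirror:     {mirror_fraction(5):.0%}")       # 20%
```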

Something I can't stress enough is that all this protection does not remove the need for backup!

There is a great deal more that I can say about this subject but I think that is more than enough for now.


Exadata V2: Affordable Database Scalability

Posted by: admin

Tagged in: Oracle

The newest database system from Oracle and Sun Microsystems is a beauty. It easily scales from a quarter rack to eight full racks of nodes with just cabling (and nodes, of course). If that's not enough for you, it can be expanded even further with larger switches, which are available from Sun. As for the upper end, an Oracle overview of Exadata storage stated that there is "No practical limit to number of Cells that can be in the grid," referring to the storage nodes (explained below). If a quarter rack is too much for you, the architecture allows the system to run with a single pairing of one database node and one storage node, but you will need to add switching if you decide to expand.

Oracle and Sun are currently only offering this system with X4170 database nodes and X4275 storage nodes running Oracle Enterprise Linux. There doesn't, however, seem to be anything preventing the use of other x86 systems, and there are plenty in the Sun arsenal. I also suspect that Solaris x86 and Solaris SPARC versions are in the works, if not already done. These would allow Oracle to take advantage of more Sun technologies, such as CMT and ZFS.

The Exadata storage nodes are not just used as RAID arrays but as subsystems that do some of the database work. They actively run queries on their portion of the compressed data and return only the relevant data to the database nodes, which helps eliminate unnecessary interconnect traffic. The interconnect is often the bottleneck of multi-node systems like this, so any way of cutting down on traffic is most welcome. They also use QDR InfiniBand and four PCIe cards with flash cache to accelerate workloads. Oracle has optimized its database to use this flash, increasing performance so much that it now supports OLTP and even mixed workloads.
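
The benefit of filtering at the storage layer is easiest to see in a toy model where each storage cell applies the query's filter and column list before sending anything across the interconnect. Everything here (the `Row` type, `cell_scan`, the sample rows) is hypothetical and only illustrates the idea; it is not how Exadata is implemented or queried.

```python
from dataclasses import dataclass

@dataclass
class Row:
    order_id: int
    region: str
    amount: float
    notes: str   # wide column the query never asks for

def cell_scan(rows, predicate, columns):
    """Filter and project on the storage cell, next to the data."""
    return [{c: getattr(r, c) for c in columns} for r in rows if predicate(r)]

# Hypothetical data spread across two storage cells.
cells = [
    [Row(1, "west", 120.0, "x" * 200), Row(2, "east", 80.0, "x" * 200)],
    [Row(3, "west", 45.5, "x" * 200), Row(4, "north", 300.0, "x" * 200)],
]

# Query: SELECT order_id, amount WHERE region = 'west'
rows_without_offload = sum(len(c) for c in cells)   # every full row crosses the interconnect
results = []
for cell in cells:
    results.extend(cell_scan(cell, lambda r: r.region == "west", ["order_id", "amount"]))

print(f"rows shipped without offload: {rows_without_offload}")
print(f"rows shipped with offload:    {len(results)}")
print(results)
```

Only two narrow rows reach the database node instead of four full ones; with real table sizes that difference is what keeps the interconnect from becoming the bottleneck.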

Running OLTP on this type of distributed system is a big shift. OLTP has traditionally been handled by vertically scaled systems, which add more processors to a single node rather than adding more nodes to handle a larger workload. Various technical hurdles made this necessary, and congratulations to Oracle for overcoming them. What's the big deal here? Money. Smaller nodes typically cost less per processor, lowering the total cost of the system.

Please don't take the fact that one specialized multi-node system can now do some work once reserved for large systems to mean that it is intended to, or even could, replace all large single systems. Large systems still have many uses. Even if they didn't, they could (and probably will) be used as direct replacements for the current nodes of the Exadata system when some workloads fail to scale in the usual way.

I expect to see a lot more packaging of Oracle software with Sun hardware in the not-too-distant future.
