ClusterVision HPC cluster delivers accurate weather forecasts at the Swedish NSC in Linköping
Almost a year has passed since the Swedish National Supercomputing Centre (NSC) in Linköping took delivery of a ClusterVision supercomputer to run weather forecasts and research code for the Swedish met office SMHI. Time to return to Linköping and interview Technical Director Niclas Andersson about his experience with the machine.
Q.: Can you tell us a bit about your centre?
Niclas Andersson: NSC, the National Supercomputing Centre at Linköping, is a supercomputing centre here in Sweden. For the most part we provide services to academic researchers throughout the country through several systems. Even though it is called the “National Supercomputing Centre”, it is not the only supercomputing centre in Sweden; it is simply the name of one of the six public supercomputing centres we have here. We have been around for quite some time now, since 1989. Although we mainly provide services to academic users, we also have a partnership with the meteorological office in Sweden, the Swedish Meteorological and Hydrological Institute (SMHI). We provide SMHI with computation and storage on a large scale. We also have a partnership with SAAB, the Swedish aerospace and defence company that, among other things, builds the JAS 39 Gripen fighter jet. We have a number of additional partnerships, the largest of which is probably with CERN, to provide data from the LHC, the Large Hadron Collider in Geneva, to researchers all over the world.
There are about 30 people working at the centre. Half of them keep the systems up and running. Of the rest, some focus on pure research, while others are dedicated to providing a higher level of service to users.
I am the technical director at NSC. I am mostly engaged with data-centre operation and the procurement of new systems, including the system we are talking about now, delivered by ClusterVision. The system is called BiFrost and it is used entirely by SMHI. The cluster has been divided into two parts, not physically but logically. One part, called Frost, is for weather prediction. This is a collaboration with Norway: there are two systems, one in Sweden and one in Norway, located in Trondheim. The system in Trondheim and the Frost half of BiFrost work together to produce weather forecasts. Currently this is an active backup setup, but it will soon change towards a more distributed way of delivering the forecast results to the two countries’ met offices. The other part of BiFrost, called Bi, is used for research: climate research and the development of new climate and weather models.
The BiFrost system consists of 641 compute nodes with 16 cores each, 10,256 cores in total. This ClusterVision system also made it onto the June 2015 TOP500 list, where it is ranked at position 204. It was installed almost a year ago now; delivery was taken just before Christmas last year. It is fully used and running well, delivering services to SMHI.
Q.: Can you tell us a little bit about what type of machine it is?
Niclas Andersson: It is a cluster computer system that ClusterVision designed based on ASUS servers. It includes Intel Haswell processors, with eight cores each. There are two processors in each compute node.
We purchased this system through a tender that followed EU procurement rules. The tender was aimed at the best performance, within a given budget, on two different benchmarks: NEMO and AROME. The ClusterVision system won the procurement. In making our decision we focused mainly on the performance on these two benchmarks, which represent the dual use of the system. The interconnect is Intel True Scale InfiniBand QDR, running at 40 Gbit/s. We do not have full bisection bandwidth in the cluster, because in nearly all cases we do not see the need for it. The cluster, of course, also contains login and system nodes. Storage was procured separately from the cluster.
Q.: If you go up a little bit higher in the system, what is the operating system?
Niclas Andersson: CentOS. We use CentOS in all our systems and have created our own software stack based on it. Maintaining our own software stack simplifies the task of maintaining all the systems. We run SLURM as the queueing system. For the applications we use the Intel compilers. Of course, we use other compilers too, but the Intel tools perform very well. In addition, a number of scripts and programmes, many of which we put together ourselves, glue everything together and help us maintain, operate, and monitor all the bits and pieces that make up a cluster.
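To give a sense of how work reaches a SLURM-managed cluster of this kind, here is a minimal sketch of a batch script for an MPI job. It is purely illustrative and not NSC's actual configuration: the account name, node count, module names and the forecast binary are assumptions made for the example, while the #SBATCH directives and srun are standard SLURM.

  #!/bin/bash
  #SBATCH --job-name=forecast          # illustrative job name
  #SBATCH --nodes=32                   # assumed node count, not a real SMHI run size
  #SBATCH --ntasks-per-node=16         # one MPI rank per core on a 16-core node
  #SBATCH --time=01:00:00              # wall-clock limit for the job
  #SBATCH --account=smhi               # hypothetical accounting project

  # Load compiler and MPI environment (module names are assumptions)
  module load intel impi

  # Launch the hypothetical forecast binary on all allocated cores
  srun ./forecast_model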
Q.: Which languages and which parallel tools are being used for the applications?
Niclas Andersson: The compiler and the editor are the most frequently used tools. Most codes are still written in Fortran, but C and C++ are also often used. There are of course other possibilities, such as Python and MATLAB with optimized libraries. The main focus is on longer runs of the climate research and weather prediction application codes. These are, of course, compiled programmes. The codes are developed in collaboration with a number of other countries; they are large applications that are continuously improved. SMHI is a member of the international research programme HIRLAM (High Resolution Limited Area Model). Since 2011 HIRLAM has been cooperating with the ALADIN consortium, and together they develop the production code AROME from the HARMONIE (HIRLAM ALADIN Research on Meso-scale Operational NWP in Euromed) model. This code is used in the Nordic countries, as well as in many other European countries, for weather forecasting. In addition, each country needs its own regional adaptation of the code. We run other services as well.
For the research part, we use codes based on HIRLAM and NEMO. The main research activities are still based on the HIRLAM code, and we still use a number of Fortran-based codes.
Q.: Is your system running in production for these applications?
Niclas Andersson: In production, SMHI runs four forecasts each day, each covering 48 hours. Since we do not have the computing capacity to run a global model, we run a regional model, which means we cannot predict very far into the future. The 48-hour forecast should soon be extended to a 60-hour forecast. Longer-range forecasts are taken from the European Centre for Medium-Range Weather Forecasts, ECMWF, in Reading, UK.
Q.: Are these performed by the researchers at SMHI or by researchers all over Sweden?
Niclas Andersson: Researchers connected to SMHI, I would say. At the weather forecast centre there is a large research group that not only produces the forecasts but also does a considerable amount of climate research. Because SMHI is funding us to provide this service, they decide who runs on the system. Of course, we have other users on the system, but the paying customer is SMHI, and we do not let anyone else on the system unless SMHI says so.
Q.: The system has now been up and running for about a year?
Niclas Andersson: Yes, for about a year. Users were already accessing the cluster on the day we accepted it. The acceptance period lasted about two months, in order to test the system, ensure it was stable, and so on. During that period we also allowed pilot users with test applications, who analysed the performance. As soon as we informed ClusterVision of the system’s acceptance, we were immediately able to let users start using it. The acceptance test was done in two parts: first, a functional and acceptance test before Christmas, and second, a stability test. We started delivering services to SMHI in February, at which point they started to use a fully validated production system.
Q.: What is the experience thus far with the machine?
Niclas Andersson: It is running very well. As with any system of this size, parts fail and have to be replaced. Our relationship with ClusterVision is extremely good; they were able to tailor their service offering to really match our skills and requirements. We keep spare parts on site for fast exchange of memory, power supplies and the other things that tend to break more often than other parts. There have been no substantial issues. There is a direct flight connection between Linköping and Amsterdam, so it would only be a matter of hours if one or more engineers from ClusterVision needed to come over urgently. Thus far, this has not been necessary. The system has run very well, and we got the performance that was promised in the offer. It is definitely a success, I would say.
Q.: Is there something about future developments that is interesting to tell?
Niclas Andersson: We plan to run the system for at least four years; usually it ends up closer to five. I told you about our cooperation with Trondheim. Until now, we have been leapfrogging our system upgrades with the system upgrades in Trondheim. Now it is Norway’s turn to upgrade their system, and after they have upgraded it will be our turn again.
Recently, there has been progress on a Memorandum of Understanding between the Nordic countries, which together want to start a joint Nordic Weather Centre. I do not know where the machines for that centre will be situated. It might be here, it might not. However, we have been running weather forecasts for SMHI since around 1996. Thanks to this experience we have become quite good at it, and thanks to the dual-system setup we have not lost a single simulation in all these years. Let’s see what the future brings. It is not entirely in our hands.