A server system that enables real-time monitoring of the status and quality of network channels by large and small internet providers at a state, regional, or city scale.
Develop a new sustainable and high-performance monitoring system to process large amounts of data coming from 100,000 network channels at 5-minute intervals
Support sustainable operation with measuring probes by various manufacturers, such as Cisco, Metrotek, Eltex, and others
Provide real-time status of individual channels and the whole network to Internet providers at various levels through convenient web and mobile interfaces
Design and develop the system to comply with international certificates and standards and be certified by government standardization and certification bodies of several European countries
Assemble a team of senior developers ready to begin work on a complex project quickly
Develop an architecture and create a system that could receive and analyze over 800,000 indicators at 5-minute intervals. The system should also adjust to increased data volume and new functions
Ensure a stable and fast system operation of the cluster comprising 12 server machines. In addition, the system must maintain efficient operation even in case of simultaneous shutdown of 50% of the cluster computing power
The previous version of the system did not allow receiving and processing the required amount of data at 5-minute intervals. Its capacity was up to 7,000 indicators within the specified time interval.
It was necessary to develop a new system that could process larger volumes of data.
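To put the gap between the old and the required capacity in perspective, the figures above can be turned into average per-second rates. This is only back-of-the-envelope arithmetic using the numbers quoted in this case study (7,000 and 800,000 indicators per 5-minute interval, 100,000 channels); it assumes values arrive evenly across the interval, which real probe traffic may not do.

```python
INTERVAL_SECONDS = 5 * 60  # one 5-minute collection interval

def per_second(values_per_interval: int) -> float:
    """Average number of values arriving per second within one interval."""
    return values_per_interval / INTERVAL_SECONDS

old_capacity = 7_000       # previous system, per interval (from the text)
target_capacity = 800_000  # required capacity, per interval (from the text)
channels = 100_000         # monitored network channels (from the text)

old_rate = per_second(old_capacity)
target_rate = per_second(target_capacity)
growth = target_capacity / old_capacity
indicators_per_channel = target_capacity / channels

print(f"old: {old_rate:.1f}/s, target: {target_rate:.1f}/s, growth: {growth:.0f}x")
print(f"average indicators per channel: {indicators_per_channel:.0f}")
```

In other words, the new system had to sustain roughly two orders of magnitude more data than the previous version.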
Create a cluster of 12 powerful servers capable of a stable and efficient performance even with computing capacity reduced by 50%
Collect and transmit data from various probes to the server. The probes used a variety of protocols and ports and were produced by different manufacturers
Create a complex reporting system. Reports should contain data on collected indicators for user-defined time intervals on numerous network channels
Receive, record, store, analyze, and provide a large number of network channel status parameters in real time and on demand. This required moving from a relational database to a DBMS designed for Big Data. Initially, we planned to use Postgres but had to replace it with Hadoop + Apache HBase and migrate the data while preserving its integrity, even though the data layout in Hadoop + Apache HBase differed from that in Postgres.
Establish a stable and quick system operation with large volumes of data in real time and on the frontend:
display the status of network channels on a geographic map;
generate graphs showing information about network channels in real time and estimate future status.
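The case study does not say which method was used to estimate future channel status. As a minimal sketch only, one simple option is to fit a straight line to the most recent equally spaced samples and extrapolate it one step ahead. The function name `forecast_next` and the sample latency values below are hypothetical.

```python
def forecast_next(samples: list[float], steps_ahead: int = 1) -> float:
    """Extrapolate a least-squares straight line fitted to equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2          # mean of the sample indices 0..n-1
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # The last sample sits at index n - 1; predict steps_ahead intervals later.
    return intercept + slope * (n - 1 + steps_ahead)

# Hypothetical latency readings (ms) taken at consecutive 5-minute intervals.
latency_ms = [20.0, 22.0, 24.0, 26.0]
print(forecast_next(latency_ms))  # 28.0 for this perfectly linear series
```

A production forecast would likely need to account for daily and weekly traffic patterns, which a straight line cannot capture.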
Mistakes made and corrected
Initially, we selected the Postgres DBMS, which turned out to be unable to process the required volume of data in the allotted time. In our experience, a relational DBMS was not suitable for this system, since it could not handle such large volumes of data quickly and stably at short intervals. We had to move to a different DBMS: Hadoop + Apache HBase.
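One reason the data layout differs is that HBase stores rows sorted lexicographically by a single byte-string row key, rather than by relational columns and indexes. The actual key schema used in the project is not described here. The sketch below shows one common pattern for time-series data under that assumption: a fixed-width channel ID followed by a reversed timestamp, so that the newest measurement for a channel sorts first. `row_key`, `parse_row_key`, and the field widths are hypothetical.

```python
import struct

MAX_TS = 2**63 - 1  # upper bound used to reverse timestamps (assumed layout)

def row_key(channel_id: int, epoch_seconds: int) -> bytes:
    """Build a 12-byte key: 4-byte channel ID, then 8-byte reversed timestamp."""
    # Big-endian packing keeps byte-wise ordering equal to numeric ordering.
    return struct.pack(">IQ", channel_id, MAX_TS - epoch_seconds)

def parse_row_key(key: bytes) -> tuple[int, int]:
    """Recover (channel_id, epoch_seconds) from a key built by row_key."""
    channel_id, reversed_ts = struct.unpack(">IQ", key)
    return channel_id, MAX_TS - reversed_ts

older = row_key(42, 1_700_000_000)
newer = row_key(42, 1_700_000_300)  # one 5-minute interval later

# Within one channel, the newer measurement sorts before the older one.
print(sorted([older, newer])[0] == newer)
print(parse_row_key(newer))
```

With a key like this, all measurements for one channel are stored contiguously, so reading the latest values or a time range becomes a short sequential scan.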
User registration according to their roles in the system, automatic assignment of access rights and levels of data display
Internal system messages and SMS/email notifications that inform providers of problems with specific channels and devices
Predicting the channel performance indicators
Continuous integrity testing of the packets sent and received during standard operation and at peak loads
Automated data analysis across 100,000 channels before adding it to the DBMS
Graph display in real time, demonstrating the channels' current status and color-coded problems
Real-time display of current network channel status across the map
Creating and adjusting new infrastructure, connecting new channels and equipment
Report generation on the status of network channels within a period (day, month, year, or an arbitrary period specified by the user)
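As an illustration of a report over a user-defined period, the sketch below summarizes the 5-minute samples of one indicator that fall inside a half-open time window. The `summarize` function, the tuple layout of the samples, and the chosen statistics are assumptions made for the example, not the report format of the actual system.

```python
from datetime import datetime
from statistics import mean

def summarize(samples, start, end):
    """Return count/min/avg/max of values whose timestamp is in [start, end)."""
    values = [value for ts, value in samples if start <= ts < end]
    if not values:
        return None  # nothing was recorded in the requested period
    return {
        "count": len(values),
        "min": min(values),
        "avg": mean(values),
        "max": max(values),
    }

# Hypothetical packet-loss samples (timestamp, percent) for one channel.
samples = [
    (datetime(2024, 1, 1, 0, 0), 10.0),
    (datetime(2024, 1, 1, 0, 5), 30.0),
    (datetime(2024, 1, 2, 0, 0), 50.0),  # falls outside the one-day window
]
report = summarize(samples, datetime(2024, 1, 1), datetime(2024, 1, 2))
print(report)
```

The same function covers a day, a month, a year, or an arbitrary period, because the window is just a pair of timestamps.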
Maven, Clojure automatic test system, PMD, Subversion
TeamCity, JBoss, Python
Currently, the system is actively used by large Internet providers and is being complemented by new features.
A new version of the monitoring system was developed, which effectively processes over 800,000 parameters updating every 5 minutes, and can withstand a further load increase
Our team developed special modules for rapid acquisition of measurable metrics from all system probes designed by various manufacturers
Convenient web and mobile interfaces were developed to provide access to network channel status for Internet providers of various levels
The developed system successfully passed international certification by government bodies
Within five days, we assembled a team of 8 senior developers, who immediately started working on the project
A unique modular architecture was developed to ensure the system's efficiency and a considerable modernization margin. Estimated data processing capacity is 800,000 metrics every 5 minutes. The app passed load testing with 1 million metric values at 5-minute intervals. Currently, the system processes 240,000 parameters every 5 minutes for the client's largest customer
Through the use of Hazelcast, JGroups, and ZooKeeper, we ensured the application's stable operation on a cluster of 6 servers without performance losses
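To illustrate what surviving the loss of half a cluster involves, the sketch below assigns channels to nodes with rendezvous (highest-random-weight) hashing. This is a generic technique chosen for the example; it is not a description of how Hazelcast, JGroups, or ZooKeeper distribute work internally, and the node names and channel count are hypothetical. Its useful property is that when nodes fail, only the channels they owned are reassigned.

```python
import hashlib

def owner(channel_id: int, nodes: list[str]) -> str:
    """Pick the node with the highest hash score for this channel."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{channel_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

all_nodes = [f"node-{i}" for i in range(12)]  # full cluster (hypothetical names)
survivors = all_nodes[:6]                     # half of the cluster is shut down
channels = range(1000)

before = {c: owner(c, all_nodes) for c in channels}
after = {c: owner(c, survivors) for c in channels}
moved = [c for c in channels if before[c] != after[c]]

# Channels that stayed on a surviving node keep their owner; only channels
# that belonged to a failed node are redistributed among the survivors.
print(all(before[c] not in survivors for c in moved))
print(all(after[c] in survivors for c in channels))
```

Keeping unaffected channels in place limits the amount of state that has to be rebuilt after a failure, which matters when a new batch of data arrives every 5 minutes.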
New technology is not always better than old technology. Developing the application's web interface with JSP, which is older than JSF, would have taken less time. The interface would have worked just as well, and the number of bugs during development would have been much lower.
Time to completion
1 year 6 months
Web application development;
Frontend and backend development;
8 Senior Full Stack developers (plus 1 tech lead)
STAGES OF WORKING WITH THE CLIENT
Carefully studied the client's requirements
Generated the project's step-by-step implementation plan
Agreed on a plan with the client
Implemented the plan
Regularly interacted with the client, provided status updates and interim demos
Implemented the client's newly occurring requests and edits
Provided the client with a ready-to-use solution