RUDN mathematicians make a multifactor data selection algorithm run up to eight times faster

RUDN mathematicians have accelerated an algorithm for data aggregation based on the Sugeno integral. Applying the new algorithm should improve the quality of data selection and reduce computation time. The article was published in the International Journal of Intelligent Systems.
Data selection and analysis is widely used in sociology, economics, politics, neurophysiology, statistics, and other fields that process experimental data and test hypotheses. The main task of a data selection and analysis algorithm is to pick the significant, mutually independent parameters out of all the available information and to calculate the contribution of each one to the final value of the target function. When the parameters depend on one another, this analysis and aggregation is performed with the help of fuzzy measures. In practice the analysis is complicated by the large number of parameters and by the lack of knowledge about which parameters are dependent and which are not. As a result, the complexity of learning a fuzzy measure from data grows exponentially with the number of parameters. A second problem is that the complexity of the corresponding optimization algorithm grows quadratically with the number of data points.
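To make the exponential growth concrete: a fuzzy measure on n inputs assigns a value to every subset of those inputs, so 2^n values have to be learned, subject to monotonicity. The Python sketch below is only an illustration under that standard definition; the helper names and the cardinality-based example measure are hypothetical and are not taken from the article.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    items = range(n)
    for size in range(n + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_fuzzy_measure(mu, n):
    """Check the boundary conditions and monotonicity of mu on all 2**n subsets."""
    full = frozenset(range(n))
    if mu[frozenset()] != 0.0 or mu[full] != 1.0:
        return False
    # Monotonicity: adding one element must never decrease the measure.
    for subset in all_subsets(n):
        for i in full - subset:
            if mu[subset | {i}] < mu[subset]:
                return False
    return True


# Hypothetical example measure: mu(A) = (|A| / n) ** 2.
n = 3
mu = {s: (len(s) / n) ** 2 for s in all_subsets(n)}
num_values = len(mu)            # number of subset values for 3 inputs
valid = is_fuzzy_measure(mu, n)
```

With only 10 inputs the same dictionary would already hold 1024 entries, which is why learning an unrestricted fuzzy measure becomes expensive so quickly.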
To speed up the learning algorithm, Gleb Beliakov and Dmitryi Divakov of the RUDN Applied Probability and Informatics Department decided to change the data selection scheme and to modify the objective function so that the cost of evaluating it grows linearly rather than quadratically, as it does in the original solution. The Sugeno integral was chosen as the computational model for the objective function. It is used to evaluate judgments expressed as linguistic variables and to present the outputs on a linguistic scale as well. The mathematicians adapted regression analysis, whose purpose is to find the best-fitting set of parameters of an unknown function, to the Sugeno integral for subsequent fuzzy-logic-based selection of alternatives and decision making.
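For readers unfamiliar with the Sugeno integral, the sketch below evaluates its standard discrete form: the inputs are sorted in increasing order, and the result is the largest value of min(x_(i), mu(A_(i))), where A_(i) is the set of inputs that are at least as large as x_(i). This is a textbook definition written in Python for illustration, not the authors' R code; the function name and the two-input example measure are assumptions.

```python
def sugeno_integral(x, mu):
    """Discrete Sugeno integral of x with respect to the fuzzy measure mu.

    x  : list of values in [0, 1]
    mu : dict mapping frozensets of input indices to values in [0, 1]
    """
    # Indices of the inputs, ordered from smallest to largest value.
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = 0.0
    for pos, i in enumerate(order):
        # Inputs whose value is at least x[i].
        upper = frozenset(order[pos:])
        best = max(best, min(x[i], mu[upper]))
    return best


# Hypothetical fuzzy measure on two inputs.
mu = {
    frozenset(): 0.0,
    frozenset({0}): 0.3,
    frozenset({1}): 0.6,
    frozenset({0, 1}): 1.0,
}

a = sugeno_integral([0.8, 0.4], mu)  # max(min(0.4, 1.0), min(0.8, 0.3)) = 0.4
b = sugeno_integral([0.2, 0.9], mu)  # max(min(0.2, 1.0), min(0.9, 0.6)) = 0.6
```

Because the integral uses only min and max, its output is always one of the input values or one of the measure values, which is what makes it suitable for ordinal, linguistic scales.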
Beliakov, Divakov, and their colleagues grouped data points with close values into pools and then replaced each pool with its average. After this selection step they applied the PAVA algorithm (pool adjacent violators algorithm) to the pool breakers. As a result, computation time is reduced while the accuracy of the numerical solution is preserved.
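The pooling idea can be seen in the classical form of PAVA for isotonic (monotone) regression: whenever two neighbouring blocks violate the required ordering, they are merged into one pool and replaced by their weighted average. The Python sketch below shows only this generic textbook algorithm; it is not the modified procedure from the article, and the function name and optional weights argument are illustrative assumptions.

```python
def pava(y, w=None):
    """Least-squares non-decreasing fit to y via pool adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, weight in zip(y, w):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per data point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


fit = pava([1, 3, 2, 4])  # the violating pair (3, 2) is pooled into 2.5, 2.5
```

Each data point is appended once and every merge removes a block, so this generic version runs in time linear in the number of points once they are ordered.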
Numerical calculations were carried out using the ordinal regression method. The numerical evaluation algorithm was implemented in the R programming language. Programs in C and C++ were written to collect and analyze the results.
The CPU time required for the optimization algorithm to converge, which characterizes the speed of the calculations, was used as the criterion for assessing how efficiently the objective function is computed. A table comparing the performance of different algorithms for different numbers of parameters (data vector size K) and different problem sizes n was compiled from the numerical experiments. The algorithm proposed by the RUDN mathematicians cut CPU time by a factor of 3 to 8 for large problems, when K equals 1000. For small-scale problems, when K equals 10 and 100, the benefit is modest: calculations are accelerated by 10-20%.
Article: Beliakov G, Divakov D. Fitting Sugeno integral for learning fuzzy measures using PAVA isotone regression. Int J Intell Syst. 2019;1–9.
https://doi.org/10.1002/int.22172
Provided by RUDN University