MME 2021 Mathematical Methods in Economics
39th International Conference on Mathematical Methods in Economics
Faculty of Economics and Management, Czech University of Life Sciences Prague
8th - 10th September 2021
Conference Proceedings

Czech University of Life Sciences Prague, Faculty of Economics and Management
Proceedings of the 39th International Conference on Mathematical Methods in Economics
MME 2021 Mathematical Methods in Economics
8th - 10th September 2021, Prague, Czech Republic, EU

Editor: Robert Hlavatý
Cover: Jiří Fejfar
Technical editors: Jiří Fejfar, Michal Hruška
Publisher: Czech University of Life Sciences Prague, Kamýcká 129, Prague 6, Czech Republic
Publication is not a subject of language check. Papers are sorted by authors' names in alphabetical order. All papers passed a peer review process.
© Czech University of Life Sciences Prague
© Authors of papers
ISBN 978-80-213-3126-6

Programme Committee
doc. RNDr. Ing. Miloš Kopa, Ph.D. - President of the Czech Society for Operations Research; Charles University, Faculty of Mathematics and Physics
prof. Dr. Ing. Miroslav Plevný - Vice-president of the Czech Society for Operations Research; University of Economics in Prague, Faculty of Informatics and Statistics
prof. RNDr. Helena Brožová, CSc. - Czech University of Life Science in Prague, Faculty of Economics and Management
prof. Ing. Mgr. Martin Dlouhý, Dr., MSc. - University of Economics in Prague, Faculty of Informatics and Statistics
doc. Ing. Jan Fabry, Ph.D. - University of Economics in Prague, Faculty of Informatics and Statistics
prof. RNDr. Ing. Petr Fiala, CSc., MBA - University of Economics in Prague, Faculty of Informatics and Statistics
prof. Ing. Jana Hančlová, CSc. - Technical University of Ostrava, Faculty of Economics
prof. Ing. Josef Jablonský, CSc. - University of Economics in Prague, Faculty of Informatics and Statistics
doc. RNDr. Jana Klicnarová, Ph.D. - University of South Bohemia, Faculty of Economics
Ing. František Koblasa, Ph.D. - Technical University of Liberec, Faculty of Mechanical Engineering
prof. RNDr. Jan Pelikán, CSc. - University of Economics in Prague, Faculty of Informatics and Statistics
prof. RNDr. Jaroslav Ramík, CSc. - Silesian University in Opava, School of Business Administration in Karviná
Ing. Karel Sladký, CSc. - Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation
doc. Ing. Tomáš Šubrt, Ph.D. - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Miroslav Vavroušek, Ph.D. - Technical University of Liberec, Faculty of Mechanical Engineering
prof. RNDr. Milan Vlach, DrSc. - Charles University in Prague, Faculty of Mathematics and Physics; The Kyoto College of Graduate Studies for Informatics
prof. RNDr. Karel Zimmermann, DrSc. - Charles University in Prague, Faculty of Mathematics and Physics
prof. Ing. Miroslav Žižka, Ph.D. - Technical University of Liberec, Faculty of Economics

Organisation Committee
Ing. Robert Hlavatý, Ph.D. (chair) - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Jiří Fejfar, Ph.D. - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Martina Housková Beránková, Ph.D. - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Igor Krejčí, Ph.D. - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Tereza Jedlanová - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Tereza Sedlářova Nehézová - Czech University of Life Science in Prague, Faculty of Economics and Management

Technical Editors
Ing. Jiří Fejfar, Ph.D. - Czech University of Life Science in Prague, Faculty of Economics and Management
Ing. Michal Hruška, Ph.D. - Czech University of Life Science in Prague, Faculty of Engineering

Foreword
It is a pleasure to present to you the Proceedings of the 39th International Conference on Mathematical Methods in Economics - MME 2021. The conference was held by the Czech University of Life Sciences Prague (CZU), Faculty of Economics and Management, under the auspices of the Czech Society for Operations Research, on September 8-10, 2021. The conference hosted nearly 120 participants, both onsite and online. The proceedings contain 89 reviewed contributions. The MME conference is a traditional event that brings together researchers and practitioners in the fields of operations research and econometrics, and it returned to CZU twelve years after it was last organised here. It is not hard to notice that many things have changed since then - the number of contributions has nearly doubled, sessions have moved online, and old research questions have turned into new ones. Fortunately, the most important aspect - the social role of the conference - has remained. It is still an event that gives the best experts in the field an opportunity to meet and spend some time together. The year 2021 was marked by issues and challenges in which mathematical methods will play an important role: the pandemic, the new Green Deal and, no less, the turbulent weather. On this 39th birthday of the conference, let us wish this event more successful years and ideas that will contribute to dealing with those challenges. Finally, let me express my sincere thanks to the members of the programme committee for securing a smooth review process and to the members of the organising committee whose effort and hard work have made this event possible.
September 2021 Robert Hlavatý Table of Contents The competitiveness of V4 countries in the context of EU member states Markéta Adamová, Jana Klicnarová, Nikola Soukupová 6 Economic Policy Uncertainty and Stock Markets Co-movements Peter Albrecht, Svatopluk Kapounek, Zuzana Kučerová 12 Optimal consumption with irreversible investment in the context of the Ramsey model Nastaran Ansari, Adriaan Van Zon, Olaf Sleijpen 18 Impact of incorporating and tailoring PRINCE2 into the project-oriented environment Jan Bartoška, Jan Rydval, Tereza Jedlanová 24 Assessment of personal ambiguity attitude in a series of online experiments Simona Bažantová, Vladislav Bína, Václav Kratochvíl, Klára Šimůnková 30 An original two-index model of the multi-depot vehicle routing problem Zuzana Borčinová, Štefan Peško 36 Portfolio selection via a dynamic moving mean-variance model Adam Borovička 42 A shadow utility of portfolios efficient with respect to the second order stochastic dominance Martin Branda 48 Efficient Values of Selected Factors of DMUs Production Structure Using Particular DEA Model Helena Brožová, Milan Vlach 54 On the measurement of risk of some cointegrated trading strategies Michal Černý, Vladimír Holý, Petra Tomanová, Lucie Beranová 60 Statistical Analysis of ICT Utilization in Marketing Andrea Čížků 66 On the crossing numbers of join of one graph on six vertices with path using cyclic permutation Emilia Draženská 72 Efficiency of Credit Risk Management and Their Determinants in Central European Banking Industries Xiaoshan Feng 83 Models of Technology Coordination Petr Fiala, Renata Majovská 89 The Impact of Covid-19 on Mutual Relations of Czech Macro-aggregates: Effect of Structural Changes Jakub Fischer, Kristýna Vltavská 95 Productivity analysis in the Mexican food industry Martin Flégl, Carlos Alberto Jimenez Bandala, Isaac Sánchez-Juárez, Edgar Matus 100 Evaluation and testing of non-nested specifications of spatial econometric models Tomáš Formánek 106 The link between DEA efficiency and return to assets Lukáš Frýd, Ondřej Sokol 112 The geography of most cited scientific publications: Mixed Geographically Weighted Regression approach Andrea Furková 117 Bilevel Linear Programming under Interval Uncertainty Elif Garajová, Miroslav Rada, Milan Hladík 123 An Efficiency Comparison of the Life Insurance Industry in the Selected OECD Countries with Three-Stage DEA Model Biwei Guan 129 Determination of Wages in Forestry Depending on the Occurrence of Natural Disasters David Hampel, Lenka Viskotová, Antonín Marti nik 135 Efficiency evaluation of the health care system during COVID-19 pandemic in districts of the Czech Republic Jana Hančlová, Lucie Chytilová 141 Analysis of uneven distribution of diseases COVID - 19 in the Czech Republic Jakub Hanousek 147 Determinants of company indebtedness in the construction industry Jana Heckenbergerová, Irena Honková, Alena Kladivová 155 Robust Slater's Condition in an Uncertain Environment Milan Hladík 161 Sensitivity of small-scale beef cattle farm's profit under conditions of natural turnover Robert Hlavatý, Igor Krejčí 167 The Use of Genetic Algorithm in Clustering of ARMA Time Series Vladimír Holý, Ondřej Sokol 173 Enumerative Core of Polya's Theorems on Random Walk and the Role of Generating Functions in their Proofs Richard Horský 179 Numerical Valuation of the Investment Project with Expansion Options Based on the PDE Approach Jiří Hozman, Tomáš Tichý 185 Housing Submarkets: The case of the Prague Housing Markets Petr Hrobař, Vladimír Holý 191 
Forecasting Czech unemployment rate using dimensional reduction approach Filip Hron, Lukáš Frýd 197 Health index for the Czech districts calculated via methods of multicriteria evaluation of alternatives Dana Hůbělová, Beatrice-Elena Chromková Manea, Alice Kozumplíková, Martina Kuncová, Hana Vojáčkova 202 Modelling of PX Stock Returns during Calm and Crisis Periods: A Markov Switching Approach Michaela Chocholatá 208 The Performance Assessment of Different Types of Investment Funds Using Markowitz Portfolio Theory Zuzana Chvátalova, Oldřich Trenz, Jitka Sládková 214 SBM models in data envelopment analysis: A comparative study Josef Jablonský 220 Swap Heuristics for Emergency System Design with Multiple Facility Location Jaroslav Janáček, Marek Kvet 226 The minimal network of hospitals in terms of transportation accessibility Ludmila Jánošíkova, Peter Jankovič 232 Multifractal approaches in econometrics and fractal-inspired robust regression Jan Kalina 238 LTPD variables inspection plans and effect of wrong process average estimates Nikola Kaspříková 244 Flexible Job Shop Schedule generation in Evolution Algorithm with Differential Evolution hybridisation František Koblasa, Miroslav Vavroušek 249 Distortion risk measures in portfolio optimization Miloš Kopa, Juraj Zelman 255 The goal programming approach to investment portfolio selection during the COVID-19 pandemic Donata Kopaňska-Bródka, Renata Dudziňska-Baryta, Ewa Michalska 261 Robust First Order Stochastic Dominance in Portfolio Optimization Karel Kozmík 269 System Dynamic Model of Beehive Trophic Activity Kratochvílová Hana, Rydval Jan, Bartoška Jan, Chamrada Daniel 275 An Analysis of Dependence between German and V4 Countries Stock Market Radmila Krkošková 281 Evaluation of the Construction Sector: A Data Envelopment Analysis Approach Markéta Křetínská, Michaela Staňková 287 The path-relinking based search in unit lattice of m-dimensional simplex Marek Kvet, Jaroslav Janáček 293 Portfolio discount factor evaluated by oriented fuzzy numbers Anna tyczkowska-Hančkowiak, Krzysztof Piasecki 299 Comparing TV advertisement in the year 2019 using DEA models Jan Malý, Petra Zýková 305 Efficiency of tertiary education in EU countries Klára Mašková, Veronika Blašková 312 Application of robust efficiency evaluation method on the Czech life and non-life insurance markets Markéta Matulová, Lucia Kubincová 318 Stochastic reference point in the evaluation of risky decision alternatives Ewa Michalska, Renata Dudziňska-Baryta 325 Structure of the threshold digraphs of convex and concave Monge matrices in fuzzy algebra Monika Molnárová 331 Weak Solvability of Max-plus Matrix Equations Helena Myšková 337 The Impact of Technical Analysis and Stochastic Dominance Rules in Portfolio Process David Neděla 343 Information Retrieval System for IT Service Desk for Production Line Workers Dana Nejedlová, Michal Dostál 349 Permanent Income Hypothesis with the Aspect of Crises. 
Case of V4 Economies Václava Pánková 355 Multi Vehicle Routing Problem Depending on Vehicle Load Juraj Pekar, Zuzana Čičková, Ivan Březina 359 Using Parametric Resampling in Process of Portfolio Optimization Juraj Pekar, Mario Pčolár 364 Optimal routing order-pickers in a warehouse Jan Pelikán 370 Performance of the CLUTEX Cluster Applying the DEA Window Analysis Natalie Pelloneová 375 New models for a return bus scheduling problem Štefan Peško, Stanislav Palúch, Tomáš Majer 381 Different ways of extending order scales dedicated to credit risk assessment Krzysztof Piasecki, Aleksandra Wójcicka-Wójtowicz 387 Interval two-sided (max, min)-linear equations Ján Plavka 393 Forecasting of agrarian commodity prices by time series methods Alena Pozdílková, Jaromír Zahrádka, Jaroslav Marek 399 Discrete Time Optimal Control Problems with Infinite Horizon Pavel Pražák, Kateřina Frončková 405 Bankruptcy Problem Under Uncertainty of Claims and Estate Jaroslav Ramík, Milan Vlach 411 Searching for a Unique Good Using Imperfect Comparisons David M. Ramsey 417 Dealing with uncertainty by Fuzzy evaluation and robust optimization Tereza Sedlářova Nehézová, Michal Škoda, Helena Brožová 423 The System Dynamics approach to creation of a recovery model of an urban object Anna Selivanova 429 Fuzzy discount factor parametrized by logarithmic return rate Joanna Siwek, Krzysztof Piaseck 435 Consumption Expenditures and Demands of Ageing Population Jaroslav Sixta, Jakub Fischer 440 Central Moments and Risk-Sensitive Optimality in Markov Reward Processes Karel Sladký 446 Construction and optimization of HFT systems with the use of binary-temporal state model Míchat Dominik Stasiak 452 Possibilistic median of a fuzzy number Jan Stoklasa, Pasi Luukka 458 Asymmetric Transmission of Crude Oil Prices to Retail Gasoline and Diesel Prices in U.S. Market Karol Szomolányi, Martin Lukáčik, Adriana Lukáčiková 464 DEA Window Analysis of Engineering Industry Performance in the Czech Republic Eva Štichhauerová, Miroslav Žižka 469 Work Contour Models in Projects Tomáš Šubrt, Jan Bartoška, Daniel Chamrada 475 Visualising HR data concerning the performance of university staff - career development assessment and assistance perspective Tomáš Talášek, Jana Stoklasová, Jan Stoklasa, Jana Talašová 481 Combined Time Coordination of Connections in Public Transport Dušan Teichmann, Michal Dorda, Denisa Mocková, Pavel Edvard Vančura, Vojtěch Graf, Ivana Olivková 487 Measuring Efficiency of Football Clubs: DEA approach Michal Tomíček, Natalie Pelloneová 493 Analysis of profitability of cryptocurrencies trading strategy based on technical analysis Quang Van Tran, Jaromír Kukal 499 GLMM Based Segmentation of Czech Households Using the EU-SILC Database Jan Vávra 505 State-space modeling of claims reserves in non-life insurance Petr Vej mělká 511 Forecasting third party insurance: A comparison between Random forest and Generalized Additive Model Lukáš Veverka 517 Efficiency Verifications of Classical Portfolio Optimization Models Anlan Wang 523 Scoring Applications in Early Collection Jiří Witzany, Anastasiia Kozina 529 On Modelling Dependencies between Criteria in PROMÉTHEE František Zapletal 535 Consumer Dividend Aristocrats: Dynamic DEA Approach Petra Zýková 541 The competitiveness of V4 countries in the context of EU member states Markéta Adamová1 , Jana Klicnarová2 , Nikola Soukupová3 Abstract. 
Even though there is no generally accepted definition and understanding of the concept of competitiveness, this issue is currently of interest to many economic analyses as a basic measurement for countries' macroeconomic performance. The measurement of efficiency is subject to different methods and data operationalization, but in the European Union, the competitiveness concept has not been uniquely defined yet. In our previous research, we provided an analysis of the efficiency of the European Union (EU) Member states using the Data Envelopment Analysis ( D E A ) and Malmquist index of the efficiency of all EU-27 Member States during the period 2013-2019 given the total unemployment rate, general gross government debt, gross capital formation and G D P per capita as macroeconomic indicators. This paper provides an analysis of the position of V 4 countries within the evaluation of the E U Member states with aims to analyze the efficiency progress of Visegrád countries (V4) in comparison with E U member states. Keywords: competitiveness, E U member states, economic efficiency, D E A , Malmquist index J E L Classification: C61, R l l , R15 A M S Classification: 91B82 1 Introduction The concept of competitiveness is currently of interest to economic theorists' attention as a basic measurement for countries' macroeconomic performance, even though there is no generally accepted definition, understanding and measurement system of this concept due to its complexity and different perceptions [8, 21]. According to [22], as a mirror of competitiveness could be understood the concept of economic efficiency as a commonly applied instrument to help identify the strengths and weaknesses of the evaluated states. The measurement of economic efficiency, which closely related to the use of resources in the economy, is subject to different methods and data operationalization, but in the European Union, the concept has not been uniquely defined yet, although the E U Member States efficiency is the source of national competitiveness [1,4]. The interest in measuring the efficiency of states has led to the development of different methods and data operationalization applied to the evaluation of efficiency. A frequently used research method is Data Envelopment Analysis (DEA) method or Malmquist index productivity because D E A is suitable for determining efficiency of units that are comparable to each other, e. g. selected macroeconomic indicators of countries [1,4, 22]. Table 1 shows researches where the efficiency of states is evaluated by the data envelopment analysis (DEA) method. Authors Year Main findings Fare et al. [7] 1994 Innovation contributes to competitiveness growth more than improvements. Martic and Savic [13] 2000 Only 17 regions in the E U can be classified as effective by D E A . Staníčkova and Melecký [22] 2016 The best efficiency changes in competitiveness were achieved by N U T S 2 regions belonging to the group of the Visegrad countries. Moutinho, Madaleno and Robaina [18] 2017 Resources productivity shows a positive and significant influence independently of the country technical and eco-efficiency level. 1 University of South Bohemia in České Budějovice, Department of Management, Studentská 13, 370 05 České Budějovice, adamovam@efjcu.cz. 2 University of South Bohemia in České Budějovice, Department of Applied Mathematics and Informatics, Studentská 13, 370 05 České Budějovice, janaklic@ef.jcu.cz. 
3 University of South Bohemia in České Budějovice, Department of Management, Studentská 13, 370 05 České Budějovice, nsoukupova@ef.jcu.cz. Lozowicka [11] 2020 In inefficient countries, weak areas can be identified, indicating the actions that should be taken to improve their efficiency. Stankovic, Marjanovic and Stojkovic [23] 2021 26 out of 28 EU member states do not achieve satisfactory levels of socio-economic efficiency. Table 1 DEA method used for evaluation of the efficiency of states Source: own processing In the EU there are still significant economic, social, and territorial disparities that lead to cohesion concerns in the expanding and further integrating EU [24]. After the Second World War, western European countries had to undergo turbulent times, while the countries of central Europe were under communism, which affected their national economies [20]. Many researchers distinguish between new EU countries (post-2004) and old/traditional EU countries (pre-2004), e.g. [3, 20, 17, 15]. The old EU countries are Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, the Netherlands, Portugal, Spain, and Sweden. The new EU countries are Bulgaria, Cyprus, Estonia, Latvia, Lithuania, Malta, Slovenia, Romania, Croatia, and all of the V4 countries [26]. According to [3], the present competitive advantage of the new EU member states is a low-cost-based economy, which is attractive for foreign investors. The V4 countries belong to the group of transition countries [12]. The V4 group consists of four Central European countries - Slovakia, Czechia, Hungary, and Poland. The V4 countries have a common history, culture, and geographical position [2], which suggests that they are comparable. According to [17], the V4 countries have undergone rapid growth accompanied by restructuring and modernization. Based on the research of [9], Czechia is the most successful of the V4 countries in terms of the selected indicators (reference years 2010-2014). The differences in the economic performance of the V4 countries are narrowing. According to [14], "development in V4 countries has a trend towards advanced countries, such as Austria and Germany. There was a growth in their performance, increasing trend in effective use of their advantages and improve in competitive position." This paper provides an analysis of the position of the V4 countries within the evaluation of the EU Member states and aims to analyze the efficiency progress of the Visegrád countries (V4) in comparison with the EU member states. 2 Materials and methods In our paper [10] we analyzed the EU countries according to the values of GDP per capita, Gross Capital Formation, General Government Debt, and the Unemployment rate between 2013 and 2019. We applied Data Envelopment Analysis to identify countries with optimal values under all these criteria and used the Malmquist index (MI) to measure progress in this evaluation. If we focus on the V4 countries, we can see from the DEA results that only Czechia belongs to the states with an optimal combination of outputs and inputs; in DEA terminology, these states are called efficient. The position of Czechia might be influenced by its low unemployment rate (oscillating around 2%) and its decreasing level of general government gross debt (Czechia's debt has been decreasing since 2013) [6].
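To make the efficiency notion used here concrete, the input-oriented, constant-returns-to-scale DEA score of one country (the classical CCR envelopment problem) can be obtained from a small linear programme. The sketch below is only an illustration of that formulation - the data, country labels and variable names are hypothetical placeholders, not the authors' dataset or code.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of unit j0.
    X: (m, n) inputs (e.g. unemployment rate, government debt),
    Y: (s, n) outputs (e.g. GDP per capita, gross capital formation),
    with one column per country (DMU). Returns theta in (0, 1]."""
    m, n = X.shape
    s = Y.shape[0]
    # decision vector z = [theta, lambda_1, ..., lambda_n]; minimise theta
    c = np.zeros(n + 1)
    c[0] = 1.0
    # input constraints:  sum_j lambda_j * x_ij - theta * x_i,j0 <= 0
    A_in = np.hstack([-X[:, [j0]], X])
    # output constraints: -sum_j lambda_j * y_rj <= -y_r,j0
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([np.zeros(m), -Y[:, j0]]),
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]

# toy data: 2 inputs and 2 outputs for four hypothetical countries
X = np.array([[2.0, 3.1, 6.5, 4.0],       # unemployment rate (%)
              [35.0, 60.0, 80.0, 50.0]])  # government debt (% of GDP)
Y = np.array([[38.0, 30.0, 25.0, 28.0],   # GDP per capita (thousand USD)
              [27.0, 22.0, 18.0, 20.0]])  # gross capital formation (% of GDP)
print([round(dea_input_efficiency(X, Y, j), 3) for j in range(X.shape[1])])
# units scoring 1.0 lie on the efficient frontier
```

The Malmquist index that tracks progress between two years $t$ and $t+1$ is then conventionally built from such distance functions as
$$MI = \left[ \frac{D^{t}\!\left(x^{t+1}, y^{t+1}\right)}{D^{t}\!\left(x^{t}, y^{t}\right)} \cdot \frac{D^{t+1}\!\left(x^{t+1}, y^{t+1}\right)}{D^{t+1}\!\left(x^{t}, y^{t}\right)} \right]^{1/2},$$
where $D^{t}$ denotes the distance (efficiency) function measured against the period-$t$ frontier; this is the standard textbook definition rather than a formula reproduced from the paper, and values above one are read as an improvement.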
Results of a DEA are based only on data from the chosen year; therefore, they can be supplemented by the Malmquist index to add an overview of development over time (2013-2019). However, if we study the MI based on data from 2013 and 2019, we can see that all V4 countries achieved quite good results; their positions are in the first half of the ranking of EU member states. The Czech Republic took 2nd place (5.07), Hungary took 3rd place (4.2), and Poland ranked 9th (2.58), similarly to Slovakia (11th place, score 2.42). The results support the idea that there is a different dynamic between old and new EU member states.

DMUs | MI | DMUs | MI | DMUs | MI | DMUs | MI
Ireland | 5.29 | Netherlands | 2.69 | Belgium | 1.86 | Luxembourg | 1.34
Czechia | 5.07 | Poland | 2.58 | Denmark | 1.83 | Sweden | 1.32
Hungary | 4.2 | Malta | 2.49 | Germany | 1.77 | Spain | 1.26
Croatia | 3.44 | Slovakia | 2.42 | Latvia | 1.75 | Bulgaria | 1.22
Portugal | 3.34 | Romania | 2.28 | Estonia | 1.75 | Italy | 1.15
Slovenia | 3.27 | Lithuania | 1.96 | Finland | 1.52 | France | 1.04
Cyprus | 2.95 | Greece | 1.87 | Austria | 1.38 |
Table 2 Malmquist Index Score of DMUs Source: own processing

Now, our aim is to focus in more detail on the progress of the V4 countries in the chosen parameters in comparison to the other EU countries. For our analyses we used the Unemployment Rate, General Government Debt, Gross Domestic Product per capita (hereinafter GDP), and Gross Capital Formation (hereinafter GCF); all of the indicators were ILO modeled estimates to ensure comparability across countries and over time. The data were obtained from the World Bank [25] and the International Monetary Fund, and the reference period 2013-2019 was chosen to eliminate the effects of the financial crisis and of the COVID pandemic. For DEA we applied the Unemployment Rate and General Government Debt as proxies for inputs and Gross Domestic Product per capita and Gross Capital Formation as proxies for outputs. To compare the progress of countries under these parameters, it was first necessary to standardize the data to obtain comparable values across different variables. Since we use cluster analysis with the Euclidean metric, we normalize the data into vectors with unit $L_2$-norms (per variable, i.e. the vector of GDP per capita in 2013 has a unit norm, for example). Our aim is to compare all values and all progress in all factors; therefore, we applied clustering to all normalized data - the values of all variables in all studied years - i.e. we used the values of GDP per capita, GCF, the Unemployment rate, and Debt from 2013 to 2019. In the following graphs, we can compare the original and normalized data. We can see that the order of countries' values for each year stays the same; however, the proportions of the positions can change, which is caused by the normalization.

[Graph 1: Comparison of original and standardized data (on the example of the Unemployment Rate), 2013-2019, shown for Bulgaria, Croatia, Czechia, Finland, Germany and Greece. Source: own processing]

First, we applied hierarchical clustering to get an idea about the position of the V4 countries, and then we ran a k-means analysis to discuss the properties of the clusters in more detail. 3 Results The hierarchical clustering was used to illustrate the clustering of EU countries, see graph 2. First, the old EU countries come together, except Germany and France, which form a separate cluster, as do Italy and Spain.
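The preprocessing and clustering steps described in the Materials and methods section could be carried out roughly as follows. This is a minimal sketch with placeholder country labels and randomly generated values; the use of scipy/scikit-learn and of Ward linkage for the hierarchical step are assumptions of the illustration, since the paper does not state which implementation or linkage was used.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

# placeholder data: 27 EU countries in rows, one column per variable and year
rng = np.random.default_rng(0)
countries = [f"Country_{i:02d}" for i in range(27)]
cols = [f"{v}_{y}" for v in ("GDPpc", "GCF", "Unempl", "Debt") for y in range(2013, 2020)]
data = pd.DataFrame(rng.uniform(1.0, 100.0, (len(countries), len(cols))),
                    index=countries, columns=cols)

# normalize every variable-year column to a unit L2 norm, as described in the text
X = data.values / np.linalg.norm(data.values, axis=0, keepdims=True)

# hierarchical clustering (Ward linkage over Euclidean distances) for the dendrogram view
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=8, criterion="maxclust")

# k-means with 8 clusters, the number suggested by the hierarchical results
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
membership = pd.Series(km.labels_, index=countries, name="cluster")
centroids = pd.DataFrame(km.cluster_centers_, columns=cols)  # basis for a centroid plot
print(membership.sort_values())
```

With the real data, `membership` would correspond to the cluster assignment reported in Table 3 below and `centroids` to the values plotted in Graph 3.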
The separate clusters of Germany and France, and of Italy and Spain, are due to their high level of gross capital formation (GCF) compared to the other states. Luxembourg forms a separate cluster due to its high GDP per capita compared to the other EU states. Greece also forms a separate cluster, but due to its high level of debt. The new countries of the European Union tend to come together. The hierarchical clustering shows that the positions and developments of the V4 countries are quite similar.

[Graph 2: Clustered EU countries (hierarchical clustering); variables: GDP per capita, GCF, Debt, Unemployment rate; horizontal axis: distance (Euclidean metric). Source: own processing]

To analyze the similarities and differences among countries in detail, we applied k-means cluster analysis; based on the results of the hierarchical method, we decided on 8 clusters. The division into clusters can be seen in the following Table 3.

Cluster n. 1 | Austria, Belgium, Ireland
Cluster n. 2 | Denmark, Finland, Netherlands, Sweden
Cluster n. 3 | Croatia, Cyprus, Portugal
Cluster n. 4 | Luxembourg
Cluster n. 5 | Bulgaria, Czechia, Estonia, Hungary, Latvia, Lithuania, Malta, Poland, Romania, Slovak Republic, Slovenia
Cluster n. 6 | Italy, Spain
Cluster n. 7 | Greece
Cluster n. 8 | France, Germany
Table 3 Clusters of EU states Source: own processing

As the results show, the division into old and new states of the European Union is still evident - e.g. Austria, Belgium, Denmark, Finland, the Netherlands and Sweden form clusters n. 1 and n. 2, Italy and Spain together form cluster n. 6, and France and Germany together form cluster n. 8. Cluster number 5 consists of 11 cases including the V4 countries. All these cases belong to the new EU member states. The traditional member states create clusters together (except for Portugal, which is in a cluster with Croatia and Cyprus). It supports the idea that the differences between the traditional and the new EU member states may be influenced by their position in the global economy. The new member states are attractive for multinational companies due to lower labor and production expenditures [15]. The new members also attract companies by tax competition [19]. According to [16], the new EU member states are approaching the results of the old member states (reference years 1996-2017, indicator GDP per capita), although at different speeds. In all EU member states, the unemployment rate was decreasing between 2013 and 2019. The position of the V4 countries (reference year 2016) was below the EU average (6.7%) and the Euro area average (6.9%). Czechia had the lowest rate in the whole EU (2%). The Polish and Hungarian unemployment rates were oscillating around 3%. Slovakia was placed in the second half of the ranking of EU members with 5.8% (Eurostat, 2019). GDP per capita had an increasing trend in all EU countries between 2013 and 2019. Among the V4 countries, the highest growth was recorded in Czechia; Poland and Hungary saw a similar improvement, and Slovakia placed last [5, 6, 25]. The least indebted country of the V4 is Czechia. Hungary has the highest debt of the V4 countries and even exceeds the limit from the Maastricht agreement (60% of GDP). All V4 countries have general gross government debt under the EU average [6]. We can see that all V4 countries are in one cluster. To study the specification of the individual clusters, see graph 3, where the values of the clusters' centroids are displayed.
[Graph 3: Clusters' centroids - normalized values of the variables (GDP per capita, GCF, Unemployment rate, Debt, 2013-2019) for clusters n. 1-8. Source: own processing]

In more detail, graph 3 shows the centroids of each cluster. We can see that the countries of cluster n. 5 have among the lowest debts. Moreover, relative to the debt of the other countries, their debt is decreasing. The same holds for the unemployment rate. On the other hand, GDP per capita and GCF are also the lowest ones (in GDP, only Greece (cluster n. 7) is worse than the centroid of cluster n. 5). If we run the k-means analysis for 9 clusters, it divides cluster number 5 into two separate clusters: the first cluster would consist of the Czech Republic, Hungary, Romania, Malta, Poland, and Slovenia, and the second of the Baltic countries together with Slovakia and Bulgaria. The first cluster has a better level of macroeconomic indicators such as the unemployment rate, GDP per capita and gross capital formation than the second cluster. In contrast, the second cluster has a better level of general government gross debt. 4 Conclusions The V4 countries reach worse values in GDP and GCF than the old EU member states, but both indicators are, in comparison with the other EU countries, relatively increasing in the V4 countries within the reference years. Their good results are influenced by low indebtedness and an unemployment rate under the EU average. In the coming years, these good results could be affected by the economic situation accompanying the COVID-19 pandemic. The V4 countries are in the cluster with the lowest debt, which is, moreover, relatively decreasing; likewise, their unemployment rate is low and decreasing. The V4 countries have many similarities due to their common history, and the best position among them is held by Czechia. Compared with the other EU member states, the V4 countries have among the lowest unemployment rates and a low level of indebtedness (except Hungary). The V4 countries are approaching the results of the old member states, which complies with the findings of [16]. The V4 countries are leaders among the new EU member states and have become an important source of improving the international competitiveness of the largest economy in the EU. Acknowledgements This contribution was supported by research grant GAJU No. 121/2020/S "Principles of circular economics in regional management leading to increased efficiency of systems". References [1] Anderson, H. J., Stejskal, J. (2019). Diffusion efficiency of innovation among EU member states: a data envelopment analysis. Economies, 7(2), 34. [2] Bacik, R., Kloudova, J., Gonos, J., & Ivankova, V. (2019). Management of Competitiveness and Economic Performance Based in the V4 countries. [3] Civelek, M., Ključnikov, A., Krajčík, V., Žufan, J. (2019). The Importance of Discount Rate and Trustfulness of A Local Currency for the Development of Local Tourism. Journal of Tourism and Services, 10(19), 77-92, https://doi.org/10.29036/jots.v10i19.117. [4] Cook, W. D., Seiford, L. M. (2009). Data envelopment analysis (DEA) - Thirty years on. European Journal of Operational Research, 192(1), 1-17. [5] Eurostat. (2019). Statistics explained: Unemployment by sex and age - annual data.
Available at: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Unemployment statistics and beyond [6] Eurostat. (2020). Main page. Available at: https://ec.europa.eu/eurostat/web/main/home [7] Fare, R., Grosskopf, S., Norris, M . , & Zhang, Z . (1994). Productivity growth, technical progress, and efficiency change in industrialized countries. The American economic review,66-83. [8] Hermoso-Orzáez, M . J., García-Alguacil, M . , Terrados-Cepeda, J., Brito, P. (2020). Measurement of environmental efficiency in the countries of the European Union with the enhanced data envelopment analysis method (DEA) during the period 2005-2012. Environmental Science and Pollution Research, 1-25. [9] Ivanova, E., & Masárová, J. (2018). Performance evaluation of the Visegrád Group countries. Economic research-Ekonomska istraživanja, 31(1), 270-289. [10] Klicnarová, J., Adamová, M , & Soukupová N . (2021). Efficiency analysis of E U Member States i n context of population aging. Submited to Central European Journal of Operations Research. [11] Lozowicka, A . (2020). Evaluation of the Efficiency of Sustainable Development Policy Implementation in Selected E U Member States Using D E A . The Ecological Dimension. Sustainability, 12(1), 435. [12] Marková, J., & Švihlíková, I. (2019). Comparison of External Indebtedness and Debt Sustainability Development in V 4 Countries. European Journal of Tranformation Studies, 7(2), 113-127. [13] Martic, M . , Savic, G . (2001). A n application of D E A for comparative analysis and ranking of regions in Serbia with regards to social-economic development. European Journal of Operational Research, 132(2), 343-356. [14] Melecký, L . , & Staníčkova, M . (2012). National efficiency evaluation of Visegrád countries in comparison with Austria and Germany by selected D E A models. In Proceedings of 30th International Conference Mathematical Methods in Economics (pp. 575-580). [15] Melecký, L . , Staníčkova, M . , & Hančlová, J. (2019). Nonparametric Approach to Evaluation of Economic and Social Development in the EU28 Member States by D E A Efficiency. Journal of Risk and Financial Management, 12(2), 72. [16] Mlynarzewska-Borowiec, I. (2020). Income Gap between the New and Old E U Member States and Its Determinants in the Period 1996-2017. Ekonomista, (3), 401-430. [17] Molendowski, E., & Folfas, P. (2019). Effects of the Pillars of Competitiveness on the Competitive Positions of Poland and the Visegrád Group Countries in the Post Accession Period. Comparative Economic Research. Central and Eastern Europe, 22(2), 55-67. [18] Moutinho, V., Madaleno, M . , & Robaina, M . (2017). The economic and environmental efficiency assessment in E U cross-country: Evidence from D E A and quantile regression approach. Ecological Indicators, 78, 85- 97. [19] Podviezko, A . , Parfenova, L . , & Pugachev, A . (2019). Tax competitiveness of the new E U member states. Journal of Risk and Financial Management, 12(1), 34. [20] Schwarcz, P., Kováčik, M . , & Valach, M . (2021). The Development of Economic and Social Indicators in V 4 Countries. Acta Polytechnica Hungarica, 18(2). [21] Segota, A., Tomljanovič, M . , Hudek, I. (2017). Contemporary approaches to measuring competitiveness-the case of E U member states. Journal of Economics and Business, 35(1), 123-150. [22] Staníčkova, M . , Melecký, L . (2016). Malmquist index approach to efficiency analysis in selected old and new E U Member States. Ekonomická revue - Central European Review of Economic, 19 (87-104). 1805-9481. 
DOI: 10.7327/cerei.2016.09.02. [23] Stankovič, J. J., Marjanovič, I., & Stojkovič, N . (2021). D E A Assessment of Socio-economic Development of European Countries. Management: Journal of Sustainable Business & Management Solutions in Emerging Economies, 26(1). [24] Stephan, A . , Happich, M . , & Geppert, K . (2005). Regional disparities in the European Union: convergence and agglomeration (No. 2005, 4). Working Paper Series. [25] Worldbank. (2020). World Development Indicators. Available at: https://databank.worldbank.org/ [26] Yamen, A., Allam, A . , Bani-Mustafa, A., & Uyar, A . (2018). Impact of institutional environment quality on tax evasion: a comparative investigation of old versus new E U members. Journal of International Accounting, Auditing and Taxation, 32, 17-29. 11 Economic Policy Uncertainty and Stock Markets Co-move­ ments Peter Albrecht1 , Svatopluk Kapounek2 , Zuzana Kučerová3 Abstract. We empirically examine co-movements between the Economic Policy U n certainty (EPU) index and selected stock market indices (S&P500, UK100, N i k kei225, and D A X 3 0 ) at different investment horizons. We show significant but timevariant co-movements between E P U and stock markets employing wavelet analysis. Moreover, we identify E P U as a leading indicator of stock market drops, especially in the U S , Japan, and Germany by using the time-varying domain based on the wavelet coherence. The lag between the changes of E P U and selected stock markets is from 4 months up to 32 months for longer investment horizons. We identify the co-movements between the E P U and stock markets also in times of decreased uncertainty but only to a small extent. Keywords: Economic policy uncertainty, wavelet analysis, stock markets, investment horizons J E L Classification: G01, G41 A M S Classification: 91G15 1 Introduction There is a growing body of literature on the relationship between uncertainty and capital market returns. The uncertainty is generally measured using the Economic Policy Uncertainty (EPU) index published by Baker et al. (2016). A s Pastor and Veronesi (2013) state, political uncertainty can be interpreted as uncertainty perceived by investors and connected with future policy actions of the government. Authors demonstrate that stock returns are affected by both fundamental economic and political shocks and show that political uncertainty increases the volatility of stock returns. Increasing uncertainty in markets is related to higher exchange rate risk (Abid, 2020; Albulescu et al., 2019), but also F D F s (Canh et al., 2019). Uncertainty affects the exchange rates of emerging countries (Abid, 2020), i.e. that in time of higher uncertainty exchange rate risk appears which could impact fund, stock, and bond investments in emerging markets. Firms are not sure what impact it may have on demand, so they are more riskaverse (Tran, 2019). They decrease innovations also because of their risk aversion (He et al., 2020). Anyway, uncertainty is also significantly related to firms asset structures - companies prefer to own fewer assets in foreign currencies during times of higher uncertainty (Huang et al., 2019). Increasing uncertainty is often related to decreasing stock prices (e.g. Antonakakis et al., 2013; Tiwari et al., 2019; Luo and Zhang, 2020), however, there is still some gap in literature and we focus on comovements of uncertainty and stock market returns at different investment horizons4 . 
We follow Karp and Vuuren (2019) who show that investors react to uncertainty in financial markets in different ways and with different lags. We follow this stream of literature and provide contribution in several ways. First, we employ a continuous and discrete wavelet transformation to identify the persistence (i.e. the cyclical behavior) of the E P U index and stock markets indices at different investment horizons. Second, we empiricaly examine time-varying comovements between the E P U index and stock market drops, especially during following events: Black Monday in 1987, the Gulf wars in 1990 and 2003, the Japan crisis in 1989, the DotCom bubble, the terrorist attacks in the U S in 2001, the Global financial crisis in 2007, the Greek debt crisis in 2009, the Brexit referendum in 2016, and the 1 Mendel University in Brno, Faculty of Business and Economics, Zemědělská 1,613 00 Brno, Czech Republic, peter.albrecht@mendelu.cz. 2 Mendel University in Brno, Faculty of Business and Economics, Zemědělská 1, 613 00 Brno, Czech Republic, kapounek@mendelu.cz. 3 Mendel University inBrno, Faculty of Business and Economics, Zemědělská 1, 613 00 Brno, Czech Republic, zuzana.kucerova@men- delu.cz. 4 Investment horizons are given by fractal dynamics. Effects of market fractions are significant when buying and selling orders are not efficiently cleared very often (Peters, 1994). 12 president Trump election in 2016. Third, we employ phase shift and wavelet cross-correlation sequences to provide detailed analysis of lags. We confirm prevailing co-movements at frequencies below 32 months and lags between 2 and 6 months. Thus, our results identify the E P U index as a leading indicator of stock market drops for long investment horizons. 2 Data and Methods The co-movements between E P U and stock market returns is observed for the stock markets of major developed countries, the U S , Great Britain, Japan, and Germany in the period 1985-2019 (monthly data)5 . Stock markets are represented by the following indices: S&P500, UK100, Nikkei225, and D A X 3 0 . We employ logarithmic differences to ensure stationarity for all the selected time series. We examine comovements between E P U and stock markets employing wavelet analysis. This approach allows us to differentiate between frequencies representing different investment horizons (Fidrmuc et al., 2019). 3 Results Most of the previous papers describe only periods when the uncertainty is rising (Luo and Zhang, 2020; Stolbov and Shchepeleva, 2020; Chen and Chiang, 2020) but this paper analyzes not only periods with rising uncertainty but also periods when there are peak seasons or the periods with decreasing uncertainty. A s we observe, the relationship between stock indices returns and the uncertainty is very volatile, and that the uncertainty is related to stock markets returns differently for each market fraction. Even in times of higher uncertainty, it is not so possible to strictly conclude that the rising uncertainty causes the drop in stock market returns. A s such, there is not a single causality between uncertainty and stock market returns. In this context, the contribution of this paper is very clear as this paper defines mutual coherence of stock indices returns and uncertainty for every period in time and every market fraction separately. 
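The full machinery behind these statements (wavelet coherence, red-noise significance tests, the cone of influence) is beyond a short example, but the core steps - log-differencing, decomposing the series by investment horizon with a Morlet wavelet, and reading the lead/lag from the phase difference - can be sketched as follows. The hand-written transform, the toy data and the 3-month shift are purely illustrative and are not the authors' implementation.

```python
import numpy as np

def morlet_cwt(x, periods, omega0=6.0):
    """Continuous wavelet transform of a monthly series with a complex Morlet
    wavelet; `periods` are Fourier periods in months. Returns (len(periods), len(x))."""
    x = np.asarray(x, dtype=float)
    # Fourier period -> Morlet scale
    scales = np.asarray(periods) * (omega0 + np.sqrt(2.0 + omega0 ** 2)) / (4.0 * np.pi)
    W = np.empty((len(scales), len(x)), dtype=complex)
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))
        t = np.arange(-half, half + 1) / s
        kernel = np.pi ** -0.25 * np.exp(1j * omega0 * t - t ** 2 / 2.0) / np.sqrt(s)
        W[i] = np.convolve(x, kernel, mode="same")  # edge effects ignored in this sketch
    return W

# toy monthly log-returns standing in for the EPU index and a stock index, 1985-2019
rng = np.random.default_rng(1)
n = 420
epu = rng.standard_normal(n)
stock = 0.5 * np.roll(epu, 3) + 0.5 * rng.standard_normal(n)  # EPU leads by ~3 months

periods = np.array([8.0, 16.0, 32.0])  # investment horizons in months
W_epu, W_stock = morlet_cwt(epu, periods), morlet_cwt(stock, periods)

# positive phase difference => EPU leads; lead in months ~ phase / (2*pi) * period
# (the phase only identifies lags up to half the period, so very short horizons alias)
phase = np.angle(W_epu * np.conj(W_stock))
for p, ph in zip(periods, phase):
    lead = np.angle(np.mean(np.exp(1j * ph))) / (2 * np.pi) * p
    print(f"period {p:4.0f} months: estimated lead of EPU ~ {lead:+.1f} months")
```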
A more detailed analysis of stock market returns, taking the existence of uncertainty (using the E P U index) into account, is performed in the time-frequency domain applying the Wavelet coherence enabling the detection of co-movement; results are presented in Figure 1. A n important finding of the analysis of the returns of S&P500 and the uncertainty (Figure 1, upper-left sector) applying wavelet coherence and co-movement is that there is no exact co-movement in statistically significant periods. There are mostly two situations prevailing in the figure and that is an arrow pointing either to the bottom-left direction (an increase of the uncertainty indicates a decrease of the stock market index, i.e. the uncertainty is a leading indicator of the stock market index) or an arrow pointing to the top-left direction (the stock market index is a leading indicator of the uncertainty, therefore, when the stock market grows, the uncertainty decreases). The second type of behavior is interpreted in two ways: (1) the investors can see the positive potential in financial markets and, as such, the uncertainty about future growth decreases (Bonsall, et al. 2020); (2) the longterm investors look at different fundamentals and before these fundamentals are announced in newspaper articles, the markets have already incorporated this information into prices. That originates in the fact that long-term investors look at the information more difficult to process and release. Both explanations offer reasonable arguments so they could be valid both at the same time. In Figure 1 we can see that the E P U index is a leading indicator of future returns of the S&P500 index for shortinvestment horizons (about 16-months and shorter), except for the period of the Global financial crisis in 2007. There is also no significant relationship between the E P U index and stock market returns for market fractions lower than 4 months. Periods of significant historical coherence of the uncertainty and stock market returns occur in four periods while three of these periods are related with crises. The first period, characterized by the biggest one-day drop of indices, is identified the Black Monday in 1987 and our results show that the E P U index influences future returns of the S&P500 index in short investment horizons (frequency scales 4-8 months). The US data begin in January 1985; Japan and the E U data begin in January 1987 and the data for the GB begin in January 1998. We use European EPU index as a proxy measurement of economic policy uncertainty in Germany. 13 Figure 1: Wavelet Coherence for S&P500, UK100, Nikkei225, and D A X 3 0 . Note: The color scales represent wavelet coherencies; the black contours denote insignificance at five percent against red noise, and the light shading shows the regions probably influenced by the edge effects. The direction of the relationship (the leading indicator) is represented by arrows (a left arrow denotes an antiphase (180°) while a right arrow denotes in-phase (0° or 360°). A downward-pointing arrow indicatesas a leading indicator of stock market returns. Source: own estimation. In the case of longer horizons, the S&P500 index is a leading indicator for the frequency scales from 16 to 32 months. 
This relationship is valid for the whole tested period which could mean that the E P U index increases the uncertainty temporarily up to 16 months but in the case of long-term horizon, the stocks are more capable to determine future market direction and to determine the perception of the uncertainty itself on the markets. The second significant period is the period from 1997 to 2002; this period includes several economic policy shocks having an impact on the rise of uncertainty (the Russian crisis, the President Bush's election, the DotCom bubble, and the terrorist attacks in the US). The uncertainty on the markets is high because of these events and, as Figure 1 shows, the E P U index is a leading indicator of the S&P500 index. Thus, the increasing uncertainty weakened the decision-making of companies and as a result, the stocks dropped. The third significant period dates from 2002 to 2007 and can be characterized as the period of booming markets when the performance of stocks affected future uncertainty changes. The more detailed explanation is that the uncertainty rises very quickly during economic shocks and then decreases rapidly. Based on the fact, that there are not any negative events, investors perceive that the economy is in a good condition without any signs of possible present or future uncertainty. The last period started in 2007. The S&P500 index affected the uncertainty during the Global financial crisis in 2007 and 2008 but the explanation seems to be different in this case. The uncertainty was high, but this crisis was affected by the whitewashing of information by banks and companies. Therefore, the crisis appeared when it was clear that the companies and banks had held very poisonous assets and it forced the stock prices to drop severely when markets noticed the problem (Ramskogler, 2015). When we look at the results for wavelet coherence of the U K 100 index and the G B uncertainty (Figure 1, upperright sector), it is apparent that there are surprisingly very small significant areas of coherence; the values 14 of coherence for frequency scales shorter than 4 months are very fragmented. When compared with the S&P500 index, our results show similar characteristics for frequency scales from 4 to 16 months when the E P U index can be considered to be a leading indicator of the U K 1 0 0 returns for most of the analyzed period. The most significant area is partially shortened by a cone of influence; it is particularly the period during the DotCom bubble and the terrorist attacks in 2001. When we look closer at this coherence area, however, we can see that the coherence of the U K 1 0 0 index and the E P U index is significant until 2005. It is probably due to the Gulf war 2 that started in 2003 and the G B army participated in this conflict. The U K 100 index was a leading indicator of the E P U index changes for 64 and higher frequency scales after the Global financial crisis in 2007 and the E P U index was a leading indicator of the U K 1 0 0 returns for fractions up to 6 months. But there is still a notable area with no significant coherence between the E P U index and stock market returns. We can also see the periods of low uncertainty with no significant coherence in the post-crisis period after 2011. The significant area of coherence is related to the Brexit referendum in 2016 when the only significant period is detected with a positive impact of the higher uncertainty on the U K 1 0 0 returns. 
The explanation of this surprising phenomenon is quite simple; after the G B had decided to leave the E U , the G B P / E U R exchange rate dropped immediately by 10% of its value but no trade agreements with the E U were canceled. This depreciation of the British pound started an uptrend on stocks of the G B companies. However, after the first recalculations of the impact of Brexit on the G B economy had been released the prices were corrected back on the previous values. Therefore, the impact of the E P U index on the U K 100 index was not direct as it affected the G B P value first and the depreciation helped the stock values to rise. The coherence between the uncertainty and stock returns is the most significant in Japan (Figure 1, bottom- left sector). A remarkable fact is that the significant relationship is observable particularly for the frequency scales above 32 months, and it is valid for almost the entire analyzed period. The Nikkei225 index can be considered to be a leading indicator of the E P U index for the whole period that includes the DotCom bubble, the Global financial crisis in 2007, and Brexit that was also important for Japan. Besides that, there are significant areas for frequency scales lower than 16 months, and these areas are significant in the case of the increased uncertainty; these areas are related to the 1990 real estate bubble, then Japan came through several difficult moments to revive the Japanese economy from 1991 to 1999. The significant coherence is observable from 2000 to 2002 after the DotCom bubble, then also during the Global financial crisis in 2007 and to a lesser extent during the Brexit referendum. The fact that the mutual coherence of the E P U index and the Nikkei225 index is significant only in times of the increased uncertainty indicates that it is of high importance to focus on this relationship during times when economic policy shocks occur. The E P U index serves as a leading indicator of the Nikkei225 index in frequency scales up to 16 months except for the Global financial crisis in 2007 and these findings are very similar to that of the U S and the G B markets. The uncertainty did not take into account information affecting companies because this information was published when firms had financial problems or bankrupted (e.g. Lehman Brothers). A s such, the uncertainty itself appeared very late when problems of big firms, banks, and funds were serious, and as a results stock returns became a leading indicator for the uncertainty during this period (Ramskogler, 2015). In the case of the E U results (Figure 1, bottom-right sector), we can detect results similar for all four analyzed regions, and it is the existence of significant coherence particularly during economic shocks. We can also state that E P U index is a leading indicator of the D A X 3 0 index in most cases for frequency scales from 4 to 16 months. When compared with the other markets for long-term investors - the D A X 3 0 index is a leading indicator of the E P U index for frequency scales higher than 16 months. Information for long-term investors is more difficult to process as it is released in newspapers later when the market has already reacted to other economic fundamentals (Peters, 1994). For frequency scales from 4 to 16 months, the E P U index affects D A X 3 0 returns negatively. Again, this finding concerning the European financial markets is similar tothe other three markets; these results are significant particularly in crisis periods (in 2000-2002 or in 2007). 
We identify the returns of the D A X 3 0 indices to be a leading indicator of the E P U index (and not the uncertainty to be a leading indicator for returns) for this particular period. This specific information is more related to the D A X 3 0 index because this information whitewash was operated by German banks and companies on a large-scale. These companies were buying securitized securities issued by the U S banks, and these banks held subprime loans and mortgages in their portfolios. Therefore, the D A X 3 0 index dropped, and as a result, the crisis began, and the uncertainty boomed (Ramskogler, 2015). 15 4 Conclusion Our results confirm significant but time-variant co-movements between the E P U index and stock markets, especially during times of turbulence (Black Monday in 1987, the DotCom bubble, terrorist attacks in 2001, the Global financial crisis in 2007 etc.)- We interpret frequency scales as investment horizons and confirm time- varying comovements of the E P U index and stock market. Our results detect the co-movement for investment horizons between 4 and 32 months. Thus, we confirm that the E P U index serves as a leading indicator of stock market drops at short-term investment horizons. Therefore, we offer some investment recommendations for periods of uncertainty increases as there is an opportunity to speculate on stock indices drops through hedge funds, options, and other instruments. However, it is necessary to make this investment for at least 16 months ahead; for more risk-averse investors, these findings might at least indicate the time to close buy positions or to sell the assets. Investors look at different trading fundamental indicators in case of long-term investments (Peters, 1994), and the prices of stocks are moved sooner after this information is released. Third, we employ the phase shift and wavelet cross-correlation sequences to examine the lags (at different frequency scales) between E P U and stock market indices. We confirm prevailing lags between 2 and 6 months as such, these results signal that the E P U index serves as a leading indicator of stock market drops. Our results are in line with recent literature (Baker et al., 2016; Tiwari et al., 2019; Luo and Zhang, 2020; Stolbov and Shchepeleva, 2020) and provide robust evidence of coherence between the uncertainty and stock market returns. However, we contribute with distinguishing investment horizons separately. Acknowledgements This research was funded by the internal grant agency of the Mendel University, grant no. PEF_TP_2020008. Svatopluk Kapounek was supported by the Czech Science Foundation, grant No. 20-17044S. References [I] Abid, A . (2020). Economic policy uncertainty and exchange rates in emerging markets: Short and long runs evidence. Finance Research Letters, vol. 37, article no. 101378. [2] Albulescu, C. T., Demirer, R., Raheem I. D., and Tiwari, A . K . (2019). Does the U.S. economic policy uncertainty connect financial markets? Evidence from oil and commodity currencies. Energy Economics, vol. 83(C), pp. 375-388. [3] Antonakakis, N . , Chatziantoniou I., and Filis G . (2013). Dynamic co-movements of stock market returns, implied volatility and policy uncertainty. Economics Letters vol. 120, no. 1, pp. 87-92. [4] Baker, S. R., Bloom, N . , and Davis, S. J. (2016). Measuring Economic Policy Uncertainty. Quarterly Journal of Economics, vol. 131, no 4, pp. 1593-1636. [5] Bonsall, S. B., Green J., and Muller K . A . III. (2020). 
Market uncertainty and the importance of media coverage at earnings announcements. Journal of Accounting and Economics, vol. 69, no. 1, article no. 101264.
[6] Canh, N. P., Binh, N. T., Thanh, S., and Schinckus, Ch. (2019). Determinants of foreign direct investment inflows: The role of economic policy uncertainty. International Economics, vol. 161, pp. 159-172.
[7] Fidrmuc, J., Kapounek, S., and Junge, F. (2019). Cryptocurrency Market Efficiency: Evidence from Time-Frequency Analysis. Czech Journal of Economics and Finance, vol. 70, no. 2, pp. 121-144.
[8] Chen, X., and Chiang, T. C. (2020). Empirical investigation of changes in policy uncertainty on stock returns—Evidence from China's market. Research in International Business and Finance, vol. 53, article no. 101183.
[9] He, F., Ma, Y., and Zhang, X. (2020). How does economic policy uncertainty affect corporate innovation? Evidence from China listed companies. International Review of Economics & Finance, vol. 67, pp. 225-239.
[10] Huang, J., Luo, Y., and Peng, Y. (2019). Corporate financial asset holdings under economic policy uncertainty: Precautionary saving or speculating? International Review of Economics & Finance, In Press.
[11] Karp, A., and van Vuuren, G. (2019). Investment implications of the fractal market hypothesis. Annals of Financial Economics, vol. 14, no. 1, article no. 1950001.
[12] Luo, Y., and Zhang, C. (2020). Economic policy uncertainty and stock price crash risk. Research in International Business and Finance, vol. 51, article no. 101112.
[13] Peters, E. E. (1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons.
[14] Ramskogler, P. (2015). Tracing the origins of the financial crisis. OECD Journal: Financial Market Trends, vol. 2014/2, pp. 47-61.
[15] Stolbov, M., and Shchepeleva, M. (2020). Systemic risk, economic policy uncertainty and firm bankruptcies: Evidence from multivariate causal inference. Research in International Business and Finance, vol. 52, article no. 101172.
[16] Tiwari, A. K., Jana, R. K., and Roubaud, D. (2019). The policy uncertainty and market volatility puzzle: Evidence from wavelet analysis. Finance Research Letters, vol. 31, pp. 278-284.
[17] Tran, Q. T. (2019). Economic policy uncertainty and corporate risk-taking: International evidence. Journal of Multinational Financial Management, vol. 52-53, article no. 100605, pp. 1-9.

Optimal consumption with irreversible investment in the context of the Ramsey model
Nastaran Ansari1, Adriaan Van Zon2, Olaf Sleijpen3,4

Abstract. The Ramsey model is widely used in macroeconomic studies, but only a few studies consider the irreversibility of investment, which is an important aspect of real-life investment. In this paper, we consider a Ramsey model with irreversible investment. While the main focus of the current literature is on a qualitative discussion of such problems, our paper provides a framework for quantitative analysis of the transition path. Finding the optimal transition path in a Ramsey model with irreversible investment requires solving a multistage optimal control problem with two kinds of stages. These stages are called 'free' and 'blocked' intervals in the literature, with zero gross investment in the blocked interval and positive gross investment in the free interval.
We show that the optimality conditions for such a problem imply the continuity of the control variable along the transition path, which is an important feature in finding the switching moments between free and blocked intervals. We use this feature and the backward integration method for a quantitative analysis of the transition path.

Keywords: Ramsey model, optimal control theory, multistage optimal control problem, irreversibility of investment
JEL Classification: P28, C610
AMS Classification: 49K04

1. Introduction
The Ramsey model is one of the most popular and widely used models in macroeconomic studies. It was constructed by Ramsey [1] to overcome one of the shortcomings of the Solow-Swan model by removing the constant saving rate assumption and allowing households to optimize their saving and consumption behaviour. In the Ramsey model, output is a function of the capital stock and can be used for consumption and investment purposes. In this model, due to discounting, a unit of consumption which occurs at a later time leads to less utility. However, investment, through postponing consumption, increases the capital stock, which leads to higher consumption in the future. Intertemporal optimization is used to determine the optimal amount of consumption over time under a macro-economic budget constraint. Most of the studies using a Ramsey-type model are based on the assumption of reversible investment. This means that consumption can temporarily exceed output, because it is possible to decumulate capital for consumption purposes. This assumption is unrealistic in many cases. In reality, once investment in fixed capital has taken place, it cannot be used for consumption purposes anymore; hence, gross investment in capital must be non-negative. Arrow and Kurz [2] implement the irreversibility of investment in the Ramsey model by means of a non-negativity constraint on gross investment. In order to find the optimal transition path toward the steady state, they define two kinds of intervals: free intervals, where the non-negativity constraint on gross investment is not binding, and blocked intervals, where the non-negativity constraint is binding. They consider the order of the free and blocked intervals, but they effectively disregard the duration of these intervals as well as the timing of the switching moments between the intervals. Rozenberg et al. [3] discuss a Ramsey model with irreversible investment and a constraint on accumulated pollution. They provide a qualitative discussion based on optimal control theory, but in order to find a quantitative solution they use the GAMS solver as a black box. With respect to the literature, there is still a lack of a systematic framework that facilitates quantitative analysis of the transitional dynamics in the Ramsey model with irreversible investment. Our paper aims to shed more light on handling this problem by formulating it as a multistage optimal control problem.
1 Maastricht University, School of Business and Economics, 6200 MD Maastricht, n.ansari@maastrichtuniversity.nl.
2 Maastricht University, School of Business and Economics, 6200 MD Maastricht, adriaan.vanzon@maastrichtuniversity.nl.
3 Maastricht University, School of Business and Economics, 6200 MD Maastricht, o.sleijpen@maastrichtuniversity.nl.
4 De Nederlandsche Bank N.V., 1017 ZN Amsterdam, o.c.h.m.sleijpen@dnb.nl.
To this end, we will provide a relatively simple picture of the problem by representing the Ramsey model with irreversible investment as a three-stage optimal control problem. First, we show that this perspective allows us to specify the switching moments between free and blocked intervals. Secondly, we use the backward integration method, introduced by Brunner and Strulik in [4], to analyze the transition path toward the steady state in a quantitative manner.

2. Method
2.1. The Ramsey model with irreversible investment
In this section we discuss a Ramsey model with irreversible investment. The per capita Cobb-Douglas production function is given by5:
y = f(k) = A·k^α,   (1)
where y is output, k is capital, A represents total factor productivity, while α is the partial output elasticity of capital. Utility, u(c), as a function of consumption, c, is given by:
u(c) = c^{1−σ} / (1−σ),   (2)
where σ is the elasticity of substitution6. The central planner's goal is to maximize the discounted value of total utility:
max ∫_0^∞ c^{1−σ}/(1−σ) · e^{−ρt} dt,   (3)
subject to:
k̇ = i − δk,   (4)
y = i + c,   (5)
i ≥ 0.   (6)
Equations (4), (5) and (6) represent the equation of motion for capital, the budget constraint and the irreversibility constraint, respectively, where δ is the depreciation rate and i represents the rate of gross investment in fixed capital. The Hamiltonian of this problem is:
H = c^{1−σ}/(1−σ) · e^{−ρt} + λ·k̇,   (7)
The Lagrangean of the problem is:
L = H + μ·(A·k^α − c),   (8)
From the first order conditions (FOC) of this Hamiltonian problem we have:
∂L/∂c = 0 ⟹ c = (e^{−ρt} / (λ + μ))^{1/σ},   (9)
λ̇ = −∂L/∂k = (δ − αA·k^{α−1})·λ − μ·αA·k^{α−1},   (10)
μ·(A·k^α − c) = 0,   (11)
where λ is the co-state variable associated with capital and μ represents the Lagrange multiplier of the non-negative gross investment constraint; equation (11) implies that μ is zero for c < y, i.e., for strictly positive gross investment we have μ = 0. Equation (10) represents the equation of motion for λ.
5 For the sake of simplicity, we set the labour force equal to L0. Hence equation (1) can be regarded as a per capita production function.
6 We assume σ > 0 and σ ≠ 1.

2.2. A three-stage optimal control version of the Ramsey model with irreversible investment
In this section we consider the Ramsey model with irreversible investment as a multistage optimal control problem. According to the Arrow and Kurz paper [2], in such a problem the transition path must adhere to Proposition 1:
Proposition 1: "there exists k̄ such that for all k < k̄ the optimal policy coincides with the reversible case, while investment is zero for all k ≥ k̄" [2].
The situation in which gross investment is strictly positive while k < k̄, so that the solution coincides with the reversible case, is called a free interval. The situation in which the non-negativity constraint on gross investment is binding and k ≥ k̄ is called a blocked interval. Arrow and Kurz [2] do not actually specify the value of k̄. However, we show that Proposition 1 can be transformed into a three-stage optimal control problem that includes transversality conditions which can be used to obtain the value of k̄. By keeping the terminology used in this paper close to the one used by Arrow & Kurz [2], we define a three-stage model as follows: the first stage is associated with a blocked interval; the second stage is associated with a free interval; and the third stage starts when the steady state is reached. Stages one and two together define the transition path.
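Before moving to the switching moments, the first-order conditions (9) and (10) of Section 2.1 can be verified symbolically. The sketch below is only illustrative: it assumes the Hamiltonian with gross investment i = A·k^α − c substituted from (4)-(5), and simply prints the derivatives from which (9) and (10) follow.

```python
# Illustrative check of the FOCs (9)-(10); the substituted Hamiltonian is an assumption.
import sympy as sp

t, c, k, lam, mu, A, alpha, sigma, delta, rho = sp.symbols(
    't c k lambda mu A alpha sigma delta rho', positive=True)

# Hamiltonian (7) with i = A*k**alpha - c substituted, and Lagrangean (8).
H = c**(1 - sigma) / (1 - sigma) * sp.exp(-rho * t) + lam * (A * k**alpha - c - delta * k)
L = H + mu * (A * k**alpha - c)

dL_dc = sp.diff(L, c)            # setting this to zero yields eq. (9)
lam_dot = -sp.diff(L, k)         # eq. (10): equation of motion of the co-state
print(sp.simplify(dL_dc))
print(sp.simplify(lam_dot))
```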
We assume that T1 represents the switching moment between the first stage and the second stage, and that T2 represents the switching moment between the second stage and the third stage. So, T2 also represents the moment at which the steady state is reached. In the three-stage model, the social planner's goal is to maximise total utility W subject to (4)-(6), where W is given by:
W = W_B + W_F + φ(k_SS, T2)   (12)
In equation (12), W_B and W_F are the total welfare accumulated during the first stage, which is a blocked interval, and the second stage, which is a free interval, respectively. φ(k_SS, T2) represents the scrap value function, which shows the total utility gained by remaining in the steady state from time T2 onward. φ(k_SS, T2) is the maximum value of the welfare integral of future utility starting from time T2 with an initial capital stock k_SS that remains unchanged during the steady state. Equation (12) can now be rewritten as follows:
W = ∫_0^{T1} c_B^{1−σ}/(1−σ) · e^{−ρt} dt + ∫_{T1}^{T2} c_F^{1−σ}/(1−σ) · e^{−ρt} dt + ∫_{T2}^{∞} c_SS^{1−σ}/(1−σ) · e^{−ρt} dt,   (13)
where c_B, c_F and c_SS represent the time paths of consumption during the three stages defined above; obviously, c_SS is constant during the steady state, while c_B and c_F vary over time. Now we consider the optimal consumption path and the Hamiltonians associated with each stage. In the first stage, investment i is zero. So we have7:
H_B = c_B^{1−σ}/(1−σ) · e^{−ρt} + λ_B·(−δ·k_B),   (14)
y = c_B,   (15)
k̇_B = −δ·k_B,   (16)
Equation (16) is the equation of motion for the capital stock during the blocked interval. It follows that k_{B,t} is given by:
k_{B,t} = e^{−δt} · k_0,   (17)
where k_0 is the initial value of the capital stock.
7 From now on we will be using the subscripts B, F, and SS to denote a blocked interval, a free interval and the steady state, respectively.
In order to find the optimal consumption path during the second stage, we use the FOC given by equations (9)-(11). In the second stage investment is positive and μ = 0. Hence we have:
H_F = c_F^{1−σ}/(1−σ) · e^{−ρt} + λ_F·(A·k_F^α − c_F − δ·k_F),   (18)
∂H_F/∂c_F = 0 ⟹ c_F = (e^{−ρt} / λ_F)^{1/σ},   (19)
λ̇_F = −∂H_F/∂k_F = (δ − αA·k_F^{α−1})·λ_F,   (20)
ċ_F/c_F = −(ρ + λ̇_F/λ_F) / σ,   (21)
λ̇_F/λ_F = δ − αA·k_F^{α−1}.   (22)
Using equations (18)-(22), the equations of motion for the capital stock and the level of consumption during the free interval are given by:
ċ_F = c_F·(αA·k_F^{α−1} − δ − ρ) / σ,   (23)
k̇_F = A·k_F^α − c_F − δ·k_F.   (24)
In the third stage, we are at the steady state and k̇_SS = 0 and ċ_SS = 0.

2.3. Finding the steady state
The steady state is reached at the end of the second stage. In the steady state, ċ_SS and k̇_SS should both be zero. So, from equations (23) and (24), and assuming k_SS > 0 and c_SS > 0, the steady state values of c and k, (c_SS, k_SS), are given by:
(c_SS, k_SS) = ( (δ·(1−α) + ρ)/α · (Aα/(δ+ρ))^{1/(1−α)} , (Aα/(δ+ρ))^{1/(1−α)} )   (25)

2.4. Transversality conditions and switching moments between intervals
In this section we show that the transversality conditions of the three-stage model that pertain to the optimal switching moment between the stages imply continuity of consumption. This means that k̄ in Proposition 1 is the point of intersection of the stable arm and the y = c line. In order to find the optimum switching moment between the stages, the transversality conditions require the equality of the Hamiltonians of two adjoining stages at the switching moment between them (see [5] and [7]). The economic interpretation of this transversality condition is that staying one more unit of time in each stage adds the (utility) value of the Hamiltonian at that moment to the total utility.
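As a complement to Sections 2.2 and 2.3, the following sketch shows how the backward integration method of Brunner and Strulik [4] can be applied to the free-interval dynamics (23)-(24): starting from a point slightly displaced from the steady state (25) and integrating backward in time, the trajectory converges onto the stable arm, and the point where consumption reaches output approximates k̄. The parameter values are purely illustrative assumptions, not those used by the authors.

```python
# Minimal sketch of backward integration for the free-interval system (23)-(24).
# Parameter values (A, alpha, delta, rho, sigma) are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

A, alpha, delta, rho, sigma = 1.0, 0.33, 0.05, 0.03, 2.0

k_ss = (A * alpha / (delta + rho)) ** (1.0 / (1.0 - alpha))      # eq. (25)
c_ss = (delta * (1.0 - alpha) + rho) / alpha * k_ss

def free_interval(t, x):
    """Equations (23)-(24) for (c_F, k_F)."""
    c, k = x
    dc = c * (alpha * A * k ** (alpha - 1.0) - delta - rho) / sigma
    dk = A * k ** alpha - c - delta * k
    return [dc, dk]

def hits_c_equals_y(t, x):
    c, k = x
    return A * k ** alpha - c            # zero when consumption equals output (y = c)
hits_c_equals_y.terminal = True

# Nudge the starting point toward the k > k_ss branch and integrate backward in time;
# components off the stable arm decay backward, so the path traces the saddle path.
x0 = [c_ss * 1.001, k_ss * 1.001]
sol = solve_ivp(free_interval, (0.0, -500.0), x0,
                events=hits_c_equals_y, max_step=0.1, rtol=1e-9)

k_bar = sol.y[1, -1]                     # capital stock where the stable arm meets y = c
print(f"k_ss = {k_ss:.3f}, c_ss = {c_ss:.3f}, approximate k_bar = {k_bar:.3f}")
```

The terminal event stops the integration on the y = c line, which is exactly where the blocked interval hands over to the free interval in the discussion that follows.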
So, if the Hamiltonians are not equal at the switching moment, then it means that staying longer or shorter in one of the stages may actually improve total utility. The gains and losses from lengthening each stage by one unit of time are as follows:
dW_B/dT1 = H_{B,T1},   (26)
dW_F/dT1 = −H_{F,T1},   (27)
dW_F/dT2 = H_{F,T2}.   (28)
If k_0 > k̄, then the optimal transition path consists of two intervals, a blocked interval followed by a free interval. Now the question is why this is the order of the intervals in Proposition 1; why not start with a free interval? If k_0 > k̄, there are three possibilities to initiate the move to the steady state from the right-hand side of k̄:
1. c_0 > y; however, this is ruled out by the irreversibility of investment assumption;
2. c_0 = y, which means staying in the blocked interval at the beginning and then switching to the free interval later;
3. c_0 < y, which means starting with a free interval.
The directions of motion implied by equations (23) and (24) show that for k_0 > k̄ choosing such a level of consumption would result in a movement away from the steady state (as shown by the black and blue arrows in Figure 1). So, in order to hit the steady state, there would have to be a jump in consumption from the free interval either to the y = c line or onto the stable arm directly. As shown before, the optimality conditions imply continuity of consumption along the optimal transition path. Hence, it is not optimal to start with a free interval (as in the third possibility), and the only remaining feasible option for choosing c_0 is the second option, where c_0 = y.

5. Conclusion
Our paper provides a framework for the quantitative analysis of the transitional dynamics in the Ramsey model with irreversible investment. We look at the problem as a multistage optimal control problem with two kinds of stages, namely free and blocked intervals. This perspective allows us to use a number of transversality conditions to specify the switching moment between the different types of intervals. We show that the transversality conditions in the proposed multistage problem imply that at the switching moment between the intervals there is no jump in consumption. This feature determines the optimal order of free and blocked intervals, and it also implies that the optimal switching moment between the intervals is defined by the duration of the movement along the y = c line up to the point of intersection with the stable arm. By adding more constraints to the Ramsey model with irreversible investment, such as an emission ceiling or a constraint on available resources, the number and the order of the intervals could differ from the current problem. However, our paper provides a perspective on the handling of the irreversibility of investment constraint in the context of the Ramsey model in the simplest possible setting. This setting can easily be modified to handle more sophisticated versions of the problem where the optimal timing of the stages is a crucial aspect of the solution. In addition, if the steady state arrives at some finite moment in time, its optimal arrival time and its optimal initial stocks can be derived using the scrap value function that captures the utility value of the steady state, in combination with transversality conditions regarding the optimality of this arrival time as well as the optimality of the initial stocks, as we will show in a follow-up version of this paper.

References
[1] Barro, R. J. and Sala-i-Martin, X. (2004). Economic Growth (2nd ed.), The MIT Press.
[2] Arrow, K. J., & Kurz, M. (1970). Optimal Growth with Irreversible Investment in a Ramsey Model. Econometrica, 38(2), 331-344.
[3] Rozenberg, J., et al. (2019). Instrument choice and stranded assets in the transition to clean capital. Journal of Environmental Economics and Management, https://doi.org/10.1016/j.jeem.2018.10.005.
[4] Brunner, M., Strulik, H. (2002). Solution of perfect foresight saddlepoint problems: a simple method and applications. Journal of Economic Dynamics & Control, 26, 737-753.
[5] Léonard, D., Van Long, N. (1995). Optimal Control Theory and Static Optimization in Economics. Cambridge University Press.
[6] Ansari, N., Van Zon, A. (2021). Ramsey model with irreversible investment and emission ceiling. Working paper.
[7] Van Zon, A. and David, P. (2012). Optimal multi-phase transition paths toward a global green economy. UNU-MERIT Working Paper, (31), pp. 1-51.

Impact of incorporating and tailoring PRINCE2 into the project-oriented environment
Bartoška Jan1, Rydval Jan2, Jedlanová Tereza3

Abstract. The paper proposes the use of the analytic network process (ANP) for quantification of the impact of incorporating the main topics of the international project management standard PRINCE2 into a semantic model which displays the project management environment in a commercial project-oriented organisation. The semantic model is derived from the organizational structure and the life cycle of projects. The ANP network creates the basis for an analysis of the preferences of the project roles, project documents, and other elements of the project management environment of the organization, and an analysis of their individual relationships to organizational units. Using the ANP analysis, the impact of incorporating the main topics of PRINCE2 into the project management environment of the organization is estimated and quantified. Furthermore, the ANP is used to perform a sensitivity analysis of the preferences of individual components of the semantic model when the importance of individual incorporated topics of PRINCE2 in the organization changes. The authors of the article build on their previous work and research activities in the field of project management.

Keywords: Analytic Network Process; Corporate Organization; Multi-criteria Decision Making; PRINCE2; Project Management; Project Team Roles; Semantic Model; Sensitivity Analysis.
JEL Classification: C44
AMS Classification: 90C35

1 Introduction
The international project management standard PRINCE2 (PRojects IN Controlled Environments) is the leading structured project management method in the United Kingdom, and it is used across the whole world in the private and public sector [14]. According to [6], PRINCE2 is a process-based project management methodology. PRINCE2 was developed to gain control at the start, during the progress and at the completion of projects. It is project management based on three constraints (time, quality and cost, connected in the Project Management Triangle). PRINCE2 divides projects into manageable and controllable stages. It is necessary to understand how to make all three project constraints adjust to each other to deliver a project within the scope. The basis of the PRINCE2 method is to describe project management as planning, delegating, monitoring and control of all aspects of the project.
It is necessary to achieve the project objectives within the expected performance targets for time, cost, quality, scope, benefits and risk. PRINCE2 clearly defines the roles and responsibilities of the project team members [6] and focuses on the product that has to be delivered. PRINCE2 has improved product value especially in the field of IT projects [7], but also in other areas, for instance in the field of e-learning [2]. Nowadays, according to [7], business companies are shifting within their project-oriented environment to the international project management methodology PRINCE2, not necessarily just in the field of IT projects. However, the implementation of PRINCE2 can also be a very difficult process, especially in terms of dividing the project into manageable and controllable stages [12]. And because the introduction of new ways and methods of project management can be a difficult process, it is necessary to define and describe the project environment of the organisation. Various methods and tools can be used for this, such as the semantic model, which has been used by [10], [16] or [5], and it is also possible to quantify individual parts of the project environment. The introduction and tailoring of PRINCE2 into an organization always has certain effects and impacts on the organization. The aim of this paper is to quantify and evaluate the impact of incorporating the international project management standard PRINCE2 into an organization's project-oriented environment. The paper proposes the use of the Analytic Network Process (ANP) for quantification of the impact of PRINCE2 incorporation on a commercial project-oriented organization. After PRINCE2 incorporation, the project team members must still decide on their own project management framework. Therefore, preferences of the project team roles are quantified and the stability of these preference values is tested using sensitivity analysis. The results of the sensitivity analysis show the stability of the individual roles' weights under changes in the importance of particular PRINCE2 principles.
1 Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, bartoska@pef.czu.cz.
2 Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, rydval@pef.czu.cz.
3 Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, jedlanova@pef.czu.cz.

2 Materials and Methods
2.1 International standard of project management PRINCE2
PRINCE2 (PRojects IN Controlled Environments) is a structured project management method and practitioner certification programme [6]. PRINCE2 divides projects into manageable and controllable stages. PRINCE2 is a project management methodology of 7s: the principles, themes and processes all follow this model. PRINCE2 derives its methods from 7 core principles. Collectively, these principles provide a framework for good practice. The principles are [6]: Continued Business Justification, Learn from Experience, Define Roles and Responsibilities, Manage by Stages, Manage by Exception, Focus on Products, Tailor to the Environment. Themes provide insight into how the project should be managed [6]. They can be thought of as knowledge areas, or how principles are put into practice. They are set up at the beginning of the project and then monitored throughout.
Projects are kept on track by constantly addressing these themes - Business Case, Organisation, Quality, Plans, Risks, Changes and Progress. The PRINCE2 method also separates the running of a project into 7 processes. Each one is overseen by the project manager and approved by the project board. Here is a breakdown of each stage [6]: Starting Up a Project, Initiating a Project, Directing a Project, Controlling a Stage, Managing Product Delivery, Managing Stage Boundaries, Closing a Project.

2.2 Semantic model
A semantic model consists of a semantic (associative) network, which [11] defines as a "natural graph representation". In the semantic network, each node represents an individual object of the described world, and the edges connecting these nodes represent relationships between these objects [21]. The term "semantic network" was first used by [13] in his dissertation on the representation of English words, and according to [15], semantic networks are suitable for displaying and expressing large information resources, management structures and processes, or other areas.

2.3 Analytic Network Process
Many decision problems cannot be decomposed and structured hierarchically, i.e. they cannot be structured into an Analytic Hierarchy Process (AHP) model (see [18] for an introduction to AHP theory), because they involve many interactions and dependencies of higher-level elements in a hierarchy on lower-level elements; i.e. these problems can be structured into a network. The ANP is represented by a network rather than a hierarchy ([17], [19], [20]). Therefore, the Analytic Network Process (ANP) is a generalization of the Analytic Hierarchy Process. The ANP can include the dependencies between the elements of different levels of the hierarchy as well as between the elements of the same level of the hierarchy (higher-level and lower-level elements in a hierarchy). The ANP model can reflect the increasing complexity of a network structure, where the network can be created from different groups of elements. Each group of elements creates a network cluster, which includes a homogeneous set of elements. Connections can exist between clusters as well as between the elements, i.e. between the elements inside a cluster and between the elements from different clusters. In AHP for hierarchical trees, synthesized global priorities are calculated by multiplying the local priorities, which are determined via pairwise comparisons, by the priority of the parent element. In the ANP this process is replaced by the Limit Matrix calculation ([1], [17], [19], [20]). The basic steps of the ANP method according to [19]:
• The first step is to create a network which describes the decision problem. The ANP network shows the dependencies among decision elements.
• The second step is to conduct the pairwise comparisons of the elements within the clusters and among the clusters. The ANP prioritizes not only decision elements but also their groups or clusters, as is often the case in the real world. The consistency of these comparisons should be controlled. The consistency is measured by a consistency index defined by Saaty:
CI = (λ_max − n) / (n − 1),   (1)
where λ_max is the largest eigenvalue of Saaty's matrix and n is the number of criteria. Saaty's matrix is considered to be sufficiently consistent if CI < 0.1.
• The third step is to construct the Supermatrix. The priorities derived from the pairwise comparisons are entered into the appropriate positions in the Unweighted Supermatrix.
This Supermatrix has to be normalized using the cluster weights, and the Weighted Supermatrix is calculated:
W = [ W_11  W_12  …  W_1N
      W_21  W_22  …  W_2N
       ⋮      ⋮          ⋮
      W_N1  W_N2  …  W_NN ],   (2)
with rows and columns indexed by the clusters C_1, …, C_N, where each block W_ij of the supermatrix consists of:
W_ij = [ w_11  w_12  …  w_1n
         w_21  w_22  …  w_2n
          ⋮      ⋮          ⋮
         w_n1  w_n2  …  w_nn ],   (3)
where:
Σ_{i=1}^{n} w_ij = 1,  j ∈ ⟨1, n⟩.   (4)
• The fourth step is to compute the Limit Matrix, from which the global preferences of the decision elements are obtained. The Limit Matrix is used to obtain stable weights from the Weighted Supermatrix. Raising the Weighted Supermatrix to powers generates the Limit Matrix: the powers either converge to a single matrix (the Limit Matrix), or they converge to a cycle of matrices (in which case the Limit Matrix is the average of these matrices). From the Limit Matrix, the final global priority values (preferences) are obtained. These preferences indicate the best decision selection. More information on the standard steps of the Limit Matrix calculation is in [19]; see [1] for more information about algorithms for computing the Limit Matrix.
Sensitivity analysis in the ANP. Sensitivity analysis is used to check the results obtained through the ANP model, i.e. to check the stability of preferences [8]. To start the sensitivity analysis, the elements having the highest preferences in the observed cluster are identified first. The impact of the increased value of the preference should be observed on all other elements of the cluster. The interpretation of the data obtained from the sensitivity analysis is that the input value, the priority of a selected node in the Unweighted Supermatrix, is changed from 0 to 1 and the corresponding priorities of the alternatives are computed from the Limit Supermatrix ([17], [19]). In this paper the ANP is processed with the SuperDecisions software ([22]). The software, which implements the Analytic Network Process based on Thomas L. Saaty's work, was developed by William J. Adams working with Rozann W. Saaty.

3 Results and Discussion
3.1 Case study: Incorporating and tailoring of PRINCE2 in a commercial unit
The research in a commercial unit (a bank organization) took place from 2016 to 2020. The chosen organization is an international banking company with an extensive portfolio of banking services for corporate and personal clients; it represents a typical corporate environment with developed project management. Within the research, a basic semantic model of project management was created as stated in [3] or [4] and further described and interpreted in [5], [10] or [16]; the model includes a complete network of project roles, departments, project documentation, project restrictions, etc. The ANP for analysing the impact of incorporating PRINCE2 into the project environment of a commercial unit is used as follows. The description of the project environment of a commercial unit is created using the semantic model of this environment. Based on this description, a network for the ANP is created consisting of clusters (sets of indicators and elements) describing the project environment. It consists of five clusters: strategies, organisation, project documentation, limits, and team roles, as stated in the previous research [3], [4], [10], [16] or [5], which we follow up with this article. Then the pairwise comparison of individual elements within a cluster and between clusters is performed according to the ANP network settings. This yields the weights of the individual elements. If the comparison values are left equal to 1,
the weights of the individual elements depend only on the network structure (the project environment structure). If the project environment is then changed, e.g. by implementing a new project management standard (PRINCE2), which can change the relationships between the cluster elements of the project environment, the pairwise comparison is performed again and the impact of the introduction of the project management standard on the organisation is determined. Thus, after the incorporation of the basic principles of PRINCE2 into the project environment of the organization, the changes are reflected in the ANP model showing the project-oriented environment. These changes are mainly the setting of new relationships between the elements of the clusters: project documents, project team roles, and project constraints, especially in relation to the strategy of the organization, which now takes into account the basic principles of PRINCE2. The preferences of the individual project team roles (shown in Table 1) before the incorporation of the PRINCE2 principles are presented by the Neutral model, and after the introduction of the PRINCE2 principles they are presented by the PRINCE2 model. The Total column contains the preference values from the Limit Matrix of the ANP, the Normal column shows the preference values normalized for the Project Team Roles cluster, and the Ideal column shows the preferences obtained by dividing the values in the Total column by the largest value in the column.

Project Team Roles | Total (Neutral / PRINCE2) | Normal (Neutral / PRINCE2) | Ideal (Neutral / PRINCE2) | Ranking (Neutral / PRINCE2) | Diff.
Business Analyst (BAN) | 0.00003 / 0.00471 | 0.00018 / 0.02646 | 0.00023 / 0.03790 | 8 / 7 | 1
Business Architect (BAR) | 0.00015 / 0.00248 | 0.00101 / 0.01392 | 0.00124 / 0.01994 | 7 / 8 | -1
IT Delivery Manager (ITDM) | 0.01439 / 0.01687 | 0.09772 / 0.09473 | 0.12038 / 0.13572 | 2 / 2 | 0
Project Manager (PM) | 0.11951 / 0.12430 | 0.81183 / 0.69800 | 1.00000 / 1.00000 | 1 / 1 | 0
Senior Supplier (SeS) | 0.00408 / 0.00526 | 0.02773 / 0.02956 | 0.03416 / 0.04235 | 5 / 5 | 0
Senior User (SU) | 0.00408 / 0.00526 | 0.02773 / 0.02956 | 0.03416 / 0.04235 | 4 / 4 | 0
Solution Architect (SAR) | 0.00001 / 0.00515 | 0.00004 / 0.02891 | 0.00005 / 0.04142 | 9 / 6 | 3
Sponsor (Sp) | 0.00089 / 0.00178 | 0.00602 / 0.01001 | 0.00741 / 0.01434 | 6 / 9 | -3
Team Manager (TM) | 0.00408 / 0.01226 | 0.02774 / 0.06886 | 0.03417 / 0.09865 | 3 / 3 | 0
Table 1 Quantification of the cluster Project Team Roles

In the project management environment in the commercial unit, the Project Manager (PM) is evaluated in the ANP model as the most important role; no other role is as important. However, after incorporating the PRINCE2 principles, the importance of the PM decreased, especially in favor of the project role Team Manager (TM), although the overall ranking of these roles remained the same. Furthermore, incorporating PRINCE2 had a significant effect on reducing the importance of the Business Architect role (BAR) in favor of the Business Analyst (BAN), and on reducing the importance of the Sponsor role (Sp) in favor of the Solution Architect (SAR) (shown in Table 1). This occurred as changes arose in roles and in their responsibilities during the partial transformation of the organization. In particular, there is an important change (an exchange of positions) between Sp and SAR, which is a consequence of the greater involvement of the SAR role in the agile method of project management (in agile teams, the SAR usually becomes the Product Owner), while the Sp role is suppressed.
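The following sketch is only an illustration of how the Limit Matrix step described in Section 2.3 and the Normal/Ideal normalisation used in Table 1 can be computed; the small weighted supermatrix is a hypothetical example, not the matrix of the case study.

```python
# Illustrative sketch of the ANP limit-matrix computation and of the Normal/Ideal
# normalisation used in Table 1. The weighted supermatrix below is hypothetical.
import numpy as np

W = np.array([           # a small column-stochastic weighted supermatrix
    [0.2, 0.5, 0.3],
    [0.5, 0.1, 0.4],
    [0.3, 0.4, 0.3],
])

def limit_matrix(W, tol=1e-12, max_iter=10_000):
    """Raise the weighted supermatrix to powers until it stabilises."""
    prev = W
    for _ in range(max_iter):
        nxt = prev @ W
        if np.max(np.abs(nxt - prev)) < tol:
            return nxt
        prev = nxt
    return prev

L = limit_matrix(W)
total = L[:, 0]                    # global priorities ("Total" column)
normal = total / total.sum()       # normalised within the cluster ("Normal" column)
ideal = total / total.max()        # divided by the largest value ("Ideal" column)
print(np.round(total, 4), np.round(normal, 4), np.round(ideal, 4))
```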
Although the project role PM remained the most important role of the project team, its importance decreased the most compared to the other roles (see the differences of weights in Figure 1). This is because the principles of PRINCE2 put great emphasis on various roles in individual project management processes and not just on the PM. The highest increase in the importance of the project roles is thus evident for the TM (Figure 2). Again, this change came with a change of responsibility during the partial agile transformation of the organization. The TM role gained increased responsibility as a consequence of the growing importance of teams under agile management. One of PRINCE2's main principles is Focus on Products; therefore, the stability of the preference values of the project team roles PM, TM, and ITDM has to be tested using sensitivity analysis. Sensitivity analysis was performed both in the model before the incorporation of PRINCE2 (Neutral model) and in the model after the incorporation of PRINCE2 into the project environment of the company (PRINCE2 model). The stability of the preferences of these roles was examined in terms of increasing the significance of the PRINCE2 key elements Configuration (Figure 2, left) and Issue (Figure 2, right). As the importance of the PRINCE2 element Configuration (x-axis) increased, in the Neutral model the importance of the PM increased and the importance of the ITDM decreased (y-axis). In the PRINCE2 model, the opposite is true and the importance of the TM increases. Furthermore, as the importance of the PRINCE2 element Issue increases, in the Neutral model the importance of the PM also increased at the expense of other roles, but in the PRINCE2 model the importance of the TM increases.
Figure 1 Change of preferences after the introduction of PRINCE2
Sensitivity analysis is a suitable tool for monitoring the possible change of preferences of selected project team roles in the event of a change in the importance of a particular PRINCE2 principle in the company's project environment [16]. On the other hand, the results on team role importance, as well as the results from the sensitivity analysis obtained by ANP methods, refer only to the project-oriented environment in a specific commercial unit. As Ziemba [23] states in his work, without further research the results cannot be generalized. In this research, the sensitivity analysis shows that the importance of particular project team roles is more sensitive in the PRINCE2 model than in the Neutral model. However, this sensitivity analysis was conducted only with respect to changes in the importance of the individual PRINCE2 principles. Furthermore, when the decision-makers decide on the importance of ANP structure elements, they can sometimes be inconsistent while filling in the pairwise comparison matrix. The consistency of the matrix can then exceed a feasible consistency limit and the information value of the data can be ruined ([9]).
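The consistency check mentioned above is the one given by equation (1) in Section 2.3. As a minimal illustration, the sketch below computes Saaty's consistency index for a small pairwise comparison matrix; the matrix entries are a hypothetical example.

```python
# Minimal sketch of Saaty's consistency check, eq. (1) in Section 2.3.
# The pairwise comparison matrix S is a hypothetical example.
import numpy as np

S = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

n = S.shape[0]
lambda_max = max(np.linalg.eigvals(S).real)   # largest eigenvalue of Saaty's matrix
CI = (lambda_max - n) / (n - 1)               # consistency index, eq. (1)
print(f"lambda_max = {lambda_max:.4f}, CI = {CI:.4f}, consistent = {CI < 0.1}")
```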
Figure 2 ANP sensitivity analysis of project team roles (ITDM, PM and TM in the Neutral and PRINCE2 models)

4 Conclusion
The paper presents the use of semantic models of project management in the commercial sector (the banking sector), specifically in the context of the international methodology PRINCE2 and an ongoing agile transformation in the organization. The presented results are derived from partial results of the authors' own research from 2016 to 2020 and follow the authors' previous research [3], [4], [5], [10] or [16]. The results of the case give evidence of changes in roles and in responsibilities during the agile transformation of the organization in connection with the well-established PRINCE2 methodology. Monitoring changes in project roles is made possible by the semantic model of project management. In particular, the ANP appears to be significantly useful for the quantification of variables, both in research and in practice. The importance of international standards and methodologies of project management is crucial for corporate practice, especially with an agile approach to project management. The implementation and development of international project management standards and methodologies in an organization's project environment depends on, and is tailored to, the specific environment of the organization. For work with semantic models of project management, the ANP is a very useful instrument. The ANP enables the quantification of variables in the project environment and the monitoring of their changes.

5 Acknowledgements
This research is supported by the grant No. 2019A0015 "Ověření a rozvoj sémantického modelu řízení projektů" of the Internal Grant Agency of the Czech University of Life Sciences Prague.

References
[1] Adams, B. (2011). SuperDecisions Limit Matrix Calculations. Decision Lens Inc.
[2] Axinte, S.-D., Petrica, G., Barbu, I.-D. (2017). Managing a software development project complying with PRINCE2 standard. Proceedings of the 9th International Conference on Electronics, Computers and Artificial Intelligence, ECAI 2017. ISBN 978-150906457-1.
[3] Bartoška, J. (2016). A semantic model for the management of projects. Habilitation thesis. DSE FEM CULS Prague.
[4] Bartoška, J. (2016). Semantic Model of Organizational Project Structure. In: Proceedings of the 34th International Conference Mathematical Methods in Economics. Liberec: Technical University. ISBN 978-80-7494-296-9.
[5] Bartoška, J., Jedlanová, T., Rydval, J. (2019). Semantic Model of Project Management in Corporate Practice. In: Proceedings of the 37th International Conference Mathematical Methods in Economics. ISBN 978-80-7394-760-6.
[6] Bennett, N. (2019). Managing Successful Projects with PRINCE2. Axelos. ISBN 9780113315567.
[7] De La Cámara Delgado, M., Marcilla, Fco. J. S., Calvo-Manzano, J. A., Vicente, E. F. (2012). Project management and IT governance: Integrating PRINCE2 and ISO 38500. 7th Iberian Conference on Information Systems and Technologies, CISTI 2012. ISBN 978-989962477-1.
[8] Farman, H., Javed, H., Jan, B., Ahmad, J., Ali, S., Khalil, F. N., et al. (2017). Analytical network process based optimum cluster head selection in wireless sensor network. PLoS ONE 12(7): e0180848. https://doi.org/10.1371/journal.pone.0180848.
[9] Hlavatý, R. (2014). Saaty's matrix revisited: Securing the consistency of pairwise comparisons. In: Proceedings of the 32nd International Conference Mathematical Methods in Economics, pp. 287-292.
[10] Jedlanová, T., Bartoška, J., Vyskočilová, K. (2018). Semantic Model of Management in Student Projects. In: Proceedings of the 36th International Conference Mathematical Methods in Economics. Prague: MatfyzPress. ISBN 978-80-7378-372-3.
[11] Mařík, V. (1993). Umělá inteligence: Díl 1. Praha: Academia, 264 s. ISBN 80-200-0496-3.
[12] McGrath, S., Whitty, S. J. (2020). The suitability of PRINCE2 for engineering infrastructure. Journal of Modern Project Management, vol. 7, no. 4, pp. 312-347.
[13] Quillian, R. (1968). A recognition procedure for transformational grammars. Doctoral dissertation. Cambridge (MA): Massachusetts Institute of Technology.
[14] Rupali, P. P., Kirti, N. M. (2017). Benefits and Issues in Managing Project by PRINCE2 Methodology. International Journal of Advanced Research in Computer Science and Software Engineering. DOI: 10.23956/ijarcsse/V7I3/0134.
[15] Rydval, J., Bartoška, J., Brožová, H. (2014). Semantic Network in Information Processing for the Pork Market. AGRIS on-line Papers in Economics and Informatics 6, 59-61.
[16] Rydval, J., Bartoška, J., Jedlanová, T. (2019). Sensitivity Analysis of Priorities of Project Team Roles Using the ANP Model. In: Proceedings of the 37th International Conference Mathematical Methods in Economics. ISBN 978-80-7394-760-6.
[17] Saaty, T. L. (1996). Decision Making with Dependence and Feedback: The Analytic Network Process. RWS. ISBN 0-9620317-9-8.
[18] Saaty, T. L. (2000). Fundamentals of the Analytic Hierarchy Process. RWS Publications, Pittsburgh.
[19] Saaty, T. L. (2001). Decision Making with Dependence and Feedback: The Analytic Network Process. The Analytic Hierarchy Process Series. Pittsburgh: RWS Publications.
[20] Saaty, T. L. (2003). The Analytic Hierarchy Process (AHP) for Decision Making and the Analytic Network Process (ANP) for Decision Making with Dependence and Feedback. Creative Decisions Foundation.
[21] Sowa, J. F. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks Cole Publishing Co., Pacific Grove, CA.
[22] SuperDecisions Software for Decision-Making. (2018). URL.
[23] Ziemba, P. (2019). Inter-Criteria Dependencies-Based Decision Support in the Sustainable Wind Energy Management. Energies 2019, 12, 749; doi:10.3390/en12040749.

Assessment of personal ambiguity attitude in a series of online experiments
Simona Bažantová1, Vladislav Bína2, Václav Kratochvíl3, Klára Simůnková4

Abstract. The paper deals with the issue of personal ambiguity attitude as a vital characteristic of decision-making. Similarly, as in the case of risk attitude, a person can seek ambiguity, show a neutral attitude, or have an ambiguity aversion. At the end of 2020 and in January 2021, an experiment consisting of a series of four lotteries was conducted. Each lottery consisted of 14 questions concerning bets on the results of drawing certain types of balls from an urn. Financial incentives for repeated attendance stimulated the participants, and the five most successful won the (gradated) prizes. The experiment was designed to assess personal attitude to ambiguity together with the measurement of this attitude, and it contained two variants of questions inspired by Ellsberg's experiments.
Observing the pandemic limitations in place at the time, the series of lotteries was held online through MS Teams software and a questionnaire website, which showed the particular game settings and saved the participants' answers. The paper analyses the behaviour in each of the four lotteries and shows the bet changes in time as the participants' learning effect and characteristics.

Keywords: ambiguity, attitude, online experiment, decision-making
JEL Classification: D01
AMS Classification: 91B03

1 Introduction
Research on ambiguity has a long tradition. This phenomenon can be seen in everyday situations, and it is well known that ambiguity (as a situation with no well-defined or vague probabilities) affects decision-making on a large scale. In some cases, people prefer risky bets to ambiguous ones, and this tendency is called ambiguity aversion. On the other hand, there is also the opposite situation, where people seek ambiguity as another way to express their ambiguity attitude [15]. Publications concerning ambiguity aversion identify its consequences and causes, dealing, for example, with demographic factors (e.g. the age effect, see Sproten et al. [14]; gender differences, see Schubert [13]) or individuals' characteristics and experiences (see, e.g., Buhr & Dugas [2]), sometimes also with regard to the context of the decision or situational factors. It is also evident that people's ambiguity attitudes can be affected by many factors (for a review, see Furnham & Marks [7]). It appears that ambiguity attitude plays an important role in everyday decision-making and a particularly important role in decisions concerning economic problems (see, e.g., Bianchi & Tallon [1] or Dimmock et al. [5]). During the 20th century, the role of ambiguity was rather marginalized and risk attitude stayed in the forefront as the crucial personal characteristic of a decision-maker. This dates back to Savage's axiomatic formalization leading to the expected utility theory (see Savage [12] and Mongin [11]). The importance of ambiguity attitude was stressed in later studies showing it as the second most important characteristic of human decision making (see Cohen [4] and Lauriola [10]). More recent papers analyse thought experiments using non-incentivized variants of the Ellsberg approach [6], although the incentivized version of assessing ambiguity attitude appears to be important (for an example supporting this assertion, see Cavatorta & Schroder [3]). The series of experiments summarized in this paper can be added to this stream of literature.
1 Prague University of Economics and Business - Faculty of Management, Jarošovská 1117/11, Jindřichův Hradec, simona.bazantova@vse.cz.
2 Prague University of Economics and Business - Faculty of Management, Jarošovská 1117/11, Jindřichův Hradec, vladislav.bina@vse.cz.
3 Prague University of Economics and Business - Faculty of Management, Jarošovská 1117/11, 377 01, J. Hradec & Institute of Information Theory and Automation, Prague, Czech Republic, velorex@utia.cas.cz.
4 Prague University of Economics and Business - Faculty of Management, Jarošovská 1117/11, Jindřichův Hradec, klara.simunkova@vse.cz.

2 Methods and Data
This paper focuses on behaviour under ambiguity and deals with personal ambiguity attitude as a vital decision-making characteristic. Furthermore, we analysed the changes in the total amount of bets in the lotteries over time as the participants' learning effect and personal characteristics.
The experimental design and questions in each online lottery follow experiments conducted under research project GACR 19-06569S. Previous experiments (within the mentioned research project) were in offline form. However, due to COVID-19 restrictions, at the very end of 2020 and in January 2021, the experiments (consisting of a series of four lotteries) were conducted online through MS Teams software and a questionnaire website, showing the particular game settings and saving the participants' answers. Each online lottery consisted of 14 questions concerning bets on the results of drawing certain types of balls from an urn (the urns were presented to participants in a random order). These questions can be divided into three categories: One Red Ball Example, 6-Colour Example and Ellsberg's Example. For the specific and detailed text of all questions, see Jiroušek & Kratochvíl [8]. For the present study, four games (questions) were most critical (mentioned in [8, pp. 54] and marked with abbreviations F1, F2, I1 and I2) because of the calculation of the personal coefficients of ambiguity aversion. Questions F1 and F2 were the simulation of decision under risk (the probability may be calculated). On the other hand, questions I1 and I2 were the simulation of decision under ambiguity (the probability may not be calculated). Ambiguity aversion was measured using the coefficients α_F and α_G given by formula (1) in Jiroušek & Kratochvíl [9], which relate the number of points a participant is willing to bet in the risk lotteries F1 and F2 (a1 and a2 points, respectively) to the number of points he or she is willing to bet in the ambiguous lotteries I1 and I2 (b1 and b2 points, respectively). The semantics of the coefficient α is "the higher the aversion, the higher the coefficient" [9, pp. 81]. In this paper, the personal coefficient of ambiguity aversion for each participant was calculated as the average of α_F and α_G. It is essential to mention the critical difference between the present online experiments and the previous offline experiments. In the online experiments, the respondents did not bet their own money (the respondents played only for points in the game). In each lottery, won (or lost) game points were added to (or subtracted from) the personal account, and once all four lotteries were completed, we calculated the total number of game points to determine the winner. The measured variables in the present experiments are illustrated in Table 1.

Nickname | Sex | Game | Result | Total Bet | Ambiguity coefficient | Previous Result | Change of ambiguity coefficient
Anettve | F | 1 | -60 | 107 | 0.35 | NA | NA
Anettve | F | 2 | 64 | 181 | 0 | -60 | -0.35
Anettve | F | 3 | 0 | 104 | 0.29 | 64 | 0.29
Anettve | F | 4 | 0 | 126 | 0.48 | 0 | 0.19
Blackjack | M | 1 | 110 | 120 | -1.50 | NA | NA
Blackjack | M | 2 | 0 | 100 | 0 | 110 | 1.5
Blackjack | M | 3 | 65 | 140 | 0 | 0 | 0
Table 1 Dataset with measured variables

• Ambiguity coefficients were calculated using the formula mentioned above.
This indicates the ambiguity seeking or ambiguity aversion of the participants (numerical variable).
• Total Bet was defined as the total amount bet (in game points) by each participant in each lottery (numerical variable).
• Change of ambiguity coefficient was calculated as the difference between α in the present lottery and α in the previous lottery (numerical variable).
• Sex of respondent (categorical variable).
• Game contains the numerical marking of the lottery and shows the number of previous experiences with experimental situations (games or lotteries).
• Result is the variable that shows the number of game points won or lost in the present lottery.
• Previous result is the variable that shows the number of game points won or lost in the previous lottery.
• Nickname identifies the respondent.
Data were analysed in the statistical software R, and multiway ANOVA was used to examine the relationships between the measured variables. Ambiguity coefficients, Total Bets and Changes of ambiguity coefficients were the dependent variables and the others were factors in the ANOVA models. Due to the principles of ANOVA, we categorized the Ambiguity coefficients (values −2, −0.5, 0, 0.5, 2 into the categories "negative", "rather negative", "rather positive", "positive") and the Result (values −800, −100, 0, 100, 800 into the categories "negative", "rather negative", "rather positive", "positive"), while Game was also defined as a factor. The research sample consists mainly of students of the Faculty of Management, Prague University of Economics and Business. Financial incentives for repeated attendance stimulated the participants, and the five most successful won the (gradated) prizes. In total, 43 respondents participated in the series of four lotteries. However, only students and respondents who participated in two or more lotteries were included in further analyses. Three respondents were excluded due to unfulfilled conditions, and thirteen respondents were excluded due to participation in only one lottery. This reduction was made mainly to increase the consistency of the sample and to allow calculations involving previous experience. Therefore, 27 participants (10 male, 17 female) were included in further analyses.

3 Results
This paper analyses behaviour under ambiguity and risk in each of the four lotteries, and shows changes in bets and ambiguity coefficients over time as a learning effect in the participants' behaviour. We specifically investigated the effect of previous experience on the willingness to bet in the next lottery following a successful or unsuccessful one, in connection with the sex of the respondents, the number of previous experiences and the ambiguity coefficient. Furthermore, we examined the effects of the measured variables on the personal coefficient of ambiguity aversion.

3.1 Personal coefficient of ambiguity aversion
This section illustrates some experimental results on the personal coefficient of ambiguity aversion (α). The measurement of this coefficient was proposed above (see the section Methods and Data). The average values of the personal coefficients of ambiguity aversion stay within the range of 0.22 to 0.29 in the four online experimental lotteries (μ_lottery1 = 0.2516, μ_lottery2 = 0.2239, …). None of the examined factors proved significant (p > 0.05) during the systematic iterative reduction of the model. A similar analysis of the effects of the measured variables on the change of ambiguity coefficients across the lotteries ended similarly.
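The analyses described above were run in R; purely as an illustration, the sketch below shows the same kind of multiway ANOVA in Python with statsmodels. The file name and the column names (including the categorized variables 'alpha_cat' and 'prev_result_cat') are hypothetical.

```python
# Illustrative sketch (the authors used R): a multiway ANOVA of the total bet on
# the factors described above. The CSV file and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("lottery_data.csv")     # one row per participant and lottery

model = ols(
    "total_bet ~ C(sex) + C(game) + C(alpha_cat) + C(prev_result_cat) + C(nickname)",
    data=df,
).fit()
print(sm.stats.anova_lm(model, typ=2))   # Type II ANOVA table with F tests
```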
In this context, we examined whether the change of the personal ambiguity aversion coefficient was affected by the previous results (experiences). The other factors were the respondents' sex and previous experiences (including interactions between the variables and the blocks based on the respondents' nicknames). Contrary to expectations, the analysis did not detect any significant factor affecting ambiguity coefficient changes across the lotteries.

4 Conclusion
The key findings of this study can be summarized as follows: First, the present paper suggests a significant effect of a negative result in the previous lottery on the total bet in the following one, which could be caused by a change in the participant's game strategy (i.e. an 'all-or-nothing' basis). Future research may examine this effect in a game in which the participants' own money is at stake. Second, in our online experimental lotteries, the mean value of the personal coefficients of ambiguity aversion stays within the range of 0.22-0.29, indicating that participants are inclined to ambiguity aversion. However, the personal coefficients α stay within −1 to 1, which is interesting because, in our experiments, the participants were relatively homogeneous (regarding education and age). Despite the similarities of our sample, the α coefficient range was extensive, and we did not find effects of the sex of the respondent, the previous result (a successful or unsuccessful game), or the number of experiences on the level of the personal ambiguity aversion coefficient. Therefore, it appears that individuals' ambiguity aversion may be influenced more by various internal factors, such as attitude and motivation to play the game, than by any of the variables included in the present paper. Therefore, in further research we shall turn our attention to a more profound examination of the internal factors of the participants. The primary limitation to the generalization of the results of this study is the relatively small sample size. In addition, the sample of respondents is not representative of the population.

Acknowledgements
This study was supported by the National Science Foundation of the Czech Republic (GACR) under project no. 19-06569S.

References
[1] Bianchi, M., & Tallon, J. M. (2019). Ambiguity preferences and portfolio choices: Evidence from the field. Management Science, 65(4), 1486-1501.
[2] Buhr, K., & Dugas, M. J. (2006). Investigating the construct validity of intolerance of uncertainty and its unique relationship with worry. Journal of Anxiety Disorders, 20(2), 222-236.
[3] Cavatorta, E., & Schroder, D. (2019). Measuring ambiguity preferences: A new ambiguity preference survey module. Journal of Risk and Uncertainty, 58(1), 71-100.
[4] Cohen, M., Jaffray, J. Y., & Said, T. (1987). Experimental comparison of individual behavior under risk and under uncertainty for gains and for losses. Organizational Behavior and Human Decision Processes, 39(1), 1-22.
[5] Dimmock, S. G., Kouwenberg, R., Mitchell, O. S., & Peijnenburg, K. (2016). Ambiguity aversion and household portfolio choice puzzles: Empirical evidence. Journal of Financial Economics, 119(3), 559-577.
[6] Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. The Quarterly Journal of Economics, 75(4), 643-669.
[7] Furnham, A. & Marks, J. (2013). Tolerance of Ambiguity: A Review of the Recent Literature. Psychology, 4, 717-728.
[8] Jiroušek, R. & Kratochvíl, V. (2019). Preliminary Results from Experiments on the Behavior under Ambiguity. In: M. Inuiguchi, R.
Jiroušek & V. Kratochvíl (Eds.), Proceedings of the 22nd Czech-Japan Seminar on Data Analysis and Decision Making (CJS'19) (pp. 53-64). Nový Světlov, Czech Republic.
[9] Jiroušek, R. & Kratochvíl, V. (2020). On subjective expected value under ambiguity. International Journal of Approximate Reasoning, 127, 70-82.
[10] Lauriola, M., & Levin, I. P. (2001). Relating individual differences in attitude toward ambiguity to risky choices. Journal of Behavioral Decision Making, 14(2), 107-122.
[11] Mongin, P. (1997). Expected utility theory. Handbook of Economic Methodology, 342-350.
[12] Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley & Sons, Inc.
[13] Schubert, R., Gysler, M., Brown, M., & Brachinger, H. W. (2000). Gender specific attitudes towards risk and ambiguity: An experimental investigation (No. 00/17). Economics Working Paper Series.
[14] Sproten, A., Diener, C., Fiebach, C., & Schwieren, C. (2010). Aging and decision making: How aging affects decisions under uncertainty (No. 508). Discussion Paper Series.
[15] Trautmann, S. T. & van de Kuilen, G. (2015). Ambiguity attitudes. The Wiley Blackwell Handbook of Judgment and Decision Making, 1, 89-116. Chichester: John Wiley & Sons Ltd.

An original two-index model of the multi-depot vehicle routing problem
Zuzana Borčinová1, Štefan Peško2

Abstract. The vehicle routing problem (VRP) is a family of combinatorial optimization problems which aim to determine the lowest cost vehicle routes to serve a set of customers. In a solution of VRP, the vehicle routes originate from one or several depots and should return to the same depot they started from, while ensuring that the total demand on each route does not exceed the vehicle capacity. In this paper, we present a two-index vehicle-flow formulation for the multi-depot vehicle routing problem (MDVRP) including a new constraint used to forbid routes to have the starting and ending points at two different depots. Computational experiments on several instances of varying depot and customer sizes showed that the optimal solutions were obtained by the proposed formulation in a lower CPU time compared to the three-index formulation.

Keywords: vehicle routing problem, multiple depots, mixed-integer linear programming model, vehicle-flow formulation

JEL Classification: C44
AMS Classification: 90C15

1 Introduction
The vehicle routing problem (VRP) is a family of combinatorial optimization problems which have many practical applications in the fields of transportation, distribution, and logistics. VRP is defined as designing routes for a fleet of vehicles to serve a set of customers with known demands. Each route is assumed to start and end at a depot and each customer is to be fully serviced exactly once. The primary objective is to minimize the total distance traveled by all vehicles. In a large number of practical situations, additional constraints are defined for variants of the VRP. For example, the Capacitated VRP, or CVRP (every vehicle has a limited capacity), the VRP with time windows, or VRPTW (the service at each customer must start within a given time interval), the Multi-depot VRP, or MDVRP (vehicles are located in several different depots), the Pickup and Delivery VRP, or PDVRP (customers may require a pickup or a delivery service), the Periodic VRP, or PVRP (customers require repeated visits during the planning horizon), etc. This paper is focused on the MDVRP.
As mentioned above, the MDVRP is a variant of the VRP where more than one depot is considered. Given the locations of depots and customers, the MDVRP requires the assignment of customers to depots and the vehicle routing for visiting these customers such that: (1) each vehicle route starts and ends at the same depot, (2) each customer is serviced exactly once by a vehicle, (3) the total demand of each route does not exceed the vehicle capacity, (4) the total cost of the distribution is minimized. Even for relatively small instances, the MDVRP is NP-hard and difficult to solve to optimality. Therefore, most solution methods proposed for the MDVRP are heuristics and metaheuristics. For example, Tabu Search has been used in [5]. An Adaptive Large Neighborhood Search approach was introduced in [12]. A Genetic Algorithm is used in [14] and a hybrid algorithm based on Iterated Local Search is applied in [13]. Other hybrid metaheuristic algorithms combining Greedy Randomized Adaptive Search Procedure, Iterated Local Search, and Simulated Annealing are proposed in [1]. In [11], the authors present a parallel coevolutionary algorithm based on an evolution strategy, and in [3] an algorithm based on the General Variable Neighborhood Search is proposed. Only a few exact algorithms exist for the MDVRP, and these are only practical for relatively small problem sizes. Laporte et al. [8] were the first to report optimal solutions for problem sizes up to 50 customers and 8 depots by use of a branch-and-bound technique. Another exact approach for asymmetric MDVRPs by Laporte et al. [9] first transformed the problem into an equivalent constrained assignment problem, and then applied a branch-and-bound method to problem instances containing up to 80 customers and 3 depots. Contardo and Martinelli [4] proposed an exact method based on ad-hoc vehicle-flow and set-partitioning formulations and solved the first by cutting plane methods and the second by column-and-cut generation. A recent survey of exact and heuristic methods for solving the MDVRP can be found in [10] and in [6].

1 University of Žilina, Faculty of Management Science and Informatics, Department of Mathematical Methods and Operations Research, Univerzitná 8215/1, Žilina, Slovakia, zuzana.borcinova@fri.uniza.s
2 University of Žilina, Faculty of Management Science and Informatics, Department of Mathematical Methods and Operations Research, Univerzitná 8215/1, Žilina, Slovakia, stefan.pesko@fri.uniza.sk

Our contributions lie in introducing a new formulation for the MDVRP under capacity constraints. Specifically, we propose a model based on a two-index vehicle-flow formulation, to which we add a new constraint used to forbid routes to have the starting and ending points at two different depots. To validate our approach, we also consider the VRP with a single depot (CVRP) as a particular case of the MDVRP. The remainder of this paper is organized as follows. Section 2 describes the MDVRP. A new two-index mixed-integer linear programming formulation for the MDVRP is proposed in Section 3. In Section 4, a computational comparison of different formulations for the MDVRP on several families of instances is reported. Finally, some conclusions are drawn in Section 5.

2 Multi-Depot Vehicle Routing Problem
The MDVRP can be formalized as follows. Let G = (V, H) be a complete directed graph with the node set V and the arc set H = {(i, j) : i, j ∈ V, i ≠ j}.
The nodes are partitioned into two subsets: the set of customers to be served V_C and the set of depots V_D, with V_C ∪ V_D = V and V_C ∩ V_D = ∅. A positive traveling cost c_ij is associated with each arc (i, j) ∈ H. Each customer i ∈ V_C has a certain positive demand d_i. In every depot, there is a fleet of p vehicles with the same capacity, denoted as Q, whereas K represents the set of all vehicles. Figure 1 shows an MDVRP with twelve customers, V_C = {1, 2, ..., 12}, and two depots, V_D = {13, 14}. In this example, depot 13 has three routes that serve customers 1, 2, 3, 4, 5, 11, and 12, while depot 14 has two routes serving customers 6, 7, 8, 9, and 10.

Figure 1 Example of MDVRP

In the following, we introduce a mathematical formulation for the MDVRP that extends the CVRP [7] to consider multiple depots. Let the three-index binary decision variables x_ijk be equal to 1 when vehicle k visits node j immediately after node i, and 0 otherwise. Auxiliary variables y_ik ∈ ⟨d_i, Q⟩ indicate an upper bound on the load already distributed by vehicle k upon leaving customer i. The three-index mathematical model is defined as follows:

MDVRP-3i

Minimize ∑_{(i,j)∈H} ∑_{k∈K} c_ij x_ijk (1)

subject to

∑_{i∈V, i≠j} ∑_{k∈K} x_ijk = 1, ∀j ∈ V_C, (2)
∑_{i∈V, i≠j} x_ijk - ∑_{i∈V, i≠j} x_jik = 0, ∀k ∈ K, j ∈ V, (3)
∑_{i∈V_D} ∑_{j∈V, j≠i} x_ijk ≤ 1, ∀k ∈ K, (4)
∑_{i∈V_D} ∑_{j∈V, j≠i} x_jik ≤ 1, ∀k ∈ K, (5)
y_ik + d_j - y_jk ≤ (1 - x_ijk) Q, ∀i ∈ V_C, j ∈ V_C, i ≠ j, k ∈ K, (6)
x_ijk ∈ {0, 1}, ∀(i, j) ∈ H, k ∈ K, (7)
d_i ≤ y_ik ≤ Q, ∀i ∈ V_C, k ∈ K. (8)

Objective (1) minimizes the total traveling cost. Constraints (2) guarantee that each customer is visited exactly once. Constraints (3) impose the degree balance of each node, including both customers and depots. Constraints (4) and (5) establish that each vehicle starts and finishes in at most one depot. Constraints (6) eliminate sub-tours in the solutions and ensure that the vehicle capacity is not exceeded. Finally, (7) and (8) are obligatory constraints. Three-index formulations for the VRP have limited practical use due to their large number of variables. In the next section, we propose a two-index vehicle-flow formulation for the MDVRP that naturally reduces the number of variables.

3 Two-index mathematical model of MDVRP
In this section, we first describe a new constraint used to assign the customers to the depots and then present the formulation itself.

3.1 Assignment constraint
Using the same notation as for the three-index formulation, for every arc (i, j) ∈ H we define a binary variable x_ij equal to 1 if arc (i, j) is used. For every customer i ∈ V_C, let z_i ∈ V_D be an integer decision variable that determines which depot customer i is assigned to, according to the following constraints:

M (x_ij - 1) ≤ z_i - z_j ≤ M (1 - x_ij), ∀(i, j) ∈ H, (9)

where M is a sufficiently large positive constant. For each depot i ∈ V_D we explicitly set z_i = i.

Proposition 1. Constraints (9) impose that if an arc is used by a vehicle route, then customers i and j are assigned to the same depot.

Proof. Let arc (i, j) be used by a vehicle route, i.e. x_ij = 1. Then, according to (9), 0 ≤ z_i - z_j ≤ 0, which implies z_i = z_j. Otherwise, if arc (i, j) is not used, i.e. x_ij = 0, then -M ≤ z_i - z_j ≤ M, which means that z_i and z_j may, but need not, be equal.

For example, in Figure 2, x_{3,4} = 1 and z_3 = z_4 = 13. In another case, x_{1,5} = 0 and both z_1 and z_5 are equal to 13, while also x_{4,6} = 0, but z_4 and z_6 are different (z_4 = 13 and z_6 = 14). Note that (9) also forbids routes to have their starting and ending points at two different depots.
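As a small sketch of how the assignment constraints (9) and the depot fixing z_i = i could be built in a solver, the snippet below uses gurobipy (the paper reports a Python/Gurobi implementation); the toy instance and the choice of M are illustrative assumptions, and the remaining degree, fleet-size and capacity constraints of the full two-index model presented in the next subsection are omitted here.

```python
import gurobipy as gp
from gurobipy import GRB

# Toy instance: four customers and two depots numbered as in Figures 1 and 2;
# all data here are illustrative only.
customers = [1, 2, 3, 4]
depots = [13, 14]
nodes = customers + depots
arcs = [(i, j) for i in nodes for j in nodes if i != j]
M = max(depots)  # a sufficiently large constant for constraints (9)

model = gp.Model("mdvrp_assignment_sketch")
x = model.addVars(arcs, vtype=GRB.BINARY, name="x")    # x_ij: arc (i, j) is used
z = model.addVars(nodes, vtype=GRB.INTEGER, name="z")  # z_i: depot assigned to node i

# Constraints (9): a used arc forces both of its endpoints to the same depot.
for i, j in arcs:
    model.addConstr(z[i] - z[j] >= M * (x[i, j] - 1))
    model.addConstr(z[i] - z[j] <= M * (1 - x[i, j]))

# Depots are assigned to themselves; customer assignments range over the depot indices.
for d in depots:
    model.addConstr(z[d] == d)
for c in customers:
    z[c].LB = min(depots)
    z[c].UB = max(depots)

# The objective and the remaining constraints of the full MDVRP-2i model
# (next subsection) would be added here before calling model.optimize().
```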
Figure 2 Assignment variables

3.2 A two-index vehicle-flow formulation
Analogously to the three-index formulation, an auxiliary variable y_i ∈ ⟨d_i, Q⟩ is also associated with every customer and used in the sub-tour elimination constraints that ensure the continuity of the vehicle route in terms of the demand. Now, we can formulate the MDVRP by the following mixed-integer linear programming model:

MDVRP-2i

Minimize ∑_{(i,j)∈H} c_ij x_ij (10)

subject to

∑_{j∈V, j≠i} x_ij = 1, ∀i ∈ V_C, (11)
∑_{j∈V, j≠i} x_ji = 1, ∀i ∈ V_C, (12)
∑_{i∈V_C} x_ji ≤ p, ∀j ∈ V_D, (13)
M (x_ij - 1) ≤ z_i - z_j ≤ M (1 - x_ij), ∀(i, j) ∈ H, (14)
y_i + d_j - y_j ≤ (1 - x_ij) Q, ∀i ∈ V_C, j ∈ V_C, i ≠ j, (15)
x_ij ∈ {0, 1}, ∀(i, j) ∈ H, (16)
d_i ≤ y_i ≤ Q, ∀i ∈ V_C, (17)
z_i = i, ∀i ∈ V_D, (18)
min(V_D) ≤ z_j ≤ max(V_D), ∀j ∈ V_C. (19)

Constraints (11)-(13) are degree constraints for customers and depots, respectively. The assignment constraints of customers to depots are given by (14). Constraints (15) are the capacity and sub-tour elimination constraints. At last, the obligatory constraints (16)-(19) define the different kinds of decision variables.

4 Computational experiments
In this section we describe the implementation of the proposed mathematical models and the results obtained on a set of problem instances. The mathematical models were coded in Python 3.7 and solved by the solver Gurobi 8.1. A computer equipped with an Intel Core i7-5960X 3 GHz processor and 32 GB of RAM was used to perform the computational experiments. The problem instances used in our experiments are generated based on Cordeau's instances [5]. Namely, for generating an instance of n customers and m depots, we randomly extract from the larger instance p01 n customer locations and their demands as well as m depot locations. The number of available vehicles per depot p is varied from 1 to 3. The rationale behind this is to provide instances where the solver can find an optimal solution within a short time limit, to have the possibility of comparing the performance of both formulations proposed in this work, i.e. MDVRP-3i and MDVRP-2i, respectively.

Instance                        MDVRP-3i                       MDVRP-2i
id   n   m   p   Q       Cost     Gap(%)   Time(s)      Cost     Gap(%)   Time(s)
1.   20  2   1   160     314.973  0.000       6.73      314.973  0.000       1.62
2.   20  2   2   80      365.902  0.000     970.15      365.902  0.000      97.74
3.   20  2   3   80      360.997  0.000   1 547.47      360.997  0.000      25.47
4.   20  3   1   160     308.103  0.000       1.56      308.103  0.000       1.03
5.   20  3   2   80      326.138  0.000     101.68      326.138  0.000       6.54
6.   20  3   3   80      326.138  0.000     460.61      326.138  0.000       3.37
7.   20  4   1   80      340.714  0.000   1 508.49      340.714  0.000      28.56
8.   20  4   2   80      323.487  0.000     360.81      323.487  0.000       9.34
9.   20  4   3   80      323.487  0.008     325.54      323.487  0.000       3.95
10.  25  2   1   240     345.251  0.000   2 325.33      345.251  0.000       5.35
11.  25  2   2   160     367.284  0.000   1 399.47      367.284  0.000     199.33
12.  25  2   3   160     367.284  0.000   5 272.81      367.284  0.000     395.75
13.  25  3   1   160     358.960  0.003     128.02      358.960  0.000      11.53
14.  25  3   2   160     358.960  0.000     224.18      358.960  0.000      15.11
15.  25  3   3   160     358.960  0.000     288.44      358.960  0.000      17.67
16.  25  4   1   160     375.638  0.000     249.26      375.638  0.000      26.42
17.  25  4   2   160     357.411  0.000     134.68      357.411  0.000      28.26
18.  25  4   3   80      388.732  0.000   3 680.19      388.732  0.000      84.87
Table 1 Comparison of computational times

Instance   n   p   Q      BKS   Cost     Gap(%)   Time(s)
P-n16-k8   15  8   35     450   450.000  0.000       1.49
P-n19-k2   18  2   160    212   212.000  0.000     320.79
P-n20-k2   19  2   160    216   216.000  0.000    2825.22
P-n21-k2   20  2   160    211   211.000  0.000     519.59
P-n22-k2   21  2   160    216   216.000  0.000    1225.74
P-n22-k8   21  8   3000   603   603.000  0.000     165.35
P-n23-k8   22  8   40     529   529.000  0.000    3065.47
Table 2 Results for CVRP

Table 1 reports the computational results obtained by both optimization models. The first column, divided into five sub-columns, shows the main characteristics of the test problems: id - the identifier of the instance to solve, n - the number of customers, m - the number of depots, p - the number of vehicles per depot, and Q - the capacity of the vehicle. The next columns show the results obtained by the three-index model MDVRP-3i and the two-index model MDVRP-2i. In each case, columns Cost, Gap(%), and Time(s) present the objective value, gap, and computational time in seconds reported by the solver, respectively. As shown in this table, both models solved all instances to optimality. However, model MDVRP-2i is, in general, much faster when compared to model MDVRP-3i. To validate our two-index model of the MDVRP, we also consider the CVRP as a particular case of the MDVRP where the number of depots m = 1. We conduct computational experiments on several instances from the literature, namely class P proposed by Augerat [2]. These instances can be found at http://vrp.atd-lab.inf.puc-rio.br/index.php/en/. We present the computational results obtained with MDVRP-2i in Table 2. In this table, the column labeled Instance represents the name of the instance, and columns n, p, and Q show the number of customers, the number of vehicles, and the vehicle capacity, respectively. Column BKS represents the best-known solution to the problem, as reported in the literature. Under the columns labeled Cost, Gap(%), and Time(s) we report the results obtained by MDVRP-2i. These results confirm that the proposed two-index model with the new assignment constraint can solve exactly two classes of problems, the MDVRP and the CVRP.

5 Conclusion
In this work, we have presented a new two-index formulation for the multi-depot vehicle routing problem under capacity constraints. The problem is modeled using a vehicle-flow formulation including a new assignment constraint of the customers to the depots, which is shown to forbid routes to have their starting and ending points at two different depots. The presented results allow us to conclude that our model considerably improves the computational speed. Although our model has been tested only for the MDVRP and the CVRP, it can be extended to other variants of the VRP. This would be a possible topic of our future research.

Acknowledgements
This work was supported by the research grant VEGA 1/0776/20 "Vehicle routing and scheduling in uncertain conditions" and the Slovak Research and Development Agency under Contract no. APVV-19-0441 "Allocation of limited resources to public service systems with conflicting quality criteria".

References
[1] Allahyari, S., Salari, M., Vigo, D. (2015). A hybrid metaheuristic algorithm for the multi-depot covering tour vehicle routing problem. Eur. J. Oper. Res. 242 (3), (pp. 756-768).
[2] Augerat, P. (1995). Approche polyédrale du problème de tournées de véhicules. Ph.D.
Thesis, France: Institut National Polytechnique de Grenoble.
[3] Bezerra, S.N., de Souza, S.R., Souza, M.J.F. (2018). A GVNS algorithm for solving the multi-depot vehicle routing problem. Electron. Notes Discrete Math. 66, 5th International Conference on Variable Neighborhood Search, (pp. 167-174).
[4] Contardo, C., Martinelli, R. (2014). A new exact algorithm for the multi-depot vehicle routing problem under capacity and route length constraints. Discret. Optim. 12, (pp. 129-146).
[5] Cordeau, J.-F., Gendreau, M., Laporte, G. (1997). A tabu search heuristic for periodic and multi-depot vehicle routing problems. Networks 30(2), (pp. 105-119).
[6] Jayarathna, D.G.N.D., Lanel, G.H.J., Juman, Z.A.M.S. (2021). Survey on Ten Years of Multi-Depot Vehicle Routing Problems: Mathematical Models, Solution Methods and Real-Life Applications. Sustainable Development Research, Vol. 3, No. 1. DOI: 10.30560/sdr.v3n1p36.
[7] Kulkarni, R.V., Bhave, P.R. (1985). Integer programming formulations of vehicle routing problems. European Journal of Operational Research 20, (pp. 58-67).
[8] Laporte, G., Nobert, Y., Arpin, D. (1984). Optimal solutions to capacitated multidepot vehicle routing problems. Congressus Numerantium 44, (pp. 283-292).
[9] Laporte, G., Nobert, Y., Taillefer, S. (1988). Solving a family of multi-depot vehicle routing and location-routing problems. Transp. Sci. 22, (pp. 161-172).
[10] Montoya-Torres, J.R., Franco, J.L., Isaza, S.N., Jimenez, H.F., Herazo-Padilla, N. (2015). A literature review on the vehicle routing problem with multiple depots. Comput. Ind. Eng. 79, (pp. 115-129).
[11] de Oliveira, F.B., Enayatifar, R., Sadaei, H.J., Guimarães, F.G., Potvin, J.-Y. (2016). A cooperative coevolutionary algorithm for the multi-depot vehicle routing problem. Expert Syst. Appl. 43, (pp. 117-130).
[12] Pisinger, D., Ropke, S. (2007). A general heuristic for vehicle routing problems. Comput. Oper. Res. 34 (8), (pp. 2403-2435).
[13] Subramanian, A., Uchoa, E., Ochi, L.S. (2013). A hybrid algorithm for a class of vehicle routing problems. Comput. Oper. Res. 40 (10), (pp. 2519-2531).
[14] Vidal, T., Crainic, T.G., Gendreau, M., Lahrichi, N., Rei, W. (2012). A hybrid genetic algorithm for multidepot and periodic vehicle routing problems. Oper. Res. 60 (3), (pp. 611-624).

Portfolio selection via a dynamic moving mean-variance model
Adam Borovička1

Abstract. Investment decision making, or portfolio selection, is not usually an easy process. Development in the capital market can be influenced by many factors. Prediction in this field becomes difficult. This uncertainty is reflected in the performance of the investment. Many approaches and methods supporting investment decision making reflect the uncertainty through the risk of not meeting the expected investment profit. The well-known Markowitz model measures the risk by variance. However, one fact is neglected by (not only) this model: risk, or return, is unstable over time. These dynamics should be taken into account to make a representative, robust decision. Dynamizing the process is performed via a 'moving' form of both characteristics. A dynamic version of the Markowitz model, called a moving mean-variance model, is proposed. Both characteristics are enumerated through a developed approach in all overlapping subperiods, from which moving means and (co)variances are calculated. The designed model is applied to select a portfolio from popular open unit trusts.
To demonstrate the benefit of the dynamized model, the result is confronted with the output of a 'static' mean-variance model.

Keywords: dynamic, moving mean-variance, portfolio, unit trust

JEL Classification: C44, C61, G11
AMS Classification: 90B50, 90C30, 90C70

1 Introduction
The portfolio selection problem is still a 'hot' topic. Why? At first, more and more people are considering investing to appreciate their free funds. This effort is accelerated by an ever-expanding range of investment instruments covering a wide investor audience. Secondly, development in the capital market is not usually predictable, which makes decision making complicated. These phenomena of the world of investment encourage the development of user-friendly methods and approaches that would be a significant support for making representative, robust investment decisions. One of the most essential questions within the investment decision making process is portfolio selection. For this purpose, the Markowitz mean-variance concept, based on the diversification idea, is widely applied [5, 6]. This approach can take into account the two most important characteristics of the investment, return and risk. Another benefit is its easy applicability to a wider range of investment instruments. On the other side, the already massive use of this concept is sometimes limited by a few shortcomings. Besides the much discussed assumption of normally distributed returns or the penalization of positive deviations from the average return, one aspect (especially in the context of the capital market environment) is, in my opinion, neglected. This is the assumption of stability of the risk, or return, of the portfolio over time. However, the level of uncertainty in the capital market may not be stable over time. It means that the monitored characteristics, measured by mean and variance, would not have to be considered as static elements. How to take this aspect into account in the model? Answering this crucial question can move the model's applicability closer to the portfolio making reality. One way to do this can be through the fuzzy set theory. In the current literature, a mean-variance model is fuzzified through the returns represented as (triangular) fuzzy numbers. Then, the mean and (co)variance of the fuzzy numbers are calculated, e.g. Huang [2]. This process quantifies ('defuzzifies') both characteristics on the vague (fuzzy) data in the crisp form. This concept, however, does not fully follow the instability of investment characteristics over time. A better way to represent variable uncertainty, or unstable (return) volatility, is to express the mean and variance of the portfolio return as triangular fuzzy numbers on the crisp data [1]. However, this sophisticated approach may weaken its applicability due to a slightly more complex algorithm. So, I dare to propose an even friendlier approach based on a 'moving' concept. The instability of mean and variance over time is then captured by dynamizing the process using a moving average. The observed historical period is divided into a few smaller parts shifted by one time subperiod. In these time-overlapping periods, both characteristics are calculated. Finally, the general mean and variance of the portfolio return are computed from the partial characteristics of the subperiods via simple or weighted averaging.

1 Prague University of Economics and Business, Department of Econometrics, W. Churchill Sq. 4, Prague, Czech Republic, adam.borovicka@vse.cz.
The main benefit of such a concept is less data complexity and user-friendliness. Thus, the main aim of this article is to improve the original mean-variance concept to take into account the present instability of uncertainty (volatility) by dynamizing the process using the moving average concept. The novel concept is called a moving mean-variance model. The second aim is to demonstrate the application power of the proposed concept in a portfolio selection process in the capital market with increasingly popular open unit trusts. The composition of the portfolios made from the unit trusts offered by Česká spořitelna is analyzed. The result is compared with the output of the original mean-variance model to demonstrate the algorithm-application differences. The rest of the article is structured as follows. Section 2 describes a portfolio selection procedure based on the proposed moving mean-variance model. Section 3 deals with practically selecting a portfolio from the open unit trusts via the developed approach. Section 4 summarizes the main contributions of the article and outlines some ideas for future research.

2 Portfolio selection procedure using moving mean-variance concept
This section proposes a novel version of the mean-variance model based on the 'moving' principle. This model is embedded in the whole portfolio selection process that is described in the following several steps.

2.1 Step 1: Determination of the investment policy
At the beginning of the investment decision making process, the investment policy must be declared [4, 7]. The first very important aspect is the purpose of the investment: financing of studies, financial protection of the pension age, loan repayment, creation of a fund for unexpected events, etc. The investment horizon is closely related to the investment intention. The amount of available financial funds also affects the investment. The frequency of the investment is also important (one-time, continuous). The investment is significantly influenced by the attitude to risk, which is also related to the expectation of investment performance. The form of portfolio management (passive, active) also shapes the investment policy, as do experience and financial literacy.

2.2 Step 2: Data collection and characteristic calculation
After preselecting suitable investment instruments (assets) based on the investment policy, all necessary data (price, fee, volume of trades, etc.) must be collected. All required characteristics of the assets (return, risk, etc.) are then calculated. Financial data are mostly publicly available, which is a positive aspect of decision making in the capital market. Assuming known historical prices, the return and risk of the assets can be calculated. Let p denote the number of equally long time periods with n observations of returns for m assets. Then the return as a mean of the i-th asset in the t-th period is calculated as follows:

r̄_it = (1/n) ∑_{k=1}^{n} r_itk, i = 1, 2, ..., m, t = 1, 2, ..., p, (1)

where r_itk, i = 1, 2, ..., m, t = 1, 2, ..., p, k = 1, 2, ..., n, represents the k-th observation of the return of the i-th asset in the t-th period. The difference between the beginnings, or ends, of two consecutive periods is constant throughout the considered history. Neighboring periods overlap between the beginning of one period and the end of the previous one. Then the return of the i-th asset as a moving mean can be proposed in a simple, or weighted, form as follows:

r̄_i = (1/p) ∑_{t=1}^{p} r̄_it, or r̄_i = ∑_{t=1}^{p} w_t r̄_it, i = 1, 2, ..., m, (2)

where w_t, t = 1, 2, ..., p, is the weight of the t-th period.
The weights are standardized, so that ∑_{t=1}^{p} w_t = 1. The weights can reflect the importance of the particular subperiods. For instance, it is possible to consider a stronger effect of the last subperiods on the development at the beginning of the investment horizon. Over time, the influence declines, and development occurs over a longer time horizon, which, however, is always more or less influenced by the recent past. Therefore, the values of the weights of the subperiods have a declining trend towards the past. Weights can be subjectively determined using, e.g., the scoring method. The covariance of the returns of the i-th and j-th asset in the t-th period is computed through the following formula:

σ_ijt = (1/n) ∑_{k=1}^{n} (r_itk - r̄_it)(r_jtk - r̄_jt), i, j = 1, 2, ..., m, t = 1, 2, ..., p, (3)

where r_itk, or r_jtk, i, j = 1, 2, ..., m, t = 1, 2, ..., p, k = 1, 2, ..., n, denotes the k-th observation of the return of the i-th, or j-th, asset in the t-th period. Then the moving (co)variance of the returns of the i-th and j-th asset can be developed in a simple, or weighted, form as follows:

σ_ij = (1/p) ∑_{t=1}^{p} σ_ijt, or σ_ij = ∑_{t=1}^{p} w_t σ_ijt, i, j = 1, 2, ..., m. (4)

Now, the return and risk of the portfolio can be designed, as well as the model with all investment conditions.

2.3 Step 3: Portfolio making
After processing the data over all periods, the following moving mean-variance model is proposed. This is the Markowitz model (without short sales) with the specially prepared data (made in the previous step):

min x^T Σ x
subject to r^T x ≥ r',
e^T x = 1, (5)
x ≥ 0,

where x^T = (x_1, x_2, ..., x_m) is the vector of variables x_i, i = 1, 2, ..., m, representing the share of the i-th asset in the portfolio. The vector r^T = (r̄_1, r̄_2, ..., r̄_m) contains the returns measured by the moving mean r̄_i, i = 1, 2, ..., m, of the i-th asset. Σ = (σ_ij) is the matrix with generic elements σ_ij, i, j = 1, 2, ..., m, reflecting the mutual influence of the i-th and j-th asset returns measured by the moving (co)variance. e^T is a vector of ones serving only to make the portfolio a whole, and r' denotes a minimum required level of portfolio return (reference return). The designed model can be supplemented with other necessary investment conditions, e.g. a minimum/maximum share of one asset in the portfolio, that can be mathematically formulated within the set X. This version of the model minimizes the investment risk x^T Σ x, measured by the moving variance of the portfolio return, under the condition of a minimum required return, due to its user friendliness. Setting a return is certainly easier and more practical than determining the risk. The reference value of the portfolio return can be inspired by its minimum and maximum possible level. Denote r^T x^max and r^T x^min as the maximum, or minimum, attainable portfolio return (formalized as r^T x measured by the moving mean), representing its ideal and basal value on the set of all necessary investment conditions (formulated in model (5) and possibly in the set X). Then the following holds:

x^max = arg max { r^T x : e^T x = 1, x ≥ 0, x ∈ X }, x^min = arg min { x^T Σ x : e^T x = 1, x ≥ 0, x ∈ X }. (6)

The basal value is reasonably determined in the context of the (best) risk value. Finally, the minimum required level of portfolio return can be determined through the following proposed formula:

r' = r^T x^min + h (r^T x^max - r^T x^min), (7)

where h ∈ (0, 1) actually measures the rate of risk aversion in the spirit of the well-worn "higher return-higher risk" (closer to 0 means higher risk aversion). Of course, the reference level can also be set in another way.
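To illustrate how Steps 2 and 3 fit together computationally, the following sketch assembles formulas (1)-(4), the bounds (6), the reference return (7) with h = 0.5 and the model (5) on synthetic data; cvxpy is used here only as one possible quadratic programming tool (the paper does not prescribe a particular solver), the data are illustrative assumptions, and the additional conditions of the set X are omitted.

```python
import cvxpy as cp
import numpy as np

# Synthetic inputs in the shape used by formulas (1)-(4): p = 5 subperiods,
# n = 60 monthly observations, m = 3 assets; all values are illustrative.
p, n, m = 5, 60, 3
rng = np.random.default_rng(1)
returns = rng.normal(loc=0.3, scale=1.0, size=(p, n, m))
w = np.array([0.1, 0.1, 0.2, 0.25, 0.35])       # period weights, summing to 1

r_it = returns.mean(axis=1)                      # per-period means, formula (1)
r_bar = w @ r_it                                 # weighted moving mean, formula (2)
sigma_t = np.stack([np.cov(returns[t], rowvar=False, bias=True) for t in range(p)])
sigma = np.tensordot(w, sigma_t, axes=1)         # weighted moving covariance, formula (4)
L = np.linalg.cholesky(sigma)                    # so that ||L.T x||^2 = x' Sigma x

x = cp.Variable(m, nonneg=True)
budget = [cp.sum(x) == 1]

# Ideal and basal portfolio returns (6) and the reference return (7) with h = 0.5.
r_max = cp.Problem(cp.Maximize(r_bar @ x), budget).solve()
cp.Problem(cp.Minimize(cp.sum_squares(L.T @ x)), budget).solve()
r_min = float(r_bar @ x.value)
r_ref = r_min + 0.5 * (r_max - r_min)

# Moving mean-variance model (5): minimize portfolio risk s.t. the reference return.
cp.Problem(cp.Minimize(cp.sum_squares(L.T @ x)), budget + [r_bar @ x >= r_ref]).solve()
print(np.round(x.value, 4))
```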
In any case, the interval ⟨r^T x^min, r^T x^max⟩ should guide the preferences in the portfolio selection process. The efficient frontier can then be drawn on this return interval. Thus, after the reference level determination, the model (5) with any additional conditions (included in the set X) can be solved (in any optimization software supporting quadratic programming) to make a portfolio.

2.4 Step 4: Portfolio evaluation and revision
The passive investor lets the portfolio "live its own life". The active investor regularly monitors the portfolio performance. The composition of the portfolio is revised as needed. To reoptimize the portfolio, model (5) with the actual data and investor preferences can be applied very effectively. The preferences may change with regard to the current life situation of the investor and his surroundings.

3 Selecting the portfolio of unit trusts via moving mean-variance model
Let us introduce a real-life portfolio selection problem in the Czech capital market with open unit trusts. The analysis focuses on the most frequent investment situation, reflecting a longer-time investment made by investors having a smaller amount of free funds. To select the most suitable investment portfolio, the designed portfolio selection procedure using the proposed moving mean-variance model is applied.

3.1 Step 1: Investment policy specification
The investment is conceived as longer-term. Its purpose can be financial protection in the pension age. Or, more generally, the intention is the appreciation of available funds which will not be needed in the foreseeable future. Such an intention determines a rather conservative investment approach. This investor is not able to take too much risk. He would rather be satisfied with a smaller, but 'surer' return. Many investors of this category are not very experienced in the capital market. His role will be rather passive. Under all the mentioned circumstances, the open unit trust is a suitable investment instrument. Although the composition of the portfolio will not be subject to excessive changes, it should be clear to the investor. Therefore, the portfolio should not consist of too many funds. Based on the personal investment experience of the author and a discussion with an investment consultant, an adequate number of assets is three to six.

3.2 Step 2: Price collection, return and risk calculation
To invest in open unit trusts, the investor usually turns to his home bank. Today, most banks in the Czech market already offer open unit trusts, or at least mediate trading with them. As the author is a long-term client of Česká spořitelna with a lot of practical experience with its investment instruments, the open unit trusts offered and managed by Česká spořitelna are chosen. In addition, the value of property in the Česká spořitelna funds is the second largest in the Czech market. The offer of open unit trusts is very wide. Based on the investment policy, thirteen open unit trusts are preselected. There are five bond funds (Sporoinvest, Sporobond, Trendbond, Corporate Bond, High Yield Bond), five mixed funds (Fund of Controlled Yields, Equity Mix, Dynamic Mix, Balanced Mix, Conservative Mix) and three equity funds (Sporotrend, Global Stocks, Top Stocks). These unit trusts have a sufficiently long history. In order to represent a long-term price development, the period from 2011 to 2019, covering price falls, rises and also calmer times, is selected. Prices from the last trading day of the month are downloaded from the Česká spořitelna Investment Center [3].
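As a small data-handling sketch (not part of the original paper), the monthly returns and the overlapping five-year subperiods used in the next step could be prepared as follows; the price series here is a synthetic stand-in for the downloaded data, and the layout is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded month-end prices (2011-2019), one column
# per preselected fund; in practice these would come from the Investment Center [3].
dates = pd.date_range("2011-01-31", "2019-12-31", freq="M")
rng = np.random.default_rng(2)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.002, 0.02, size=(len(dates), 13)), axis=0)),
    index=dates,
    columns=[f"fund_{i + 1}" for i in range(13)],
)
returns = prices.pct_change().dropna() * 100      # monthly returns in %

# Five overlapping five-year subperiods shifted by one year: 2011-2015, ..., 2015-2019.
subperiods = [returns.loc[f"{y}":f"{y + 4}"] for y in range(2011, 2016)]
print([len(s) for s in subperiods])               # roughly 60 observations each
```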
To reflect the dynamic instability, the nine-year period is divided into five overlapping five-year subperiods gradually shifted by one year, thus from the 2011-2015 to the 2015-2019 period. In each subperiod, the means (returns) and (co)variances of the open unit trusts are calculated from 60 observations through formulas (1) and (3). Then the moving means and (co)variances are computed by (2) and (4). The weights are determined based on the idea of a more significant influence of the recent development (about this idea see more in Section 2.2). The weights of the five subperiods are chronologically determined as 0.1 (for the period 2011-2015), 0.1 (2012-2016), 0.2 (2013-2017), 0.25 (2014-2018) and 0.35 (2015-2019), based on subjective discretion reflecting personal (analytic) investment experience supported by the scoring method. Both essential characteristics (in %), gradually from bond, through mixed, to equity funds, are shown below in the matrix and vector (9).

3.3 Step 3: Portfolio making
Through the models (6) with the additional conditions 0.15y ≤ x ≤ 0.4y, y ∈ {0, 1}, included in the set X and ensuring the requirement for a limited number of funds in the portfolio, the basal and ideal values of the portfolio return are determined. To make a portfolio, the model (5) with additional conditions is formulated as follows:

min x^T Σ x
subject to r^T x ≥ 0.347,
e^T x = 1,
0.15 y ≤ x ≤ 0.4 y, (8)
x ≥ 0,
y ∈ {0, 1},

where x = (x_1, x_2, ..., x_13) represents the shares of the funds in the order indicated in Step 2, i.e. from i = 1 (Sporoinvest) to i = 13 (Top Stocks). With the same order of elements, the matrix of (co)variances and the vector of returns are constructed in the following form:

Σ =
0.016 0.042 0.069 0.049 0.077 0.020 0.140 0.110 0.084 0.047 0.133 0.130 0.178
0.042 0.374 0.612 0.138 0.292 0.063 0.379 0.369 0.341 0.212 0.586 0.500 0.254
0.069 0.612 2.755 0.774 0.764 0.137 1.078 0.953 0.842 0.500 3.476 1.502 0.373
0.049 0.138 0.774 1.727 0.934 0.120 1.254 0.948 0.658 0.344 3.468 0.851 1.575
0.077 0.292 0.764 0.934 1.447 0.166 2.195 1.638 1.175 0.589 3.638 1.988 2.833
0.020 0.063 0.137 0.120 0.166 0.037 0.309 0.235 0.175 0.093 0.392 0.343 0.413
0.140 0.379 1.078 1.254 2.195 0.309 6.624 4.719 3.154 1.464 7.024 7.034 9.954
0.110 0.369 0.953 0.948 1.638 0.235 4.719 3.408 2.309 1.089 5.158 5.002 6.952
0.084 0.341 0.842 0.658 1.175 0.175 3.154 2.309 1.605 0.778 3.633 3.374 4.461
0.047 0.212 0.500 0.344 0.589 0.093 1.464 1.089 0.778 0.400 1.767 1.583 1.972
0.133 0.586 3.476 3.468 3.638 0.392 7.024 5.158 3.633 1.767 17.762 6.534 8.209
0.130 0.500 1.502 0.851 1.988 0.343 7.034 5.002 3.374 1.583 6.534 10.583 10.440
0.178 0.254 0.373 1.575 2.833 0.413 9.954 6.952 4.461 1.972 8.209 10.440 22.447

r^T = (-0.006, 0.130, -0.077, 0.150, 0.246, -0.059, 0.378, 0.258, 0.219, 0.110, 0.143, 0.692, 0.854). (9)

The reference level of the portfolio return is computed in the spirit of a longer-term, rather conservative strategy. Thus, h = 0.5 in (7) reflects a more risk-averse attitude. The reference return level is then computed as r' = 0.0001 + 0.5 (0.695 - 0.0001) ≈ 0.347 %. The solution of model (8) represents the portfolio with the following composition: 38.21% Sporobond, 29.22% High Yield Bond and 32.57% Global Stocks. The significant share of Sporobond is not surprising. This fund has a solid positive return and, mainly, a strong diversification ability through low covariances of returns with the other funds. The equity fund Global Stocks significantly helps to achieve the reference return of the investment thanks to the second greatest monthly expected return.
Its greater risk is compensated by the third fund, High Yield Bond, with a solid return and diversification power with the other two funds. In case of greater risk aversion, the bond fund Sporoinvest will inevitably become (despite its virtually zero return) part of the portfolio due to its sovereignly lowest covariances. On the other side, with declining risk aversion, the equity fund Top Stocks will start participating in the portfolio thanks to the greatest return, especially at the expense of Sporobond. Finally, how did the application of the dynamized version of the mean-variance model (the proposed 'moving' form) affect the investment decision making? Let us compare the efficient frontiers made by the original mean-variance model and by the moving mean-variance model with simple and weighted characteristics (named the simple and weighted moving mean-variance models), as shown in Figure 1.

Figure 1 Efficient frontiers made by three various forms of the mean-variance model (legend: mean-variance, simple moving mean-variance, weighted moving mean-variance; horizontal axis: Risk [%])

It is obvious that the dynamic version of the mean-variance model provides a different set of portfolios than the original Markowitz form. The set of funds participating in the portfolios is quite stable (approximately 6 funds), but their shares sometimes differ significantly. The efficient frontier from the simple moving mean-variance model closely follows the efficient frontier generated by the mean-variance model. The main reason is the applied simple form of the moving average, which does not significantly change the input data of the original model. The weighted moving form provides a much different result. As can be seen, the portfolios with the same risk mostly provide a lower level of return than the portfolios made by the mean-variance model. The main reason is the fact that the last five-year subperiod 2015-2019, showing lower returns, carries the significantly highest weight. Emphasizing the impact of the recent subperiod proves to be appropriate, which is ultimately confirmed by the beginning of the 'life' of the investment in 2020. Although the time is too short for a longer-term investment, the performance from January 2020 to March 2021 is positive (4.78%). So far, the investment decision based on the moving mean-variance model seems to be the right one. However, only time will tell about the overall success of the investment.

4 Conclusion
The article deals with a modification of the mean-variance model to take into account the unstable uncertainty (volatility) in the capital market. The proposed dynamic version of the Markowitz model is named the moving mean-variance model. The observed historical period is divided into shorter overlapping subperiods within which a mean and (co)variance are calculated. Gradual shifting by one time period from the start to the end of the historical period can capture the changing return instability. Inclusion of this non-negligible feature of the capital market is reflected in the composition of the investment portfolio(s), which is more or less different from the portfolio made by the original mean-variance concept. The proposed concept has proven itself algorithmically and practically. Adequate inclusion of the variable instability has proven to be beneficial, in particular when selecting a portfolio of open unit trusts. The proposed moving mean-variance model is clearly trying to get closer to the reality of capital market processes, which makes the investment decision more representative and robust.
Further research could be focused on the weights of the particular subperiods. As they influence the result, i.e. the final decision, determining their values would deserve an even more detailed analysis. Another interesting research area would be to compare the 'moving' version of the model with its fuzzy form, which also expresses the changing uncertainty, both from an algorithmic and an application point of view.

Acknowledgements
The research project was supported by Grant No. F4/42/2021 of the Internal Grant Agency, Faculty of Informatics and Statistics, Prague University of Economics and Business.

References
[1] Borovička, A. (2021). Stock portfolio selection under unstable uncertainty via fuzzy mean-semivariance model. Submitted to Central European Journal of Operations Research.
[2] Huang, X. (2007). Portfolio selection with fuzzy returns. Journal of Intelligent & Fuzzy Systems, 18, 383-390.
[3] Investment Center. (2021). Archiv prodejních cen fondů. [online] Available at: https://cz.products.erstegroup.com/Retail/cs/Ke_stažení/Dokumenty_ke_stažení/Archiv_prodejních_cen_fondů/index.phtml [Accessed 10 May 2021].
[4] Levy, H. (1999). Introduction to investments. Cincinnati: International Thomson Publishing.
[5] Markowitz, H. M. (1952). Portfolio selection. The Journal of Finance, 7, 77-91.
[6] Markowitz, H. M. (1959). Portfolio selection: efficient diversification of investments. New York: John Wiley & Sons, Inc.
[7] Steigauf, S. (2003). Fondy - jak vydělávat pomocí fondů. Praha: Grada Publishing.

A shadow utility of portfolios efficient with respect to the second order stochastic dominance
Martin Branda1

Abstract. We consider diversification-consistent DEA models which are consistent with the second order stochastic dominance (SSD). These models can identify the portfolios which are SSD efficient and suggest a revision of the portfolio weights for the inefficient ones. There is also a way to reconstruct the utility of particular investors based on the efficient portfolio which they hold. We apply the above mentioned approaches to industry representative portfolios and discuss the risk aversion of the investors. We focus on the sensitivity with respect to various levels of the risk aversion.

Keywords: Data envelopment analysis, diversification, second order stochastic dominance, risk aversion, shadow utility, sensitivity

JEL Classification: C44
AMS Classification: 90C15

1 Introduction
Data envelopment analysis (DEA), introduced in [13], is nowadays an important class of models which serve to assess the efficiency of decision-making units which consume a given set of inputs to produce several outputs. The applications range from bank branches up to the efficiency of country regions, cf. [23]. Special attention has been paid to applications in finance, especially to the efficiency of mutual funds and investment opportunities in general. Since the seminal work [26], many papers have been published on applications as well as methodology, see, e.g., [5, 11, 14, 25]. Recently, a new class of DEA models with diversification, also known as diversification-consistent DEA (DC DEA), was introduced in [19]. These new models overcame the drawback of the traditional DEA models, which do not take into account the diversification effect between the considered investment opportunities.
In other words, if risk measures were considered as the inputs, the traditional DEA model underestimated the risk of a combination of investment opportunities and classified some of them as efficient, even though some improvement in the risk criterion was possible. Since the work [19], several classes of DC DEA models have been investigated. Note that previously several attempts can be found in the literature, in particular in [11, 12, 17], which were focused on mean-variance and mean-variance-skewness efficiency. They also introduced shadow utility functions based on the moment criteria. Paper [6] dealt with DC DEA models based on general deviation measures and investigated the strength of the proposed models as well as the inclusion of a condition on the sparsity of portfolios. In [7], the models were generalized and the analysis was extended to coherent risk measures using directional distance measures. A bootstrap technique was employed to investigate the empirical properties and stability of the models and the resulting scores. A dynamic extension was introduced by [21]. They decomposed the overall efficiency of mutual funds over the whole investment period into efficiencies at individual investment periods, taking into account the dependence among the periods. Paper [8] studied models with Value at Risk inputs and proposed tractable reformulations. Traditional DEA models were used to approximate the efficient frontier and to assess the performance of portfolios by [24]. In [22], two directional distance based diversification super-efficiency models for discriminating efficient funds were proposed. Paper [29] was focused on robustness and integrated parameter uncertainty into diversification-consistent DEA models, leading to bi-level problems which were then transformed into equivalent single-level DEA problems. Note that in many cases, for discretely distributed returns and proper choices of risk measures, the authors showed that the proposed models can be formulated as linear programming problems, which enables solving even large instances of the obtained problems to optimality. An important research topic is the relation between DEA efficiency and stochastic dominance efficiency. Efficiency with respect to stochastic dominance has been a well-established concept in financial mathematics since [15, 16], see also [20]. In the DEA literature, we can find several papers investigating relations to stochastic dominance efficiency; in particular, [25] introduced several models which are consistent with second order stochastic dominance (SSD), whereas [18] proposed equivalent models. In [9], an equivalence between a new class of diversification-consistent DEA models and stochastic dominance efficiency tests with respect to SSD was shown. This relation was further elaborated by [7]. The equivalences were then generalized to N-th order stochastic dominance efficiency tests in [10]. Recently, [2] proposed a new approach to incorporate the risk aversion of a particular investor into the DC DEA framework, which is equivalent to SSD efficiency. They derived a shadow utility which renders the DC DEA / SSD efficient portfolios optimal for the investor.

1 Charles University in Prague, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovská 83, Prague 186 75, Czech Republic, tel.: +420 221 913 404, fax: +420 222 323 316, branda@karlin.mff.cuni.cz
We will focus on this approach and propose an additional sensitivity analysis with respect to the investor's risk aversion. The approach relies on spectral risk measures, which were proposed by [1] as a special class of coherent risk measures [4]. Using a proper choice of the risk spectrum, we can identify the optimal investment opportunity for any risk-averse investor, see [28]. The paper is organized as follows. Section 2 reviews the DC DEA models and the basic notation of efficiency with respect to the second order stochastic dominance. In Section 3, an approach to risk aversion based on spectral risk measures is summarized. Section 4 provides a numerical study with special attention to the sensitivity with respect to the risk aversion. Below we will assume that n assets with random rates of return R_i are available and we can use any (nonnegative) combination to compose a portfolio. This leads to the following set of available investment opportunities, or simply portfolios:

X = { X = ∑_{i=1}^{n} x_i R_i : ∑_{i=1}^{n} x_i = 1, x_i ≥ 0, i = 1, ..., n }.

2 Diversification consistent DEA models
First, we review the general formulation of a diversification-consistent DEA model as it was proposed in [7]. It employs J return measures S_j as the outputs and K coherent risk measures R_k as the inputs. Coherent risk measures were proposed in [4] as real functionals on the L_p(Ω) space with finite p-th moment (usually p ∈ {1, 2}), which fulfill the following axioms:
(R1) translation equivariance: R(X + c) = R(X) - c for all X ∈ L_p(Ω) and constants c ∈ R,
(R2) positive homogeneity: R(0) = 0, and R(λX) = λR(X) for all X ∈ L_p(Ω) and all λ ≥ 0,
(R3) subadditivity: R(X_1 + X_2) ≤ R(X_1) + R(X_2) for all X_1, X_2 ∈ L_p(Ω),
(R4) monotonicity: R(X_1) ≤ R(X_2) when X_1 ≥ X_2, X_1, X_2 ∈ L_p(Ω).
Note that the axioms (R2) and (R3) imply convexity. We say that S is a return measure if there exists a coherent risk measure R such that S(X) = -R(X). Since both coherent risk and return measures can take positive as well as negative values, the DC DEA models proposed by paper [7] were based on directional distance measures where, for a benchmark portfolio X_0 ∈ X, the directions are defined as

e_j(X_0) = max_{X∈X} S_j(X) - S_j(X_0), ∀j, d_k(X_0) = R_k(X_0) - min_{X∈X} R_k(X), ∀k. (1)

These directions quantify the maximal possible improvements over the risk and return measures for the benchmark portfolio X_0 to reach the efficient frontier. The frontier corresponds to the strong Pareto-Koopmans efficiency, i.e. we say that X_0 is efficient if there is no other portfolio X_1 ∈ X such that S_j(X_1) ≥ S_j(X_0) for all j and R_k(X_1) ≤ R_k(X_0) for all k, with at least one inequality strict. This efficiency can then be accessed by a diversification-consistent DEA model based on the directional distance measures.

A spectral risk measure with risk spectrum φ is defined as

R_φ(X) = -∫_0^1 φ(p) F_X^{-1}(p) dp, (5)

where we consider the quantile function

F_X^{-1}(p) = min{x : F_X(x) ≥ p}, p ∈ [0, 1], (6)

and an admissible risk spectrum φ, which must be:
(A1) positive: for all I ⊆ [0, 1] it holds that ∫_I φ(p) dp ≥ 0,
(A2) non-increasing: for all q ∈ (0, 1) and ε > 0 such that [q - ε, q + ε] ⊆ [0, 1], it holds that ∫_{q-ε}^{q} φ(p) dp ≥ ∫_{q}^{q+ε} φ(p) dp,
(A3) normalized: ∫_0^1 φ(p) dp = 1.
Note that the Conditional Value at Risk on level α can be obtained as a special case for the risk spectrum φ(p) = (1/(1 - α)) 1{0 ≤ p ≤ 1 - α}. In general, investors can identify their risk aversion by choosing the risk spectrum.
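As a brief numerical illustration of definitions (5)-(6) (not taken from the paper), a spectral risk measure can be approximated from an empirical return sample by evaluating the integral on a uniform probability grid; the sample, grid size and the grid discretization itself are arbitrary assumptions, and the CVaR spectrum is used as the example.

```python
import numpy as np

# Empirical spectral risk measure R_phi(X) = -integral_0^1 phi(p) F_X^{-1}(p) dp,
# approximated on a uniform probability grid; the return sample is synthetic.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.05, scale=0.2, size=10_000)   # portfolio returns X

def spectral_risk(x, phi, grid=100_000):
    """Approximate R_phi(X) from an empirical sample x of returns."""
    p = (np.arange(grid) + 0.5) / grid                   # probability grid on (0, 1)
    quantiles = np.quantile(x, p)                        # empirical F_X^{-1}(p)
    return -np.mean(phi(p) * quantiles)                  # ~ -integral phi(p) F^{-1}(p) dp

# CVaR at level alpha as the special case phi(p) = 1/(1 - alpha) on [0, 1 - alpha].
alpha = 0.95
cvar_spectrum = lambda p: (p <= 1 - alpha) / (1 - alpha)
print(spectral_risk(sample, cvar_spectrum))
```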

The risk spectrum is discretized over the scenarios s = 1, ..., S (formulas (8)-(9)) and parametrized by a risk-aversion parameter k > 0 (formula (12)). We will compare the derived shadow risk spectrum and the investor's one. We consider parameters k ∈ {1, 2, ..., 15}, which cover most of the realistic risk aversions of real investors. Table 1 contains the industry representative portfolios with the most changes in ranking according to the risk aversion, using the distance between the ideal and projected portfolios. We can observe that portfolio Ships is the best according to the risk aversion represented by parameters k ∈ {1, 2, 3, 4}, whereas it is one of the worst for k ∈ {10, ..., 15}. On the other hand, portfolio Oil, which is the best for k ∈ {13, 14, 15}, is very far from the ideal portfolio for k ∈ {1, ..., 10}. To summarize, we can observe that the ranking is highly dependent on the risk aversion level.

5 Conclusions
In this paper, we have reviewed the diversification-consistent DEA models which are equivalent to the stochastic dominance tests with respect to the second order stochastic dominance. We have proposed a sensitivity analysis of the ranking of the considered industry representative portfolios with respect to various levels of the investor's risk aversion. In particular, we have compared distances between the shadow and the theoretical (empirical) risk spectra, showing a high dependence of the ranking on the risk aversion parameters. More demanding models are postponed as a topic for future research, where they can be solved using the numerical technique proposed in [3].

Acknowledgements
This work was supported by the Grant Agency of the Czech Republic under the grant project 19-28231X.

References
[1] Acerbi, C.: Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking & Finance 26 (2002), 1505-1518.
[2] Adam, L., Branda, M.: Risk-aversion in data envelopment analysis models with diversification. Omega 102 (2021), 102338.
[12] Briec, W., Kerstens, K , Jokung, O.: Mean-variance-skewness portfolio performance gauging: A general shortage function and dual approach. Management Science 53 (2007), 135-149. [13] Charnes, A., Cooper, W., Rhodes, E.: Measuring the efficiency of decision-making units. European Journal of Operational Research 2 (1978), 429-444. [14] Chen, Z., L i n , R.: Mutual fund performance evaluation using data envelopment analysis with new risk measures. OR Spectrum 28 (2006), 375-398. [15] Hadar, J., Russell, W.R.: Rules for ordering uncertain prospects. American Economic Review 9 (1969), 25-34. [16] Hanoch, G . , Levy H . : The efficient analysis of choices involving risk. Review of Economic Studies 36 (3) (1969), 335-346. [17] Joro, T , Na, R: Portfolio performance evaluation in a mean-variance-skewness framework. European Journal of Operational Research 175 (2006), 446—461. [18] Kuosmannen, T : Performance measurement and best-practice benchmarking of mutual funds: combining stochastic dominance criteria with data envelopment analysis. Journal of Productivity Analysis 28 (2007), 71-86. [19] Lamb, J.D., Tee, K - H . : Data envelopment analysis models of investment funds. European Journal of Operational Research 216 (3) (2012), 687-696. [20] Levy, H . : Stochastic dominance: Investment decision making under uncertainty. Second edition, Springer, New York, 2006. [21] L i n , R., Chen, Z., Hu, Q., L i , Z.: Dynamic network D E A approach with diversification to multiperiod performance evaluation of funds. OR Spectrum 39 (2017), 821-860. [22] L i n , R . , L i , Z.: Directional distance based diversification super-efficiency D E A models for mutual funds. Omega 97 (2020), 102096. [23] L i u , J.S., L u , L.Y.Y., L u , W . - M . , L i n , B . J . Y : Data envelopment analysis 1978-2010: A citation-based literature survey. Omega 41 (1) (2013), 3-15. [24] Liu, W., Zhou, Z., Liu, D., Xiao, H . : Estimation of portfolio efficiency via D E A . Omega 52 (2015), 107-118. [25] Lozano, S., Gutierrez, E.: Data envelopment analysis of mutual funds based on second-order stochastic dominance. European Journal of Operational Research 189 (2008), 230-244. [26] Murthi, B.P.S., Choi, Y . K . , Desai, P.: Efficiency of mutual funds and portfolio performance measurement: a non-parametric approach. European Journal of Operational Research 98 (2) (1997), 408^-18. [27] Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. Journal of Banking and Finance 26 (2002), 1443-1471. [28] Wächter, H . P , Mazzoni, T.: Consistent modeling of risk averse behavior with spectral risk measures. European Journal of Operational Research 229 (2) (2013), 487—495. [29] Xiao, H . , Ren, T , Zhou, Z., Liu, W.: Parameter uncertainty in estimation of portfolio efficiency: Evidence from an interval diversification-consistent D E A approach. Omega 103 (2021), 102357. 53 Efficient Values of Selected Factors of DMUs Production Structure Using Particular DEA Model Helena Brožová1 , Milan Vlach2 Abstract. We suggest a particular application of the data envelopment analysis method for estimation of selected inputs and outputs values simultaneously to create all decision-making units as efficient as possible. It is an iterative approach allowing the decision-maker to find an efficient pattern for all decision making units. The proposed method is based on the Cook, Green and Zhu [6] approach and it is illustrated by simple example. 
Keywords: Efficiency, efficient value of input, efficient value of output, dual-role factor, CCR model

JEL Classification: C44, C61, D24
AMS Classification: 90B50, 90C05, 90C90

1 Introduction

A number of methods have been developed for the analysis of decision-making units' (DMU) efficiency and for the evaluation of their inputs and outputs. In general, the more efficient a unit is, the fewer inputs it consumes and the more outputs it produces. This is a multiple attribute analysis problem, so it can be approached by multiple criteria decision methods. It can also be approached by parametric methods based on statistical tools such as regression analysis. There are also nonparametric approaches that use mathematical programming, like the data envelopment analysis (DEA) proposed and developed by Charnes, Cooper and Rhodes [4] and Banker, Charnes and Cooper [1]. The DEA measures the productive efficiency of the production process of DMUs on the basis of their own inputs and outputs. Typically, these models give an efficiency index for the monitored DMU, and an efficient pattern for this DMU is achieved by decreasing all inputs or increasing all outputs. However, changing all the parameters of a monitored DMU in order to make it efficient may not always be possible, and we may assume that it would often be necessary to determine changes in only some selected factors. Therefore, different problems occur; namely, how to find the values of selected inputs, outputs or both, while maintaining the other parameters of the DMU, at which the unit will be efficient, or how to find these values so that all monitored DMUs become efficient. Cook, Green and Zhu [6] dealt with the similar problem of the so-called dual-role factor and its reallocation leading to efficient DMUs. Their approach (revisiting the Beasley model [2]) assumed that some factors can play both roles simultaneously, an input from the point of view of one DMU and an output from the point of view of another DMU. Accordingly, its value needs to be reduced or increased. However, their approach does not always show specific changes and, in addition, may not lead to the efficiency of all units. This approach was extended and generalized by Chen [5]. In this paper, we suggest a specific DEA method to create efficient patterns for all DMUs by estimating the values of selected inputs and outputs simultaneously. The method is based on the Cook, Green and Zhu [6] approach, and its application is demonstrated by a small example consisting of 5 DMUs with 2 inputs and 1 output, in which one input and one output can always be changed. This example is used because it can also be represented graphically. The paper is organized as follows. Section 2 briefly introduces the Charnes, Cooper and Rhodes (CCR) DEA model and the dual-role factors DEA model. Section 3 describes our suggested method for the evaluation of selected inputs and outputs to achieve efficiency of all DMUs. Section 4 contains an example illustrating how the method can be applied. Section 5 concludes the paper by summarizing its findings.
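Before turning to the formal models in Section 2, the following is a minimal sketch of how the input-oriented CCR model, in its multiplier form, can be solved as a linear program in Python. The data, the helper name ccr_efficiency and the choice of scipy.optimize.linprog are illustrative assumptions only; they merely mimic a small instance with 5 DMUs, 2 inputs and 1 output like the example used later in the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: 5 DMUs, 2 inputs (rows of X), 1 output (rows of Y).
X = np.array([[2.0, 4.0, 3.0, 5.0, 6.0],
              [3.0, 2.0, 5.0, 4.0, 2.0]])   # inputs x_ij
Y = np.array([[1.0, 1.0, 1.0, 1.0, 1.0]])   # outputs y_jk
m, n = X.shape
s = Y.shape[0]

def ccr_efficiency(o):
    """Input-oriented CCR efficiency of DMU o (multiplier form)."""
    # Decision variables: v_1..v_m (input weights), u_1..u_s (output weights).
    c = np.concatenate([np.zeros(m), -Y[:, o]])            # maximize weighted outputs of DMU o
    A_eq = np.concatenate([X[:, o], np.zeros(s)]).reshape(1, -1)
    b_eq = [1.0]                                           # normalize weighted inputs of DMU o to 1
    A_ub = np.hstack([-X.T, Y.T])                          # weighted outputs - weighted inputs <= 0 for all DMUs
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (m + s), method="highs")
    return -res.fun

for o in range(n):
    print(f"DMU {o + 1}: CCR efficiency = {ccr_efficiency(o):.3f}")
```

DMUs with a score of 1 lie on the efficiency frontier; the dual of the same linear program is the envelopment form with intensity variables and slacks that appears in Section 2.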
1 lCzech University of Life Sciences, Faculty of Economics and Management, Department of Systems Engineering, Kamýcká 129, Prague 6, Czech Republic, brozova@pef.czu.cz 2 2Charles University, Faculty of Mathematics and Physics, Department of Theoretical Computer Science and Mathematical Logic, Malostranské náměstí 25, Prague 1, Czech Republic, milan.vlach@mff.cuni.cz 54 2 DEA models and efficient inputs and outputs values The D E A models are generally used to evaluate efficiency of units and to set the necessary changes in inputs (or outputs) for inefficient units to increase their efficiency and to reach the efficiency frontier. Charnes, Cooper and Rhodes [4] introduced a model, now called the C C R D E A model, for evaluation of the efficiency of a set of D M U s that transforms multiple inputs into multiple outputs under the assumption of the constant returns to scale. The outputs and inputs can be of various characteristics and of variety of forms that can be sometimes difficult to measure. The input oriented C C R D E A model [4] measuring efficiency of D M U is constructed as the maximization of the ratio of weighted outputs to weighted inputs under the constraints that, for all D M U s , this ratio is less than or equal to 1 and its value for the efficient D M U is equal to 1. Let H be the D M U which efficiency is measured. Linearization of C C R D E A model leads to the following linear programming problem (envelopment model): I ViuXiH — 1 1 Maximize UjHyjH subject to _ J V m X . k + J UjHyjk < 0 ,fee K = {1,2 p} W ;'=i UjH > 0, j = 1,2, ...,n viH > 0, i = 1,2,...,m and the corresponding dual problem (multiplier model): * H * £ H - ^ 4 * £ H -St = 0,i = 1,2 m keK Minimize * H subject to yy f c - ^ Af c yy f c + s + = 0,y = 1, 2 , . . . . >7 ( 2 » keK Xk > 0, k 6 K where variables u ; f c and vik are weights of outputs and inputs, Xk are multipliers, st , Sy" are slack variables, y ; f c is the value of jth output from unit k, and xik is the value of ith input to kth D M U , H is index of the evaluated D M U . This notation is used through the whole paper. Let (u* ,v*, A*, s*~, s*+ ), i = 1,2,... ,m,j = 1,2,..., n, be an optimal solution of the problems (1) and (2), and let 4>H be the optimal value of both objective functions. There are two basic results of the D E A models. First, the efficiency score > uy yy f e subject to f^1 (4)7=1 fee * 7 = 1 «iW; L f c < 5i f c < a j w S , 1 = 1 p, ft 6 K Uj > 0, j = 1,2, ...,n vt > 0, i = 1,2,...,m at > 0,8lk >0,,l = l,...,p,k £ K 3 Estimation of the efficient values of selected inputs and outputs Suppose now we want to change only selected parameters of a monitored D M U leading to efficiency of all D M U s . To achieve it, we propose a method how to find the optimal values of selected inputs and outputs with the aim the all D M U s are efficient which is based on the model (4). Let flk,l = 1, ...,p,k £ K are the inputs and gqk,q = 1,... ,r,k £ K are the outputs whose values we wish to determine so that D M U s become as efficient as possible. The proposed model (an extension of linear form of model (4)) is as follows: X XViXik+ Xa ^ik )=1 keK \i=l 1=1 j m V n r - ^ vtxik - ^ ajlk + ^ u j y j k + ^ n q g q k <0,keK In r \ L=l 1=1 7=1 q=l Maximize V VUjyJ k + V nqgqk J subject to wfk < flk < w"k,l = 1,...,p,k £ K (5) keK \j=l q=l I I ^ „ ^ „ÜzL qk 0, j = 1,2, ...,n vt > 0, i = l , 2 , . . . 
, m aL > 0,/ut > 0,/ = 1 p.kEK ßq > 0,gqk > 0,q = 1 r.keK Using new variables Slk = atflk, a q k = ßqgqk > 0, q = 1,..., r, I = 1,..., p, k £ K, we obtain the following linearization of (5): keK \i=i 1=1 J m V n r vtxik - ^ Slk + ^ u,y,f c + ^ ffqfc < 0, fe £ K In r \ i=l i=l j'=l i=l Maximize V V u; y; f c + V ffqfc J subject to ctfW^. < 5; f c < a;W;k, I = 1,... ,p,k E K (6) k «V=i i=i / u q z L q k < a q k < t i q z u q k , q = l r,fc£tf u ; > 0, j = 1,2,... , n > 0, i = 1 , 2 , . . . , m aL > 0,8lk >0,,l = 1, ...,p,k £ A: > 0, a q k > 0, q = 1,..., r, fe £ 7Y Lemma 1 Both model (5) and model (6) have always optimal solutions. Proof. The feasible set of model (6), that is, the set of points satisfying all constraints of model (6) is bounded and closed, and the objective function is linear. Therefore the maximum of the objective function attains its maximum at some point of feasible set. The feasible set of model (5) is also bounded and closed, it is enough to put fkl = lk /at a n d gqk = 0qk I[iq ,1 = 1, ••• ,p, q = 1, •••, r, k £ K and again its linear objective function has its maximum. • Note: In order to be able to calculate values fki,gqk in the optimal solution of model (6), it is necessary that a.i,u q be larger than some very small positive value e > 0, So, we have to use the constraints Oil > e,,l = 1,..., p and uq > £,,q = 1,...,r. 56 Lemma 2 A l l D M U s are efficient i f and only if, for the optimal solution of model (5) or (6), we have * = 1. B) Suppose now the optimal aggregate efficiency YJj=iUjyjh + T.q=iPqdqh-in this case some D M U d exists, for which has to be YIiLiv ix ld + Y?i=ia ifid < Z y = i u y y/d + Y,r q=iHq9qd- i n this case — £™ i i^id — 2f=ia ;/id + 2y=iu yy/d + Yq=i P-q9qd > 0 - 1 1 means, that at least one constraint is not valid. Such solution is infeasible and input oriented efficiency of such D M U would be greater than 1. This implies that all D M U s have to be efficient. • Lemma 3 There are such lower and upper bounds w/^, wk\ , zqk, zqk, I = 1,...,p, q = 1,..., r, k E K that the optimal solution of model (5) and also (6) have value m a x (gak)-» H H q=l,...,r,k£K H Although it is theoretically possible to find the required values of inputs and outputs so that all D M U s are effective, the results obtained in this way may not always be feasible in practice. Therefore, the following iterative procedure is often terminated even i f not all D M U s are effective. In this case, it is necessary to relax the limits for the searched values of inputs and outputs; that is, to reduce the lower limits of inputs and increase the upper limits of outputs 3.1 Iterative procedure The procedure for obtaining the values of the selected inputs and outputs so that all units are as efficient as possible has the following steps: 1. For efficient D M U s and the selected inputs set the values w/^ = wk\ = flk, I = 1,..., p, k £ K efficient, and the selected outputs set zqk = zqk = g q k , q = 1,...,r, k £ K efficient. 2. For nonefficient D M U s and the selected inputs choose arbitrary the values wfk = smallW = m i n (fik),1 = 1, ••• ,p,k E Knonefficient and wk = flk,I = 1,...,p,k £ Knonefficient. l=l,...,p,keK 3. For nonefficient D M U s and the selected outputs choose arbitrary the values zqk = bigZ.q = 1, ...,r,k £ K nonefficient and zqk = g q k , q = 1,..., r, k £ K nonefficient. 4. Solve the model (6) and find the optimal solution 5. If the optimal value t\ when P{t[) - b. Then close the positions (i.e. sell potfolio n, or, equivalently, buy portfolio -n). 
Then, repeat {1} and {2} forever. This strategy generates a sequence of times t_1 < t'_1 < t_2 < t'_2 < ..., where the time interval [t_i, t'_i] refers to open positions and the interval [t'_i, t_{i+1}] refers to closed positions.

Remark. This is just a convention; the strategy could alternatively be reformulated as follows. {1*} Wait until time t_1 when P(t_1) = a. Then buy portfolio η. {2*} Wait until time t'_1 > t_1 when P(t'_1) = b. Then buy portfolio −2η. {3*} Wait until time t_2 > t'_1 when P(t_2) = a. Then buy portfolio 2η and iterate forever. This is a symmetric version of the trading with no idle times in the time intervals [t'_i, t_{i+1}]. Its analysis is analogous; namely, in the limit it is a version with doubled profit compared to strategy {1}-{2}. This is why we can restrict our attention only to {1}-{2}.

Disregarding some pathological cases, the stationarity phenomenon allows us to take advantage of mean reversion, meaning that once the price process P(t) deviates from its mean, it returns to μ in finite time. If the thresholds a, b are chosen "reasonably", then at time t'_i we collect the deterministic profit π_0 := P(t'_i) − P(t_i) = b − a. In this setup, the choice of a trading strategy reduces to the choice of the thresholds a, b. The pair (a, b) is simply referred to as a strategy. What is random here is the length of a trade cycle T_i := t_{i+1} − t_i. The tradeoff is between the deterministic profit π_0 and the time to its collection, which is measured by the mean value of the trade cycle T := E T_i. Thus it makes sense to standardize the profit per unit of time, i.e., to measure profit as

Π_0 = lim_{t→∞} t^{-1} π_0 N(t),

where N(t) is the counting process for the number of trade cycles in the time window [0, t]. By stationarity of P(t), the process N(t) grows at a linear rate and it follows that Π_0 is well-defined.

2.2 Transaction costs

We will assume that a transaction is associated with a transaction cost c_i per unit of asset A_i, meaning that an adjustment of a position in asset A_i costs c_i dollars. Then, the profit per trade cycle including transaction costs is π := b − a − c, where c := 2 Σ_i |η_i| c_i (the factor 2 corresponds to the fact that positions are opened and then closed, which requires a pair of transactions). Then, the profit per time unit reduces to

Π = Π(a, b) = lim_{t→∞} t^{-1} π N(t).

2.3 Ornstein-Uhlenbeck process and its standardization

To find expressions for Π(a, b), the crucial quantity is the random process N(t). To derive concrete results, it is necessary to make further assumptions about the particular form of the price process P(t). We follow the work of [2], [6] and others and assume that it follows the Ornstein-Uhlenbeck equation

dP(t) = τ(μ − P(t)) dt + σ dW(t),

with parameters μ (mean value), τ > 0 (speed of mean reversion) and σ > 0 (volatility), where W(t) stands for the standard Wiener process. In the sequel, if not stated otherwise, all unreferenced statements follow [2] and [6]. Here we make an essential assumption: all of the parameters η, μ, τ, σ are assumed to be known exactly, meaning that they are not econometric estimates. This is an idealistic assumption, which is however frequent in portfolio theory. (In practice, the parameters are estimated from finite-sample observations of the price processes P_1(t), ..., P_n(t), which means that they suffer from statistical errors; this fact is disregarded in this text but is surely worth investigating in depth.)
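For intuition about how the profit per unit of time Π(a, b) behaves under the Ornstein-Uhlenbeck assumption, a small simulation sketch follows. It is illustrative only: the Euler-Maruyama discretization, the step size, the time horizon and the function names are assumptions of this example, and the parameters are set to the standardized values μ = 0, τ = 1, σ = √2 discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ou(mu=0.0, tau=1.0, sigma=np.sqrt(2.0), dt=0.01, horizon=2_000.0):
    """Euler-Maruyama path of dP(t) = tau*(mu - P(t)) dt + sigma dW(t)."""
    n = int(horizon / dt)
    p = np.empty(n)
    p[0] = mu
    dw = rng.normal(scale=np.sqrt(dt), size=n - 1)
    for k in range(n - 1):
        p[k + 1] = p[k] + tau * (mu - p[k]) * dt + sigma * dw[k]
    return p

def profit_per_time(path, a, b, c, dt=0.01):
    """Estimate Pi(a, b): profit per unit of time of the (a, b) threshold strategy."""
    position_open, cycles = False, 0
    for price in path:
        if not position_open and price <= a:   # open positions when P(t) reaches a
            position_open = True
        elif position_open and price >= b:     # close positions when P(t) reaches b
            position_open = False
            cycles += 1                        # one completed trade cycle
    return cycles * (b - a - c) / (len(path) * dt)

path = simulate_ou()
for a, b in [(-0.5, 0.5), (-1.0, 1.0), (-1.5, 1.5)]:
    print(f"a = {a:+.1f}, b = {b:+.1f}: estimated profit per time = "
          f"{profit_per_time(path, a, b, c=0.02):.3f}")
```

Wider thresholds give a larger profit per completed cycle but longer expected cycles, which is exactly the tradeoff formalized in Bertram's optimization problem below.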
Then we can use Ito's Lemma and consider a standardized version P of the Ornstein-Uhlenbeck process given by substitution t = Tt, Pit) = {2r)112 o--l {P {t) - p). Now we can assume, without loss of generality, that T = 1, p = 0 and c = V 2 ; this is equivalent to saying that the price process P(t) is standardized. In this setup, the profit Tl(a, b) is indeed a function of a, b only (and not ix, cr, T). A n d the task reduces to the optimization problem "how to choose a, b to maximize Tl(a, b)"l 3 Bertram's optimization problem 3.1 The case with no risk constraints The optimization problem from the last paragraph, maxll(fl, b) subject to b - a - c > 0, (1) a,b is referred to as Bertram's (unconstrained) problem. It has the following properties: (a) Optimization problem (1) is convex. (b) Its optimum is symmetric, meaning that its optimal solution (a*, b*) satisfies b* - -a*. (c) The function Il(a, b) is not elementary; the optimum (a*, b*) cannot be expected to have a closed algebraic form. In particular, the "best" known expression is (using k :- 2k - 1 as a shorthand) oo £ £ n(„,6) = £, r ^ ^ r w ^ i ^ . ( 2 ) k=i k - 3.2 The case with bounded risk It is natural to augment (1) with risk constraints. If R(a, b) is a risk measure, it is assumed that an exogenous admissible risk level ^ m a x is given and the risk-constrained problem takes the form maxnffl, b) subject to b - a - c > 0, R(a, b) < Rmax . (3) a,b First of all, it is natural to consider strategies with long expected trading cycles to be "more adverse" than strategies with shorter cycles. Thus, it is natural to consider a bound on T given by expression (2): Rl :=7~. As a second alternative, a natural choice is the volatility of profit var(nNt) in the long run t —» oo. Unfortunately, lim \iax{nNt) = 0, t—>oo and thus it is a trivial measure. Thus we need standardization in the form of per-time-unit volatility of profit defined as R2 = R2(a, b) := lim rl/2 var(nNt)l/2 . (4) t—>co 62 Figure 1 The space (a, b) of feasible strategies satisfying b - a - c > 0 with fixed costs c - 0.02, the symmetry line a - -b and the optimal strategy for the unconstrained problem (1) depicted by a blue circle. Figure (a): Contour plot of the profit function Tl(a, b). Figure (b): Contour plot of risk measure Ri(a, b), corresponding to the expected length of a trade cycle. Figure (c): Contour plot of risk measure Riia, b), corresponding to the per-time-unit volatility of profit. Figure (d): Contour plot of risk measure R3 (a, b), corresponding to the volatility of the length of a trade cycle. It is easy to show that the normalization of the variance by t w in (4) results in a nontrivial risk measure only for a) = - 1 / 2 . The available expressions for R2 are non-elementary again: we have R2 = n ^ V T - y 2 , where, denoting the digamma function by 0 is the sum of observed frequencies in the y'-th column, ;=i N = 2_2J is the total number of observations. i=i j=i If the null hypothesis H0 is true, then the chi-square statistic %2 has a chi-square distribution with k = ( m - l ) - ( n - l ) degrees of freedom, i.e. %2 ~ %2 (k). P-value of the chi-square test statistic calculated from table 1 was found to be 0.001, meaning that the hypothesis HQ is refused on 1% level of significance (0.00 K0.01). Therefore, there is a relationship between the type of business and the awareness of online marketing. Obviously, small craft businesses are much less aware of online marketing than other enterprises. 
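The test of independence reported above is easy to reproduce in code. The sketch below uses a 2×2 contingency table reconstructed from the counts quoted in the next paragraph (11 of 33 craftsmen and 14 of 131 other enterprises were not aware of online marketing), since Table 1 itself is not reproduced here; it should be read only as an illustration of the procedure.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: craftsmen, other enterprises; columns: aware, not aware of online marketing.
# Counts are reconstructed from the ratios 11/33 and 14/131 quoted in the text.
observed = np.array([[22, 11],
                     [117, 14]])

chi2, p_value, dof, _ = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p-value = {p_value:.4f}")
# The p-value is approximately 0.001, so H0 (independence) is rejected
# at the 1% level of significance, in line with the value reported above.
```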
The ratio of craftsmen not aware of online marketing is 11/33 = 0.33, while the ratio of other enterprises not aware of online marketing is much smaller and equal to 14/131 = 0.11. The described statistical procedure was performed for all remaining questions and also for the questions analyzed later, in September 2020. The results are summarized in the following table:2

Question number           1         2         3         4        5         6        7         8         9        10
P-value (January 2020)    0.001***  0.031**   0.109     0.078*   0.003***  0.195    0.047**   0.066*    0.025**  0.012**
P-value (September 2020)  0.000***  0.000***  0.004***  0.283    0.000***  0.030**  0.001***  0.002***  0.152    0.206

Question number           11        12        13        14        15       16        17        18        19       20
P-value (January 2020)    0.013**   0.006***  0.029**   0.013**   0.167    -         -         -         -        -
P-value (September 2020)  0.000***  0.001***  0.000***  0.003***  0.071*   0.003***  0.000***  0.000***  0.083*   0.000***

Table 2 Summary of the results of the chi-square test of independence between the categorical variable type of business and the other categorical variables defined by questions 1-20, for the data from January 2020 as well as September 2020.

2 Symbols ***, ** and * indicate that the result of the statistical test is significant at the 1%, 5% and 10% level of significance, respectively.

Legend (short description of questions 1-20):
1. awareness of online marketing
2. web pages
3. mobile application
4. e-shop
5. contact form
6. chat box
7. visit rate
8. Google Analytics
9. SEO
10. SEM
11. emailing tool
12. CRM
13. social networks
14. effectiveness on social networks
15. PPC
16. budget for online marketing
17. reputation
18. savings
19. new online technologies
20. new online technologies in the near future

The results indicate a statistically significant relationship between the type of business and the utilization of a given online marketing tool (specified in questions 1-20) in most cases. This provides empirical evidence for the validity of the hypothesis that the utilization of online marketing tools is different in small craft self-employed businesses than in other enterprises. There are a few exceptions in which the statistical dependency was not confirmed at any of the standard levels of significance. These are questions number 3, 6 and 15 for the data from January 2020 and questions 4, 9 and 10 for the data from September 2020. The utilization of a given marketing tool by small self-employed craft businesses and by other enterprises can be considered the same in these cases.

Let's now investigate the ratio of small businesses and other enterprises with a positive answer "yes" to a given question, which is summarized by the following figure:

Figure 1 Ratio of small craft businesses and other enterprises with a positive answer to questions 1-20 for the data from January 2020 and September 2020.

The figure documents that the ratio of small businesses with a positive answer to the given questions is lower than the corresponding ratio of other enterprises. This result is quite robust, as it holds for all questions and for both datasets, which represents further empirical evidence for the hypothesis that the degree of utilization of online marketing tools in small self-employed craft businesses is significantly smaller than in other enterprises.
It can also be seen from Figure 1 that there is a relatively high ratio of positive answers in small craft businesses as well as in other enterprises for the following questions:

1. awareness of online marketing
2. web pages
5. contact form
7. visit rate
13. social networks
17. reputation
19. new online technologies
20. new online technologies in the near future

Thus, the most popular online marketing tools are web pages, the contact form and social networks. This observation suggests that there is a potential for promoting the utilization of the other online marketing tools among Czech companies (mobile applications, e-shop, chat box, Google Analytics, SEO, SEM, special emailing tools, CRM and PPC). A relatively high percentage of companies monitor the visit rate of their web pages, consider online marketing tools as a way to promote their reputation, and have begun to use new online technologies and plan to do so in the near future as well. Nonetheless, a rather low percentage of companies are willing to increase their budget for online marketing activities during the current Covid-19 crisis (question number 16).

4 Dynamic trends in the utilization of online marketing tools

Let's now investigate the dynamics in time. The ratio of firms with a positive answer to the above-mentioned questions is displayed in Figure 2 in such a way that enables us to visually analyze the changes in time.

Figure 2 Dynamics of the ratio of small craft self-employed businesses and other enterprises with a positive answer to questions 1-20.

The first graph of Figure 2 shows that the percentage of small craft businesses with a positive answer to the above-mentioned questions is roughly the same in January as in September 2020. From this point of view, the utilization of online marketing tools among small craft businesses did not change much during the current Covid-19 crisis. Nonetheless, the exploitation of online marketing tools did change among other enterprises, and the change had a positive sign for most of the questions. Therefore, the differences in the utilization of online marketing tools between small craft self-employed businesses and other enterprises increased during the current Covid-19 crisis.

5 Conclusion

McKinsey (2020) states that the decline in the global economy due to Covid-19 measures has already overcome the Great Recession of 2009. McKinsey conducted research showing that only businesses using new technologies that address the changing environment will stay competitive and will be able to adapt. Their results suggest that the only way out of the global economic crisis is to accelerate the use of new technologies, including online marketing tools. The results from this paper show that this is not true for the segment of Czech small craft businesses. The utilization of online marketing tools by small self-employed companies in the Czech Republic is very low, and the current Covid-19 crisis did not change this trend. The utilization of online marketing tools by other Czech companies was considerably higher at the beginning of this crisis, and there is also a growing trend in the exploitation of these online tools by larger companies. Thus, the current Covid-19 crisis has even increased the difference between small craft businesses and other enterprises regarding their utilization of online marketing tools.
The application of e-business within firms not only improves business processes, administration, sales, financial management, human resources, and service quality, but also the exchange of communication between companies, customers, suppliers, banks, and public administration (Cetlová, Velinov, 2019). Nonetheless, small self-employed craft businesses in the Czech Republic have not yet overcome the technological barriers associated with the application of these tools, despite the current Covid-19 crisis which emphasized the importance of these online tools.

References

[1] Breckova, P., Karas, M. (2020). Online Technology and Promotion Tools in SMEs. Innovative Marketing, 16(3), 85-97. http://dx.doi.org/10.21511/im.16(3).2020.08
[2] Cetlová, H., Velinov, E. (2019). Online Marketing Activities and Marketing Communication Tools in Czech Small and Medium-Sized Enterprises. Marketing Identity, 7(1), 803-815.
[3] Cetlová, H., Marciník, R., Velinov, E. (2020). Research Into Online Marketing Activities and Tools for Measuring their Efficiency by Entrepreneurs in the Czech Republic. Socioekonomické a humanitní studie, 11(1), 41-54.
[4] Civelek, M., Gajdka, K., Světlík, J., Vavrečka, V. (2020). Differences in the Usage of Online Marketing and Social Media Tools: Evidence from Czech, Slovakian and Hungarian SMEs. Equilibrium. Quarterly Journal of Economics and Economic Policy, 15(3), 537-563. https://doi.org/10.24136/eq.2020.024
[5] Eid, R., El-Gohary, H. (2013). The Impact of E-Marketing Use on Small Business Enterprises' Marketing Success. The Service Industries Journal, 33(1), 31-50. https://doi.org/10.1080/02642069.2011.594878
[6] Kingsnorth, S. (2016). Digital Marketing Strategy: An Integrated Approach to Online Marketing. Kogan Page Publishers. ISBN 978-0-7494-7470-6.
[7] McKinsey: COVID-19: Implications for business. May 7, 2021. Executive Briefing. [online], [cit. 2021-07-05]. Available at: https://www.mckinsey.com/business-functions/risk/our-insights/covid-19-implications-for-business

On the crossing numbers of join of one graph on six vertices with path using cyclic permutation

Emilia Drazenska1

Abstract. The crossing number, cr(H), of a simple graph H is the minimal number of edge crossings over all good drawings of H in the plane. In general, computing the crossing number of a given graph is a very difficult problem. The crossing numbers of only a few families of graphs are known. One of them is the family of join products of special graphs. In this paper, we extend known results concerning the crossing number of the join product G + Pn, where the graph G consists of a 5-cycle, one vertex of which is adjacent to another vertex of this cycle, and one other isolated vertex, and Pn is the path on n vertices. The methods used in the paper are based on combinatorial properties of cyclic permutations, and the proof is done with the help of software.

Keywords: graphs, drawings, crossing numbers, cyclic permutation, join product.

JEL classification: C02
AMS classification: 05C10; 05C38

1 Introduction

Let the graph H be a simple, undirected and connected graph with vertex set V and edge set E. The crossing number, cr(H), of a graph H is the minimum number of edge crossings in any drawing of H in the plane. (For the definition of a drawing, see [7].)
The drawing with a m i n i m u m number of crossings (an optimal drawing) must be a good drawing, meaning that each two edges have at most one point i n common, which is either a commom end-vertex or a crossing. The problem of reducing the number of crossings on the edges in the drawings of graphs was studied i n many areas and the most important area is V L S I technology. T h e crossing numbers have been studied to improve the readability of hierarchical structures and automated graph drawings. The visualized graph should be easy to read and understand. For the understandability of graph drawings, the reducing of crossings is likely the most important. The investigation on the crossing number of a given graph is very difficult problem. Garey and Johnson proved [4] that computing cr(ff) is an NP-complete problem. The exact values of crossing numbers are known for several special classes of graphs. One of them is a join products of two graphs. The join product Hi + H2 of two graphs Hi = (V\,Ex) and H2 = (V2, E2) is obtained from the vertex-disjoint copies of Hi and H2 by adding all edges between V(H\) and V ( i ? 2 ) For | V ( C ? i ) | = m and | V ( ( J 2 ) | = n, the edge set of H1+H2 is the union of disjoint edge sets of the graphs Hi, H2, and the complete bipartite graph Km^n. Let Dn denote the discrete graph on n vertices, let Pn and Cn be the path and the cycle on n vertices. In the proofs of the paper, we will often use the term "region" also i n nonplanar drawings. In this case, crossings are considered to be vertices of the "map". The exact values for crossing numbers of H + Pn and H + Cn for all graphs H of order at most four are given in [10], and the crossing numbers of the graphs H + Dn, H + Pn, and H + Cn are also known for some graphs H of order five and six, see [2], [3], [7], [8], [9], [11], [12], [13], [14], and [15]. In this paper we extend these results by giving the exact values of the crossing numbers for join product for a special graph G on six vertices with the path Pn. 1 Technical University in Košice, Faculty of Electrical Engineering and Informatics, Department of Mathematics and Theoretical Informatics, Němcovej 32, 042 00 Košice, Slovak Republic, e-mail: emilia.drazenska@tuke.sk 72 (a) (b) (c) (d) (e) (/) F i g u r e 2: Six drawings of the graph G with one crossing D (D(H)) be a good drawing of the graph H. We denote by crD(H) the number of crossings among edges of H in the drawing D. Let ff; and ffj be two edge-disjoint subgraphs of H. We denote by cro(Hi, Hj) the number of crossings between the edges of Hi and edges of Hj, and the number of crossings among edges of Hi i n D by crjj(ffi). In the paper, some proofs will be also based on the Kleitman's result on crossing numbers of the complete bipartite graphs [6]. More precisely, he proved that ci(Kmtn) m 1 m — 1 1 n 1 7 1 — 1 _2~J [ 2 J . 2 J L 2 J if imn{m, n} < 6. (1) 2 The crossing number of G + Pn Let G be the connected graph of order six consisting of the four-cycle and three-cycle with a common edge and one more vertex adjacent with a vertex of degree three (see Figure 1). We consider the join product of the graph G with the the discrete graph Dn. The graph G + Dn consists of one copy of G and of n vertices t\,..., t„. Each vertex ti, i = 1 , . . . , n, is adjacent to every vertex of G. Let Tl , i = 1,... ,TI, denote the subgraph that is uniquely induced by the six edges incident with the fixed vertex ti. 
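For orientation, both closed-form expressions used in this paper, Kleitman's value of cr(K_{m,n}) in (1) and the value of cr(G + P_n) established in Theorem 1 below, are easy to tabulate. The following sketch only evaluates these floor-function formulas as restated from the text; it does not construct or verify any drawings.

```python
def cr_complete_bipartite(m, n):
    """Kleitman's value floor(m/2)*floor((m-1)/2)*floor(n/2)*floor((n-1)/2), valid for min(m, n) <= 6."""
    if min(m, n) > 6:
        raise ValueError("the closed form is only known for min(m, n) <= 6")
    return (m // 2) * ((m - 1) // 2) * (n // 2) * ((n - 1) // 2)

def cr_G_plus_Pn(n):
    """Value 6*floor(n/2)*floor((n-1)/2) + 2*floor(n/2) + 1 stated in Theorem 1, for n >= 2."""
    return 6 * (n // 2) * ((n - 1) // 2) + 2 * (n // 2) + 1

for n in range(2, 8):
    print(f"n = {n}: cr(K_6,{n}) = {cr_complete_bipartite(6, n)}, cr(G + P_{n}) = {cr_G_plus_Pn(n)}")
```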
This means that the graph T 1 U • • • U T™ is isomorphic to the complete bipartite graph K6iU and G + Dn = GuK6tn = Gu((jT^j. (2) We also use the same definition and notation for the good drawing D of the graph G + Pn. The graph G + Pn contains G + Dn as a subgraph, and therefore let P* denote the path induced on vertices of G + Pn not belonging to the subgraph G. The path P* consists of the vertices t\,... ,tn and the edges {U, tj+i}, for i = 1,..., TI — 1. Thus, G + Pn = GuK6>nUP* = GU (0r ) U P n - (3 ) 73 Let D be a good drawing of the graph G + Pn. The rotation r o t o ( i j ) of a vertex ti i n the drawing D is the cyclic permutation that records the (cyclic) counter-clockwise order i n which the edges leave ti, see [5]. We use the notation (123456) if the counter-clockwise order the edges incident with the vertex ti is Uvi, UV2, Uvz, Uvi, tiV§, and UVQ. For i, j 6 if rot_i(ij) = r o t o ( i j ) . If Q{mto{ti), rot_»(£j)) denotes the m i n i m u m number of interchanges of adjacent elements of r o t o ( i j ) required to produce the inverse cyclic permutation of rot_t(£j), then crjj(TZ ,T3 ) > Q(rotoiti), rotoitj)). B y P we will understand the inverse cyclic permutation to the cyclic permutation P. In this paper, some parts of proofs can be done with the help of software that generates all cyclic permutations of six-element set in [1]. We will separate the subgraphs T% for i G { 1 , . . . , n} of the graph G + DN into three mutually disjoint subsets. For i G { l , . . . , n } , let RD = {T* : c r ^ G , ^ ) = 0} and SD = {T* : c r ^ G , ^ ) = 1}. Every other subgraph T% crosses the edges of G at least twice i n D. For T% G i?r>, let F% denotes the subgraph G U T- of G + P „ , and let be the subdrawing induced by D. According to the arguments in the proof of Theorem 1, i n the optimal drawing D of G + Pn, there is a subgraph T% whose edges cross the edges of the graph G at most once. Thus, we will deal with only drawings of G with a possibility of existence of a subgraph T% G RD^SDAssume first a good subdrawing of the graph G i n which there is no crossing on the edges of G . In this case, we obtain three nonisomorphic planar drawings of G shown i n Figure 1. T h e vertex notation of the graph G i n Figure 1 will be justified later. Let us first assume the drawing of G with the corresponding vertex notation i n such a way as shown in Figure 1(a). W e need to list all possible rotations rot_t(tj) which can appear in D if the edges of T% do not cross the edges of G . Since there is only one subdrawing of F% \ { « 2 , ^ 3 } represented by the rotation (1546), there are two ways of obtaining the subdrawing of F% depending on the region i n which the edges UV2 and tiVs are placed. W e denote these two possibilities under our consideration by A\, A2, summarized i n Table 1. A s for our considerations it does not play a role in which of the regions is unbounded, assume the drawings shown i n Figure 3. Let us discuss all possible rotations rot £>(£») which can appear in D if the edges of TL cross the edges of G exactly once. The vertex ti must be placed i n the region with at least five vertices of G on its boundary, which means that the vertex ti be placed i n outer region of G . The edge V^VQ can be crossed only by the edge Uvs, and the edge vivs can be crossed either by the edge UV2 or the edge UVQ. So, there are three configurations represented by the cyclic permutations Ai = (125436), A2 = (154632) and ^3 = (163254). 
The edge V4V5 can be crosseed either by the edge tiV2 or the edge £ ^ 3 . A n d , there are three configurations represented by the cyclic permutations A4 = (135246), A5 = (125346) and Ae = (152463). T w o edges, either tiVs or tiVi, can cross the edge V2V5. A n d we have three configurations represented by the cyclic permutations A7 = (123546), ^8 = (132456) and Ag = (124563). The edge V^VQ can be crossed by the edge £ ^ 4 , so there exist two configurations represented by the cyclic permutations A10 = (132564), An = (125643). The last possibility is that the edge «2^3 can be crossed either by the edge £ ^ 4 or the edge tiV5. Thus, we have four configurations represented by the cyclic permutations A12 = (142563), A13 = (134256), A14 = (135246) and A15 = (152463) (see Figure 4). Now assume the drawing of G with the corresponding vertex notation in such a way as shown in Figure 1(b) or 1(c). Since i n both drawings there are at most 5 vertices of G i n the boundary of any region, the edges of any T% cross the edges of G at least once. Consider all possible drawings of G U T% i n which the edges of T% cross the edges of G exactly once. First, assume the drawing of G as shown i n Figure 1(b). Let the vertex ti be placed i n the same region as the vertex v\. Since there is only one subdrawing of G U T% \ \yz,v§\ represented by the rotation (1452), there are two and four possibilities how to obtain the subdrawing G U T% depending on which region the edge tiV% is placed and which edge of G is crossed by the edge UVQ. These 2x4=8 possibilities under our consideration are denoted by Bx = (134652),B2 = (146523),% = (136452),% = (164523),% = (145263),% = (134526),% = (134562) and % = (145623). If the vertex ti is placed in the outer region, there is the only subdrawing of G U TL \ {vi} represented by the rotation (25463). A n d , there are three possibilities how to obtain the subdrawing GUT1 depending on which edge of G is crossed by the edge Uv\. These three possibilities 74 (a) (b) F i g u r e 3 : Drawings of all possible configurations of the graph F 1 , if cr(G) = 0 At : (132546) A*2 : (125463) T a b l e 1: Configurations of graph G U T \ TL e RD are BQ = (154632), B\Q = (146325) and B\\ = (125463) (see Figure 5). Second, assume the drawing of G as shown in Figure 1(c). T h e vertex ti must be placed in the outer region. There is the only subdrawing of G U T% \ {vi} represented by the rotation (32546) and there are two possibilities how to obtain the subdrawing G U T ' depending on which edge of G is crossed by the edge tiV\. These two possibilities are denoted by d = (163254) and C 2 = (132546) (see Figure 6). Assume the drawing of the graph G with one crossing among its edges. We will consider only such drawings of G for which there is a possibility of the existence of a subgraph TL G RJJ, because of arguments in the proof of Theorem 1. Since there is TR 6 for i = 1,2. The lower bounds of number of crossings of two configurations from set { ^ 1 , ^ 2 } a r e m ^ e Table 2. - At A\ At 6 5 Al 5 6 T a b l e 2: Lower bounds of numbers of crossings for two configurations from {At, A%} 75 A13 A15 F i g u r e 4: Fifteen drawings of the graph G U T ' for TL , if G has a drawing as i n Figure 1(c) - Ci c2 Ci 6 5 c2 5 6 Table 6: Lower bounds of numbers of crossings between two different T% and TJ with configurations Ck and Ci of G U T* and G U P , respectively 78 F i g u r e 7: Nine drawings of the graph F1 , if cr(G) = 1 Now, consider the set of configurations Af = {Ai, i = 1 , . . . 
, 15} and subset of this set which contains configurations occuring i n the considered drawing D of G + Pn. The verification of the lower bounds for the number of crossings of two configurations from Af proceeds in the same way as before (the drawings of subgraph G U T 1 are i n Figure 4). The resulting lower bounds for the number of crossings of configurations from Af are summarized in the Table 3. Take into consideration the set of configurations {Bi, i = 1 , . . . , 8} and subset of this set which contains configurations occured in the studied drawing D of G + Pn. In the Table 4 are summarized lower bounds of numbers of crossings for any two configurations of the subgraphs GUT1 and GUT1 with drawings i n Figure 5. Assume the subset of the set {BQ,BIQ,BH} contains configurations occuring in examined drawing D of G + Pn. In the Table 5, there are lower bounds of numbers of crossings for any two configurations of the subgraphs G U T ' , G U T-? with drawings i n Figure 5. A s the edges of the path Pn are not crossed i n D, {B\,...,Bg} is disjoint with {Bg, Bio,Bu}. Consider set of configurations {Ci,C2} and subset of this set contains configurations occured in concrete drawing D of G + Pn. T h e lower bounds of numbers of crossings for any two configurations of the subgraphs G U T ' , GUTJ with drawings i n Figure 6 are given in the Table 6. L e m m a 1. Let D be a good drawing of G + Pn, n > 2 with a vertex notation of the graph G as i n Figure 1(a). If Tn G RD such that Fn has configuration A* G M v for i = 1,2, then cvD(Tn ,Tk ) > 3 for any Tk G SD. Proof. Let in D the graph Fn have configuration A\. Since Tk G SD, the vertex tk cannot be placed i n the region bounded either by 4-cycle or 3-cycle of the graph G. If the vertex tk is placed in the region with vertices vi,vs,VQ,tn on its boundary (Figure 3(a)), it is easy to see, that cr£>(T™,Tf e ) > 3. Moreover, if the vertex tk is placed in another region, there are exactly two vertices of the graph G on its boundary, then CTo(Tn ,Tk ) > 3. The same idea can be used for configuration A\. • T h e o r e m 1. c r ( G + Pn) = 6 [ f J L ^ J + 2 [ f J + 1 forn> 2. Proof. There is a drawing of G + Pn (see Figure 8) with 6 [ ^ J [^ir^J + ^ L f J + 1 crossings. Thus, we have c r ( G + Pn) < 6 [^J + 2 [^J + 1. We prove the reverse inequality by induction on n. Using algorithm on the website h t t p : / / c r o s s i n g s . u o s . d e / , we can prove that the result is true for n = 2 and n = 3. Suppose now that, for n > 4, there is a drawing D with crD(G + Pn) < 6 n 1 n — 1 1 -i_ 0 n L2JI 2 J .2. and let crD(G + Pm) > 6 TO 1 TO — 1 I — + 2 L 2 JL 2 J in ~2. 1. 1 for any positive integer TO < n. (4) (5) As the graph G + Dn is a subgraph of the graph G + Pn and crD(G + Dn) = 6 |_§ J + 2 ( s e e [13]), it implies c r ^ G + Pn) = 6 |_f J + 2 |_f J- Thus, no edge of the path Pn is crossed i n D and all vertices t\,..., tn are placed i n the same region of the subdrawing D{G). 79 F i g u r e 8 : The graph of G + Pn with 6 |_f J [ ^ J + 2 |_f J + 1 crossings First, we prove that the considered drawing D must be antipodal-free, that is c r ^ (T% T-7 ) =t 0 for all h ji i 7^ 3- A s a contradiction suppose that, without loss of generality, c r ^ T ™ - 1 , Tn ) = 0. 
If cr(G) = 0, using Table 2, both subgraphs T™ and T n _ 1 are not from the set i?j> Assume there is exactly one of Tn or T n _ 1 in the set RD- W i t h o u t loss of generality, T™ S i?r> A s every region in the drawing of G U T™ contains at most four vertices of G on its boundary and c r ^ T ™ - 1 , TN ) = 0, we have c r ^ T ™ - 1 , G ) > 2. If both T " and T " " 1 are from the set SD, then c r D ( G , T n _ 1 U T " ) > 2. If cr(G) > 1, one can easy to see that T " and T " " 1 are from the set SD, thus c r D ( G , T ™ - 1 U T " ) > 2. The fact that cv{K6>3) = 6 implies that any Tk , k = 1 , 2 , . . . , n — 2, crosses T n _ 1 U T™ at least six times. So, for the number of crossings, i n D, we have crD(G + Pn) = crD ( G + P „ _ 2 ) + c r ^ T ™ - 1 U T " ) + crD(K6in-2,Tn ~1 U T " ) - c r o C G . T " - 1 U T " ) > 6 ra-2 ra-2 J + 1 + 6(ra - 2) + 2 ra 1 ra — 1 1 - + 2 L 2 J L 2 J It contradicts with ( 4 ) . So, D must be an antipodal-free. Moreover, our assumption on D together with cr(K§^n) = 6 [§J [^ir^J implies that c r D ( G ) + c r D ( G , / Y 6 , „ ) < 2 [ - J . Let us denote r = \RD\ and s = | £ D | . Then, cr£,(G) + Or + Is + 2(ra - r - s) < 2 (6) If c r D ( G ) = 0, then 2r + s > 2ra — 2 |J . Moreover, if r = 0, then s = ra. C a s e 1: c r D ( G ) = 0. First, we can choose the vertex notation of the graph G as shown i n Figure 1(a). Let us consider that n = s. We have two possibilities. (i) For every i ^ j : cro {Tl ,T^) > 3. W i t h o u t lost of generality let T™ € SD and let us fix G U TN . T h e n we have cr(G + Pn) > cxD(K6,n-i) + cxD(K6,n-i,G U T " ) + crD(G U Tn ) > > 6 ra — 1 j [ ^ J + 4 ( r a - l ) + ra 1 ra — 1 1 1 > 6 hd [ 2 J + 2 (ii) Using Table 3, there is not T ^ T - 5 3, we fixed G U Tk . A n d the same inequalities as i n previous case (i) hold. If there is not such Tk G SD, without lost of generality, let for T " , ^ 1 G SD is crD(Tn -\Tn ) = 2 and let us fix G U T n _ 1 UT™. In this step we are interested i n all possible drawings of the subgraph G U T 1 for some Tl G We have c r ^ T ™ - 1 U T " , T l ) > 6 for every Tl with i 7^ n , n — 1, by summing the values in all columns in the considered two rows of Table 3. Thus, cr(G + P „ ) > crD(K6^2) + c r D ( K 6 ^ 2 , G U T " U T n _ ) + c r D ( G U T " U T™ ) > > 6 4 > 6 n 1 n - l \ 1 0 n L2JI 2 J + 2 .2. Let us consider that n / s, that is, n > s + 1. Using (6), we have r > 1. Let us assume that T ™ e i J n with P™ having configuration either .4J or _4_2- We will discuss two possibilities over congruence n modulo 2. • Let n be even. B y fixing the graph G U T " and using Table 2, L e m m a 1, we have cr(G + Pn) > crD(K6,n-i) + cxD(K6,n-i, G U T " ) + crD(G U T " ) > ra — 1 I ra — 2> 6 J J + 5 ( r - l ) + 4 s + 3 ( r a - r - s ) + 0 J [—^— J + 3ra + (2r + s) - 5 > ra — 1 > 6 ra — 1 3n - [ f j ) - 5 > e n 1 n — 1 n [ f j ) - 5 > e + 2[ f j ) - 5 > e .2J L 2 _ .2. • Let n be odd. B y fixing the subgraph Tn , cr(G + P „ ) > c r D ( G + P „ _ 1 ) + c r D ( G + P „ _ 1 , T n ) > - J + 1 + 5(r - 1) + 3s + l ( n - r - s) + 0> 6 n — 1 | n - 2 -1- 9 n — L 2 J L 2 J I 2 > 6 = 6 n — 1 1 n - 2 j + 2= 6 L 2 J 2 j + 2 n - 2 j + 2 n - + n + 2 j + 2 2 J + n + ra — 1 J + ra + 2(2r + s ) - 4 > ra — 1 4 > 6 Consider the drawing of G as in the Figure 1(b) or 1(c). A s we mentioned above, the edges of every T% cross the edges of G . A s there does not exist TL G RD, i-e., r = 0, using (6), we have s = n. 
We use the results from the Tables 4, 5, 6, and we have c r ^ (T% T-7 ) > 3 for every Thus, by fixing the graph G U T ™ we have cr(G + P „ ) > croiKe^) + cxD(K6,n-i, G U T " ) + c r D ( G U T " ) > „ ra — 1 > 6 ra 1 ra — 1 1 -i_ 0 ra L2 JI 2 J .2. C a s e 2: crD(G) = 1. Based on equation (5), r > 1. W i t h o u t lost of generality, we assume that T™ G RD- W i t h respect to drawings of G U T™ (see Figure 7), it is possible to verify, that the edges of T% cross the edges of G U Tn at least four times for every i = 1,... ,n — 1. So, by fixing the graph P™ we have > 6 ra — 1 ra 1 ra — 1 1 -i_ 0 ra L2JI 2 J . 2 . C a s e 3 : c r D ( G ) > 2 We use the same idee ^ . . existence of a subgraph T l G flc We use the same idea as i n previous case for all possioie drawings 01 m e grapn u wnn i " in the considering drawing D. It completes the proof .ible drawings of the graph G with a possibility of an r l i r i T T T i »1 j-v T4- s~*s~\l-yi -t-\l *^ii- rtn 4- 1-» /"\ T-\-**y-» s~\4* | 81 R e f e r e n c e s [1] Berezny, S., Busa, J. Jr., Stas, M . (2018). Software solution of the algorithm of the cyclic-order graph, Acta Electrotechnica et Informatica, 18, No. 1, 3-10. [2] Berezny, S., Stas, M . (2017). On the crossing number of the join of five vertex graph G with the discrete graph Dn, Acta Electrotechnica et Informatica, 17, No. 3, 27-32. [3] Berezny, S., Stas, M . (2018). Cyclic permutations and crossing numbers of join products of symmetric graph of order six, Carpathian J. Math., 34, No. 2, 143-155. [4] Garey, M . R., Johnson, D. S. (1983). Crossing number is NP-complete. SIAM J . Algebraic. Discrete Methods, 4, 312-316. [5] Hernandez-Velez, C , Medina, C , Salazar, G . (2014). The optimal drawing of K5^n. Electronic Journal of Combinatorics, 21(4), 29. [6] Kleitman, D. J. (1970). The crossing number of K5n. J . Combinatorial Theory, 9 , 315-323. [7] Klesc, M . (2010). The crossing number of join of the special graph on six vertices with path and cycle, Discrete Math., 310, 1475-1481. [8] Klesc, M . (2007). The join of graphs and crossing numbers, Electron. Notes in Discrete Math., 28, 349-355. [9] Klesc, M . , Schrotter, S. (2012). The crossing numbers of join of paths and cycles with two graphs of order five, Combinatorial Algorithms, Sprinder, LNCS, 7125, 160-167. [10] Klesc, M . , Schrotter, S. (2011). The crossing numbers of join products of paths with graphs of order four, Discuss. Math. Graph Theory, 31, 321-331. [11] Klesc, M . , Valo, M . (2012). Minimum crossings in join of graphs with paths and cycles, Acta Electrotechnica et Informatica, 12, No. 3, 32-37. [12] Stas, M . (2017). On the crossing number of the join of the discrete graph with one graph of order five, J . Math. Model, and Geometry, 5, No. 2, 12-19. [13] Stas, M . (2018). Cyclic permutations: Crossing numbers of the join products of graphs, Proc. Aplimat 2018: 17th Conference on Applied Mathematics, 979-987. [14] Stas, M . (2019). Determining crossing number of one graph of order five using cyclic permutations, Proc. Aplimat 2019: 18th Conference on Applied Mathematics, 1126-1134. [15] Stas, M . (2019). Determining crossing number of join of the discrete graph with two symmetric graphs of order five, Symmetry, 11, No. 2. [16] Woodall, D. R. (1993). Cyclic-order graphs and Zarankiewicz's crossing number conjecture, J. Graph Theory, 17, 657-671. 82 Efficiency of Credit Risk Management and Their Determinants in Central European Banking Industries Xiaoshan Feng1 Abstract. 
Credit risk is one of the major risks in commercial banks. Therefore, whether commercial banks conduct effective credit risk management and employ technology changes with the times are essential. The main aim of this study is to evaluate the performance of credit risk management and productivity change and identify the determinants. We employ the Data Envelopment Analysis ( D E A ) on selected commercial banks in the Czech Republic, Germany, Republic of Austria, Poland, and Hungary. Based on valid data from 2012 to 2019, selected banks received lower efficiency scores using variable returns to scale in line with expectations. Additionally, strong evidence from the Malmquist Index demonstrated that the selected banking industries have various improvements during the past 8 years due to innovation in credit risk management. Furthermore, logistic regression results emphasized the significant differences among banking industries and suggested the credit risk measurement (SA/IRB), size, capital adequacy, and ownership have significant influences on the likelihood of banks being efficient on credit risk management. Keywords: Credit risk management, non-performing loans ratio, macroeconomic variables, bank performance indicator, central Europe, data envelopment analysis, Malmquist index, logistic regression. J E L Classification: G21, C31, C67, C80, C61, C58 AMS Classification: 62M10, 91G40, 91G70, 90C05 1 Introduction For the past few years, accompanied by the recovery of the whole economy, robust growth in the volume of banking lending activities achieved. Thus, commercial banks need to be more cautious about the quality of their assets. While the outbreak of the global pandemic (COVID-19) lightened an incoming challenge for the global economy. Not only bank industry should prepare for the possible forthcoming deterioration, but policymakers and regulation institutions also need to implement a more effective framework to reduce the relevant risks. As one of the most important risks, credit risk can lead to a huge failure i n banks. Subsequently, investigating the efficiency of credit risk management of commercial banks becomes one of the most important steps to measure the overall soundness of the banking industry. Simultaneously, it is necessary to get to the bottom of the possible determinants i n credit risk management efficiency. The objective of this paper is to evaluate how the efficiency of credit risk management is influenced by the macroeconomic and bank itself determinants for selected banking industries in Central Europe. In this paper, we selected 10 commercial banks from each of 5 countries i n Central Europe, which are, Czech Republic, Germany, Republic of Austria, Poland, and Hungary, respectively. This paper is divided into five sections. The first section starts with the introduction and the last one ends with the conclusion. The second section includes the literature review. Section 3 presents a brief description of methodology and data collection. In the fourth section, the empirical results will be discussed. 2 Literature Review In general, prior research typically investigated the operational efficiency of the banking industry, while the credit risk management efficiency has received limited attention in this area. Additionally, studies which employed the logistic regression model are scarce especially from the standpoint of efficiency. In this section, we will summarize and compare the relevant research. 
The most widely used technique to measure efficiency is Data Envelopment Analysis (DEA), M u c h of the current literature conducts D E A to evaluate banking efficiency. Řepková [11] suggested that the efficiency scores from 1 VSB - Technical University of Ostrava, Department of Finance, Sokolska tfida 33 702 00 Ostrava, Czech Republic, xiaoshan.feng@vsb.cz. 83 the B C C model have higher values than from the C C R model due to the elimination of deposit management inefficiency. The study applied the Malmquist index to estimate the efficiency change i n the Czech banks over time from 2001 to 2010. The negative growth in efficiency indicates the industry has lacked innovation or technological progress during the time. Several studies incorporate non-performing loans ratio (NRLs) as a proxy of credit risk when measuring the efficiency of credit risk management i n the banking sector. Undesirable outputs like NRLs may present i n the banking sector which prefers to be minimized. There is an abundance of research that incorporates undesirable outputs into the analysis. Paradi and Zhu [9] have surveyed bank branch efficiency and performance research with D E A . The study mentioned three approaches when non-performing loans are incorporated in previous literature. The first is to leave the N P L s ratio as an output but use the inverse value. The second method is to treat this undesirable output as input, which applied in other studies (Puri and Yadav [10]; Toloo and Hanclova [14]). The third one is to treat it as an undesirable output with an assumption of weak disposability, which requires that undesirable outputs can be reduced, but at a cost of fewer desirable outputs produced. Gaganis et al. [5] included loan loss provisions (LLPs) as an input variable to examine the efficiency and productivity of a Greek bank's branches from 2000 to 2005. The finding shows that the inclusion of loan loss provisions as an input variable increases the efficiency score. More recent attention has focused on the determinants of credit risk management, when the N P L s ratio is widely incorporated as a proxy of credit risk (Messai and Jouini [13]; S karica [12]; Louzis, et al. [8]). One of the most important steps before credit risk management is to measure the credit risk. Since the enforcement of Basel II, banks calculate their minimum capital requirements under Pillar I use risk weights provided by the standardized approach (SA) or the internal ratings-based approach (IRB). Cucinelli et al. [4] suggested that banks using IRB were able to curb the increase i n credit risk driven by the macroeconomic slowdown better than banks under the standardized approach, which pro-vide the study of using IRB shows better performance than using S A . Hakenes and Schnabel [7] published an analysis of bank size and risk-taking under Basel II, they found out although bank can choose between S A and IRB, while makes larger banks a competitive advantage and pushes smaller banks to take higher risks. This may even lead to higher aggregate risk-taking. Thus far, few studies have incorporated credit risk measurement as one of the impact factors for credit risk management efficiency. There exhibits abundant literature focus on the bank's operational efficiency and lack of timeliness. It is rare to investigate efficiency from the view of credit risk management of banks. O n top of that, when many studies examine the determinants of credit risk management, dynamic panel data analysis is widely used i n previous literature. 
Scarce research specified the determinants of credit risk efficiency from external and internal impacts. Hence, the knowledge gap can be addressed. This study will first measure the efficiency scores from D E A , subsequently, examine the determinants of banking credit risk management efficiency using a logistic regression model. To diversify the choices of determinants, in addition to macroeconomic perspective, we will incorporate the bank relevant factors which are, respectively, profitability factor, size, ownership (domestic or foreign-owned), capital adequacy, and the credit risk measurement (IRB or SA). 3 Methodology and Data collection Following the previous literature, we apply Data Envelopment Analysis and the Malmquist index to measure the efficiency of credit risk management and productivity change of selected 10 banks from each of the five countries, respectively, Austria, the Czech Republic, Germany, Hungary, and Poland during the period from 2012-2019. Moreover, we employ the logistic regression model to investigate the possible internal and external determinants of credit risk management efficiency. 3.1 Two Classic Models of Data Envelopment Analysis D E A is a linear programming-based method, which introduced by Charnes, Cooper and Rhodes in 1978. D E A is used to evaluate the relative efficiency of a set of decision-making units (DMUs) with multiple inputs and multiple outputs. Then, Banker, Chames and Cooper has proposed a model in 1984, named the B C C , which is an extended version of the C C R model. The main difference between these two models is different returns to scale. The C C R model assumes all D M U s are operating at an optimal scale, that is, constant returns to scale (CRS); While the B C C model assumes variable returns to scale (VRS). In D E A models, we measure the efficiency of each DMU. One of the most frequently used methods to measure efficiency is by the ratio. Suppose we have n D M U s i n the population, each D M U produces s outputs while 84 consuming m inputs. Consider DMUj,j represents n D M U s , xri and yri are the matrixes of inputs and outputs respectively. The efficiency rate of such a unit can be expressed as: Vr=xuryri (1) The efficiency rate is the ratio of the weighted sum of outputs to weighted sum of inputs. The D E A model assumed inputs and outputs should be non-negative. Let DMUj to be evaluated on any trial be designated as DMU0, where o = (1,2 n ) . A ratio of two linear functions can construct the linear-fractional programming model as follows: S r = l " r y ™ (2) max 6 = — v,u 5 ™ i VtXio , . . . Y?r=iUryrj ^ . . , . (3) subject to — < 1,0 = 1,2, ...,n) u1,u2,...,ur > 0, (r = 1,2, ...,s) (4) v1,v2, —,Vi > 0, (i = 1,2, ...,m) (5) Where 0 is the technical efficiency of DM£/0 to be estimated, vt (i = 1,2,..., m) is the optimized weight of input and the output ur (r = 1,2,..., s). yr ; - is observed amount of output of the r-fh type for the y'-th D M U , xtj is observed amount of input of the i-fh type for the y'-th D M U . Moreover, we will apply the Malmquist index to deal with our panel data, to evaluate the productivity change of a D M U between two time periods and is an example i n comparative statistical analysis. Farrell developed the Malmquist index as a measurement of productive efficiency i n 1957, then Fare decomposed M I into two terms i n 1994, it can be defined as "Catch-up" and "Frontier-shift" terms. The catch-up term indicates the degree of a D M U improves or worsens its efficiency. 
The frontier-shift term captures the change in the efficient frontier between the two time periods. The Malmquist index is computed from the catch-up and frontier-shift terms as

MI = (Catch-up) × (Frontier-shift).   (6)

When MI is larger than 1, it indicates progress in the total factor productivity of DMU_o from period 1 to period 2; MI equal to 1 means no change, and MI less than 1 indicates deterioration in total factor productivity.

a) Data Selection for Measuring the Efficiency by DEA and MI

To make sure the results of applying DEA are reliable, the numbers of inputs, outputs and DMUs must satisfy the rule of thumb first proposed by Golany and Roll [6] and further developed by Bowlin [3]: there should be at least three times as many DMUs as there are input and output variables; if this condition is not met, the results are not reliable (Toloo and Tichy [15]). In this study, we collect data on 10 representative commercial banks from each of the five banking industries, covering large, medium and small banks. All data come from the annual report of each bank on a consolidated basis. This study aims to investigate the efficiency of credit risk management in the banking industries. Therefore, we apply the intermediation approach to measure the efficiency of credit risk management within the DEA model. To assess credit risk modelling in the banking industry, Berg et al. [2] suggested using non-performing loans as a proxy of credit risk in a nonparametric study of bank production, and Altunbas et al. [1] incorporated loan loss provisions to analyze the efficiency of Japanese banks. Therefore, based on the previous literature, we proxy credit risk by the ratio of non-performing loans to total gross loans, the so-called NPLs ratio. We further incorporate the loan loss provision ratio, i.e. the ratio of provisions to non-performing loans, which primarily reflects commercial banks' ability to compensate for loan losses and to protect against credit risk. In total, this paper uses two inputs and one output, with 10 DMUs, which satisfies the rule of thumb. Input x_1 is the loan loss provision ratio, output y is loans and receivables, and the non-performing loans ratio is an undesirable output; based on the treatment of undesirable outputs in the previous literature, the NPLs ratio is transformed into input x_2.

b) Logistic Regression Model

Furthermore, after we obtain the efficiency scores from the CCR and BCC models, we can estimate the determinants of banking credit risk management efficiency using a regression model. Since efficiency can be measured as a binary outcome, we model the conditional probabilities of the response outcome rather than the binary result itself. Therefore, we apply a logistic regression model in this paper. The logistic model can be interpreted through an underlying linear model:

Y_{it} = \beta_0 + X'_{it}\beta + e_{it}, \quad i = 1, ..., N, \; t = 1, ..., T,   (7)

where the subscripts i and t denote the cross-sectional (N) and time (T) dimensions of the panel data, respectively. There are k (k = 1, ..., K) regressors in X_{it}, not including a constant term; X_{it} is the vector of explanatory variable values for the i-th unit at time t; \beta_0 is the intercept; \beta is the (K × 1) vector of slope coefficients. The variable e_{it} is the error term of the relationship and represents factors other than the explanatory variables that affect the dependent variable.
Since we have a binary output variable Y_{it}, we want to model the conditional probability p(Y_{it} = 1 | X'_{it} = x_{it}) as a function of x_{it}:

\pi(x_{it}) = P(Y_{it} = 1 \mid X'_{it} = x_{it}).   (8)

The logistic regression model can then be constructed as

\ln \frac{\pi(x_{it})}{1 - \pi(x_{it})} = \beta_0 + X'_{it}\beta, \quad i = 1, ..., N, \; t = 1, ..., T.   (9)

c) Data Selection for Logistic Regression Model

To investigate the determinants of the efficiency of banks' credit risk management, the dependent variable in the regression model is the efficiency score obtained from the previously described DEA model. The independent variables are: size of the bank, measured as the natural logarithm of total assets in Czech koruna; Capital Adequacy Ratio (CAR), measured by dividing a bank's total capital by its risk-weighted assets; return on average assets (ROAA), calculated as the ratio of net income to average total assets; GDP growth rate, i.e. the year-on-year annual GDP growth rate; the share of risk-weighted assets calculated by the Standardized Approach (SA), computed as the ratio of RWAs under SA to total RWAs; and ownership of the bank, a binary variable in which 1 represents a foreign-owned and 0 a domestic-owned bank. The binary outcomes are coded 1 and 0, representing an efficient and an inefficient DMU, respectively.

4 Empirical results

4.1 Efficiency of selected banking industries

In this section, we compare the five banking industries in terms of efficiency results and the Malmquist index. The efficiency scores were obtained from DEA SolverPro™. Generally, the selected countries exhibited different levels of asymmetry within each banking industry, with the Czech banking industry showing the largest one: TE ranges from 0.11 to 1. Fig. 1 shows TE calculated under the CCR model; the efficiency score ranges from 0.17 to 0.62. The German banking industry fluctuated most sharply among the 5 selected industries; its efficiency score dramatically fell in 2015 due to the record-setting loss of its largest bank, Deutsche Bank. After Brexit and the following events, the German banking system showed a relatively low performance of credit risk management and started to recover on the basis of the potential of retail banking services and investment in IT upgrades. Hungary has relatively low efficiency scores because the importance of the banking industry in Hungary is surprisingly low and the amount of loans and advances to households is relatively small. Similar to Řepková [11], we obtained higher scores from the BCC model due to variable returns to scale. Fig. 2 shows that the average PTE varies widely, from 0.32 to 0.92. During 2015, Hungary, Poland, and Germany experienced a sharp decline in credit risk management efficiency.

Figure 1 Technical efficiency among selected industries
Figure 2 Pure-technical efficiency among selected industries

In addition, the Austrian banking industry suffered a continuous decline during 2012-2017, but saw a jump in credit risk management efficiency through technological progress in 2018. In contrast, the Czech banking industry had a relatively high efficiency of credit risk management until 2017, when a sharp drop appeared, which might be caused by the announcement of the exit from the exchange rate commitment, the first such step in eleven years of Czech interventions on the currency market. After the CNB announced the decision, the Czech koruna fell 3.2% against the euro, the biggest decline since 2010; at the same time, the CNB announced that it would keep interest rates unchanged.
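A minimal sketch of this second stage, i.e. the logit model (9) with the determinants listed in part c), is given below. The data are synthetic and the variable names are hypothetical; in the actual study, the binary dependent variable would come from the DEA efficiency scores.

```python
# A minimal sketch of the logistic regression stage (synthetic data only).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_obs = 400  # bank-year observations of the panel
df = pd.DataFrame({
    "size":       rng.normal(20.0, 1.5, n_obs),    # ln(total assets)
    "car":        rng.normal(0.18, 0.03, n_obs),   # capital adequacy ratio
    "roaa":       rng.normal(0.01, 0.005, n_obs),  # return on average assets
    "gdp_growth": rng.normal(0.02, 0.015, n_obs),
    "sa_share":   rng.uniform(0.0, 1.0, n_obs),    # RWAs under SA / total RWAs
    "foreign":    rng.integers(0, 2, n_obs),       # 1 = foreign-owned
})
# Placeholder binary efficiency indicator (would come from the DEA stage).
df["efficient"] = rng.integers(0, 2, n_obs)

X = sm.add_constant(df[["size", "car", "roaa", "gdp_growth", "sa_share", "foreign"]])
logit_fit = sm.Logit(df["efficient"], X).fit(disp=0)
print(logit_fit.summary())
```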
This decline corresponds to the decline in technological progress from 2017. The Malmquist index measures the productivity change of a DMU: if MI is larger than 1, it indicates progress in the productivity of a DMU from period 1 to period 2. We can see from Fig. 3 that only the German and the Austrian banking industries show sharp fluctuations, with MI larger than 2 and 8 during the periods 2014-2015 and 2015-2016, respectively, which indicates that these two industries experienced huge productivity changes during specific periods. Except for Poland, the rest of the selected banking industries have not markedly improved their credit risk management, although they show steady improvements in recent years. Under the assumption of variable returns to scale, only the German banking industry shows an exceptional fluctuation during the period 2014-2015, but during periods 4 and 5, no banking industry except the Czech Republic and Hungary recorded progress.

Assume that the values of the technologies satisfy

a > c, \quad b > d.   (1)

Table 1 defines the so-called coordination game, where the goal is to coordinate both firms so that they use the same technology, in which case both firms achieve higher values than when each uses a different technology. It is a non-cooperative game of two players (firms 1 and 2) with two strategies (selection of technology A or B). The basic solution concept for non-cooperative games is the so-called Nash equilibrium: if any player changes strategy while the strategies of the other players remain unchanged, that player can only be harmed. In the coordination game there are two Nash equilibrium solutions: equilibrium (A, A) - both firms choose technology A - and equilibrium (B, B) - both firms choose technology B.

                        Firm 2: Technology A    Firm 2: Technology B
Firm 1: Technology A    a; a                    d; c
Firm 1: Technology B    c; d                    b; b

Table 1 Values of technologies for firms

The existence of several equilibrium solutions raises the question of how firms will coordinate their activities. If technology A is the new technology and technology B the old one, the following two types of market failure can be observed. If a > b and the equilibrium solution (B, B) is chosen, i.e. firms stay with the worse-rated old technology B, we speak of excess inertia. If b > a and the equilibrium solution (A, A) is chosen, i.e. firms move to the worse-rated new technology A, we speak of excess momentum. Both symmetric equilibrium solutions can be suitable candidates for selection. A further refinement of the notion of a Nash equilibrium solution is the distinction between the Pareto-dominant and the risk-dominant equilibrium. An equilibrium solution is Pareto-dominant when there are no other strategies for which the value of the solution is better for at least one player and not worse for the other players. An equilibrium solution is risk-dominant if the best response of both players remains unchanged as long as the opponent selects the equilibrium strategy with a probability of at least 0.5 (see [5]). If

a > b > c > d > 0 \quad \text{and} \quad (b - d) > (a - c),   (2)

then the equilibrium solution (A, A) is Pareto-dominant and the equilibrium solution (B, B) is risk-dominant.

3 Arthur's basic model

Consider Arthur's basic model [1] with two firms 1 and 2 and two technologies A and B. Each firm decides which technology to purchase according to its initial preferred own value of the technology and the network externalities associated with each technology.
These values are summarized in Table 2, where a_1 is the original preferred value of technology A for firm 1, n_A is the size of the network using technology A, and s is the parameter of the network value. Analogous designations apply to firm 2 and technology B.

            Technology A     Technology B
Firm 1      a_1 + s n_A      b_1 + s n_B
Firm 2      a_2 + s n_A      b_2 + s n_B

Table 2 Values for technology selection

Increasing returns to scale are considered, so the parameter s is always positive. Firm 1 initially prefers technology A and firm 2 initially prefers technology B, so it holds that

a_1 > b_1 \quad \text{and} \quad a_2 < b_2.   (3)

According to Arthur's model, one firm buys one technology in each time period, i.e. one firm comes to the market in each period. The type of the incoming firm is a random component of the model; both firm types have the same probability of arrival. The selection of technology by firm i is determined by a combination of three factors:
• the type of the firm (random component);
• the initial preferred value of the technology;
• the number of previous selections of each technology.

The model assumes a two-component value of the technology. The first component is the own value of the technology; the second component is the network value. Arthur shows that under these conditions, users will become locked in to one of the technologies. This result can easily be derived from the matrix in Table 2. Firm 1 will initially prefer technology A, given the initial preferred own value and no network value. Firm 1 will switch to technology B as soon as

b_1 + s n_B > a_1 + s n_A.   (4)

The inequality can be rewritten in the form of a so-called switching inequality

n_B - n_A > \frac{a_1 - b_1}{s}.   (5)

This inequality, together with a similar inequality for firm 2, determines the absorption barriers. Once the size of the network with technology B exceeds the size of the network with technology A by a certain value, determined by the initial preferred values and the network value parameter s, users will be locked in by technology B, which becomes the standard. Firm 1 will give up technology A if the size of the network with technology B is such that the benefit from technology B exceeds the original preferred value for firm 1. At this point, both firms 1 and 2 will only buy technology B and the size of the network with technology A will no longer grow. This analysis is expressed graphically in Figure 1.

Figure 1 Absorption barriers

4 Generalization of Arthur's model

Arthur's basic model allows for certain generalizations (see [2]). We present a model with the possibility of using converters so that the technologies become compatible. Next, we present models where prices for the purchase of technologies are introduced and the impact on social welfare is analyzed.

4.1 Model with converters

The introduction of converters, which allow compatibility between technologies, leads to interesting changes in Arthur's model. Let us introduce the compatibility parameters k_AB and k_BA, from the interval between zero and one, which measure the compatibility of technology A with technology B and of technology B with technology A, respectively. The values are summarized in Table 3, from which we derive some relations for the absorption barriers.
            Technology A                 Technology B
Firm 1      a_1 + s n_A + k_AB s n_B     b_1 + s n_B + k_BA s n_A
Firm 2      a_2 + s n_A + k_AB s n_B     b_2 + s n_B + k_BA s n_A

Table 3 Values for technology selection with converters

Assume a reciprocal converter that allows compatibility in both directions, with k_AB = k_BA. Then the switching inequality takes the form

n_B - n_A > \frac{a_1 - b_1}{(1 - k_{AB})\, s}.   (6)

Compared to the basic Arthur's model, the absorption barriers are multiplied by the coefficient

\frac{1}{1 - k_{AB}}.   (7)

Depending on the value of the compatibility parameter k_AB, the following situations may occur:
• For completely incompatible technologies (compatibility parameter k_AB = 0), we get the basic Arthur's model.
• For a partially compatible reciprocal converter (compatibility parameter 0 < k_AB < 1), the absorption barriers are wider than in the situation without converters. As the value of the compatibility parameter k_AB increases, the absorption barriers expand.
• For a fully compatible reciprocal converter (compatibility parameter k_AB = 1), the absorption barriers are removed and users will not be locked in by any technology. The situation is captured graphically in Figure 2.

Figure 2 Fully compatible reciprocal converter

We can perform a similar analysis for a bidirectional converter where the compatibility parameters differ (k_AB ≠ k_BA). Let us further examine the situation when a one-way converter is introduced. Assume that technology A has access to technology B (0 < k_AB ≤ 1) through a one-way converter, while technology B does not have access to technology A (k_BA = 0). Depending on the value of the compatibility parameter k_AB, the following situations may occur:
• For a partially compatible one-way converter (compatibility parameter 0 < k_AB < 1): Firm 1 will switch from the originally preferred technology A to technology B as soon as

(1 - k_{AB})\, n_B - n_A > \frac{a_1 - b_1}{s}.   (8)

Firm 2 will switch from the originally preferred technology B to technology A as soon as

n_A - (1 - k_{AB})\, n_B > \frac{b_2 - a_2}{s}.   (9)

The absorption barrier for technology A moves closer and the absorption barrier for technology B moves away.
• For a fully compatible one-way converter (compatibility parameter k_AB = 1): Firm 1 would switch from the originally preferred technology A to technology B as soon as

-n_A > \frac{a_1 - b_1}{s};   (10)

however, this inequality never holds, because the left side is never positive while the right side is positive. Firm 2 will switch from the originally preferred technology B to technology A as soon as

n_A > \frac{b_2 - a_2}{s}.   (11)

As the number of users grows, firm 2 will switch to technology A, which becomes the standard. The absorption barrier for technology A gets even closer and the absorption barrier for technology B ceases to exist.

4.2 Model with prices

So far, we have examined the situation without considering the prices of the acquired technology. Let us give a simple generalization of Arthur's model, which explains how the standardization process works if we include in the model technology suppliers who strategically set prices. The modification keeps the assumptions of the basic Arthur's model and introduces some additional conditions. It assumes the existence of two suppliers A and B, each of which sponsors its own technology. These firms have market power to set prices that differ from the marginal costs of manufacturing the technology.
Each technology is associated with fixed costs, which are always incurred, while marginal costs are incurred only if the supplier sells a unit in the given period. Each supplier starts production with an initial subsidy that establishes a fund. The supplier strategically sets the prices of the technology units sold, and the revenues flow into the supplier's fund when a unit is sold. Pricing strategically influences consumer behavior: subsidized low prices attract more users. However, once the fund is exhausted, the supplier goes bankrupt and production ends. Let p_A denote the price per unit of technology A and p_B the price per unit of technology B. The values, including the own value of the technology and the network value, less the price paid for the technology, are summarized in Table 4. If the prices p_A = p_B = 0, we get the basic Arthur's model.

            Technology A          Technology B
Firm 1      a_1 + s n_A - p_A     b_1 + s n_B - p_B
Firm 2      a_2 + s n_A - p_A     b_2 + s n_B - p_B

Table 4 Values for technology selection with prices

Firm 1, for which we assume the original preference for technology A, will switch to technology B as soon as

n_B - n_A > \frac{(a_1 - b_1) - (p_A - p_B)}{s}.   (12)

A similar switching inequality applies to firm 2. The consumer's decision to switch to another technology is given by comparing the difference between the initial preferred values of the technologies, the difference in prices, and the sizes of the networks. If a supplier can sell at a sufficiently lower price than the competition, the effects of the difference between the initial preferred technology values and the network effects may be outweighed in the consumer's decision. An important feature of pricing decisions is the lack of an accurate forecast of the competitor's prices in each period.

4.3 Model with an impact on social welfare

The selection of standards and issues of technology compatibility have an impact on social welfare. It is possible to formulate a coordination game where firms are in a conflict situation when distributing the revenues from standardization. A situation where there are more type 1 firms and more type 2 firms in the economy can be modeled as a game with a single firm 1 and a single firm 2, which compete in several rounds. In each round, the task is to determine the equilibrium for the two firms and their contribution to social welfare, given by the sum of the values achieved by both firms. Both firms 1 and 2 choose between two strategies, selection of technology A or technology B. Consider in the payoff matrix the actual values of the technologies and the network values, including the possibility of using a converter for compatibility with the other technology, i.e. the same values as in the model with converters. The values for the coordination game are given in Table 5.

                        Firm 2: Technology A                                       Firm 2: Technology B
Firm 1: Technology A    a_1 + s n_A + k_AB s n_B ; a_2 + s n_A + k_AB s n_B        a_1 + s n_A + k_AB s n_B ; b_2 + s n_B + k_BA s n_A
Firm 1: Technology B    b_1 + s n_B + k_BA s n_A ; a_2 + s n_A + k_AB s n_B        b_1 + s n_B + k_BA s n_A ; b_2 + s n_B + k_BA s n_A

Table 5 Impact on social welfare

The influence of converter efficiency on social welfare can be demonstrated with specific values. For lower values of the compatibility parameter, the game has coordinating Nash equilibrium solutions, equilibrium (A, A) and/or equilibrium (B, B). As the compatibility parameter increases, the values for mixed networks containing both A and B technologies increase.
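The behaviour of the absorption barriers under a reciprocal converter can be illustrated by a small simulation of Arthur's model. The sketch below uses illustrative parameter values (not taken from the paper) and checks the switching inequalities in the form of (6), i.e. with both barriers widened by the factor 1/(1 - k).

```python
# A minimal simulation sketch of Arthur's basic model extended with a reciprocal
# converter (k = k_AB = k_BA); all parameter values are illustrative only.
import numpy as np

def simulate(a1=5.0, b1=4.0, a2=4.0, b2=5.0, s=0.1, k=0.0,
             periods=500, seed=1):
    rng = np.random.default_rng(seed)
    nA = nB = 0
    lock_in = None                     # (technology, period) once a barrier is crossed
    for t in range(periods):
        firm = rng.integers(1, 3)      # a firm of type 1 or type 2 arrives at random
        # two-component values: own value + network value (incl. converter benefit)
        vA = (a1 if firm == 1 else a2) + s * nA + k * s * nB
        vB = (b1 if firm == 1 else b2) + s * nB + k * s * nA
        if vA >= vB:
            nA += 1
        else:
            nB += 1
        # switching inequalities of type (6): barriers widen by 1 / (1 - k)
        if lock_in is None and k < 1.0:
            if (nB - nA) * (1.0 - k) * s > (a1 - b1):
                lock_in = ("B", t)     # firm 1 abandons A -> technology B locks in
            elif (nA - nB) * (1.0 - k) * s > (b2 - a2):
                lock_in = ("A", t)     # firm 2 abandons B -> technology A locks in
    return nA, nB, lock_in

for k in (0.0, 0.5, 1.0):              # no converter, partially and fully compatible
    print(f"k = {k}: (nA, nB, lock-in) = {simulate(k=k)}")
```

With k = 1 the lock-in condition can never be triggered, which reproduces the statement above that a fully compatible reciprocal converter removes the absorption barriers.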
As the compatibility parameter is further increased, the values for mixed networks increase and the social value also increases.

5 Conclusions

The coordination game, Arthur's model, and their modifications and generalizations are used to analyze the problems of technology coordination in networks. Important conclusions can be drawn from the analyzed simple models. These conclusions concern changes in the selection and adoption of technology when converters and strategic pricing are used. The influence of technology compatibility on the selection of standards and on social welfare is also analyzed. The introduction of converters changes the lock-in of technologies and the adoption of standards. The specific changes depend on the type of converter, i.e. whether it is unidirectional or bidirectional and whether it is partially or fully compatible. In the case of a bidirectional converter, the introduction of a partially compatible converter will delay the adoption of a technology standard, while the introduction of a fully compatible bidirectional converter will completely prevent the adoption of a technology standard. In the case of a one-way converter, the introduction of a partially compatible converter creates a tendency towards lock-in by the preferred technology, and the introduction of a fully compatible one-way converter will cause the preferred technology to be adopted as the standard. The introduction of strategic technology pricing significantly affects the selection and adoption of a technology as a standard. If the financial resources of technology suppliers are highly asymmetric, then strong companies can, by setting low prices, outweigh the disadvantages of the own value of their technology, gain an increasing number of users, and thus achieve the adoption of their own technology as a standard. For weak companies, this strategy is very risky and can lead to financial problems even if their technology has an own-value advantage. Technology compatibility affects the selection of standards and social welfare. As the compatibility parameter increases, the values for mixed networks increase, the social value increases, and the deviation from coordinating solutions increases. The coordination principles applied to the use of technologies in networks can be modified and adapted to other problems on networks, such as the coordination of resources in supply chains or project portfolios (see [3]).

Acknowledgements

This work was supported by the grant No. IGA F4/42/2021, Faculty of Informatics and Statistics, Prague University of Economics and Business. The paper is a result of the institutional research project no. 7429/2020/02 "System approach to selected ICT trends" supported by the University of Finance and Administration, Prague.

References

[1] Arthur, W. (1989). Competing Technologies, Increasing Returns, and Lock-in by Historical Events. Economic Journal 99, 116-131.
[2] Fiala, P. (2016). Dynamické vytváření cen a alokace zdrojů v sítích. Praha: Professional Publishing.
[3] Fiala, P. (2018). Project portfolio designing using data envelopment analysis and De Novo optimisation. Central European Journal of Operations Research 26, 847-859.
[4] Gottinger, H.-W. (2006). Economies of Network Industries. London: Routledge.
[5] Harsanyi, J. C. & Selten, R. (1988). A General Theory of Equilibrium Selection in Games. Cambridge: MIT Press.
[6] Kreps, D. (1991). Game Theory and Economic Modelling. Oxford: Oxford University Press.
[7] Shy, O. (2001). The Economics of Network Industries.
Cambridge: Cambridge University Press. 94 The Impact of Covid-19 on Mutual Relations of Czech Macro-aggregates: Effect of Structural Changes Jakub Fischer1 , Kristýna Vltavská2 Abstract. One of the assumptions of economic analyses and prognoses consists of the stability of mutual relations of macro-aggregates. For example, there is a close connection between gross domestic product and gross national income. As another example, the share of the value-added on the (gross) production does not change quarter-to-quarter in standard time. Nevertheless, covid-19 pandemia and governmental measures led to the change in these relations. While ICT is going stronger, manufacturing remains stable, and services like accommodation and food service activities were almost entirely closed. Furthermore, the sectorial change (from non-financial institutions to government institutions, from market to non-market production) influences G D P and current taxes. The paper aims to analyse the impact of structural changes in 2020 on the mutual relations of the Czech macro-aggregates and ratio indicators. A s the primary method, we use index decomposition, particularly the decomposition of total indicators into the levels effect and substitution effect. In 2020, based on the preliminary data, the substitution effect positively influences yearon-year changes in labour productivity and hourly wages and salaries (both by 0.5 p.p.). It has no impact on monthly wages and salaries' development. Keywords: macro-aggregates, gross domestic product, labour productivity, covid-19, index decomposition J E L Classification: C43, E24, 047 A M S Classification: 62P20 1 Introduction The stability of mutual relations between key macro-aggregates is one of the assumptions of economic analyses and prognoses. As examples, we can mention a close connection between gross domestic product and gross national income. Similarly, the share of the value-added on the (gross) production does not change quarter-toquarter in standard time. Nevertheless, covid-19 pandemia and governmental measures have brought substantial structural changes, and these relations have been broken. I C T activities and health services are going stronger, manufacturing remains stable, and services like accommodation and food service activities were almost entirely closed due to the lockdown. Furthermore, the sectorial change (from non-financial institutions to government institutions, from market to non-market production) influences G D P and current taxes. Several recent papers focus on the post-covid-19 economic development and the perspective of individual industries. Kotz et al. [4] aim at the productivity and growth: they mention the potential of pandemic-related productivity acceleration potential in several industries like healthcare, construction, ICT and retail of 1.2-3.0% in years 2019-2024, driven by telemedicine, operational efficiency, industrialisation, digital construction, warehouse automation and e-commerce. A similar forecast is presented by Mischke et al. [6], who predict a potential to accelerate annual productivity growth by about one p.p. to 2024. They mention that this acceleration is twice comparing to the pre-pandemic rate of productivity growth. Bloom et al. [1] bring a detailed bottom-up analysis and prognosis of the impact of covid-19 on productivity in the United Kingdom. The authors consider micro-drivers of macro productivity, use business survey results, and review many papers within the area of COVID-19 impact on productivity changes. 
They conclude the total factor productivity will be reduced by up to 5% in 2020 Q4 and by around 1% in 2022 and beyond. Our paper aims to analyse the impact of structural changes in 2020 on some ratio indicators like labour productivity and average wages (wage per F T E , wage per hour). While the cited authors analyse and forecast the development of individual industries or the economy as a whole, we try to calculate the impact of changes in the economy's 1 Prague University of Economics and Business, Department of Economic Statistics, nám. W. Churchilla 4, 130 67 Prague, fischerj@vse.cz 2 Prague University of Economics and Business, Department of Economic Statistics, nám. W. Churchilla 4, 130 67 Prague, kristyna.vltavska@vse.cz 95 structure. As the primary method, we use index decomposition, particularly the decomposition of total indicators into the levels effect and substitution effect. The paper is divided as follows. Chapter 2 explains preliminary data available four months after the end of the reference year 2020 and presents the methodology based on the index decomposition. Chapter 3 brings the results and their brief discussion. 2 Data and Methodology 2.1 Data In spring 2021, about five months after the reference period, limited data sources from the national accounts are available for the last reference year. We do not have data based on annual surveys but just the preliminary data from quarterly national accounts. Using quarterly data implies limited interpretation strength of the results. For our analysis, we use the data from the quarterly time series: supply side of G D P (structure of G D P sources by N A C E activities), nominal wages and salaries and data on employment ( F T E and hours worked). As an example of original data, we present gross value added ( G V A ) in table 1. This indicator is close to the gross domestic product (GDP), but we consider it better for an industrial analysis as indirect taxes like V A T do not influence it. The structure of G V A and its change between 2019 and 2020 is described in table 2. Employment can be measured as a number of persons or as a number of hours worked. The number of persons is more accurate while the number of hours worked is more reliable. Furthermore, we can use total employment (employees + self-employed) or just the employees. For the analysis of labour productivity, we use total employment (self-employed persons also contribute to the economic output like G V A ) . However, we consider just employees to analyse wages and salaries because self-employed persons receive mixed-income and not wages. A B+C+D+E F G+H+I J K G V A 2019 (current prices) 111,331 1,516,825 291,555 966,055 305,100 217,294 G V A 2020 (current prices) 109,212 1,473,134 306,728 894,923 320,935 210,271 G V A 2020 (previous years prices) 116,655 1,409,086 281,356 854,460 309,820 211,310 L M + N O+P+Q R+S+T+U Total G V A 2019 (current prices) 483,701 387,387 799,881 110,537 5,189,666 G V A 2020 (current prices) 495,071 361,556 864,901 102,394 5,139,125 G V A 2020 (previous years prices) 468,581 351,600 810,673 97,188 4,910,729 Table 1 Gross value added, mil. C Z K , source: Czech Statistical Office3 . 
Note: A - Agriculture, forestry and fishing; B+C+D+E - Manufacturing, mining and quarrying and other industry; F - Construction; G+H+I - Trade, transportation, accommodation and food service; J - Information and communication; K - Financial and insurance activities; L - Real estate activities; M+N - Professional, scientific, technical and administrative activities; O+P+Q - Public administration, education, health and social work; R+S+T+U - Other service activities.

year/industry   A     B+C+D+E   F     G+H+I   J     K     L     M+N   O+P+Q   R+S+T+U
2019            2.1   29.2      5.6   18.6    5.9   4.2   9.3   7.5   15.4    2.1
2020            2.1   28.7      6.0   17.4    6.2   4.1   9.6   7.0   16.8    2.0

Table 2 Gross value added by NACE activities (%), source: Czech Statistical Office, own computation.

3 https://www.czso.cz/csu/czso/hdp_ts

2.2 Methodology

Firstly, we use a simple analysis of the contribution to growth (CTG): the direct impact of an individual NACE activity i on the change in gross value added can be computed according to the following equation:

CTG_i = \frac{GVA_{2020,i} - GVA_{2019,i}}{\sum_{i=1}^{n} GVA_{2019,i}} \cdot 100.   (1)

Secondly, we compute the fundamental ratio indicators:
• labour productivity as the GVA in 2019 prices divided by the total number of hours worked,
• average hourly wages as the ratio of nominal wages and salaries to the hours worked by employees,
• average monthly wages as the ratio of nominal wages and salaries to the number of employees.

Finally, we decompose the total change in the ratio indicators (labour productivity, average hourly wages and average monthly wages) into the effect of changes within individual industries (levels effect) and the effect of changes in the industrial structure (substitution effect). The methodology of the decomposition is described in detail by Fischer et al. [2]. The authors also provide some terminology remarks drawing on Lippe [5], Shorrocks [7] and others. Moreover, the decomposition of labour productivity in a particular industry is fully described in [3]. For labour productivity, the essential formulas are as follows.

Levels effect:

I_{LE} = \frac{\sum_{i=1}^{n} lp_{2020,i} HW_{2019,i} \,/\, \sum_{i=1}^{n} HW_{2019,i}}{\sum_{i=1}^{n} lp_{2019,i} HW_{2019,i} \,/\, \sum_{i=1}^{n} HW_{2019,i}} = \frac{\sum_{i=1}^{n} lp_{2020,i} HW_{2019,i}}{\sum_{i=1}^{n} lp_{2019,i} HW_{2019,i}}.

Substitution effect:

I_{SE} = \frac{\sum_{i=1}^{n} lp_{2020,i} HW_{2020,i} \,/\, \sum_{i=1}^{n} HW_{2020,i}}{\sum_{i=1}^{n} lp_{2020,i} HW_{2019,i} \,/\, \sum_{i=1}^{n} HW_{2019,i}},

where lp_{t,i} is the specific labour productivity of activity i in time t and HW_{t,i} is the total number of hours worked in activity i and time t. The total change in labour productivity can then be decomposed as

I_{lp} = I_{LE} \cdot I_{SE}.

The decomposition formulas for average wages and salaries are based on the same principle.

3 Results and Discussion

Table 3 shows the contribution of industries to the decline in GVA. One can see that two industries (mining & manufacturing, and trade & transportation & accommodation & food services) contribute two thirds of the GVA decline. On the other hand, there is a positive impact of public services (0.19 p.p.), agriculture (0.09 p.p.) and ICT (0.08 p.p.). These results are critical for estimates of tax revenues: while manufacturing and trade belong to commercial industries, public services do not. Hence, we can expect that the relation between GDP and tax revenues will deteriorate. Table 4 describes year-on-year changes in productivity (GVA per hour worked). The total change is +0.5%, with differentiation between industries (from -2.10% in construction through +3.72% in finance and insurance to +6.39% in agriculture). Average hourly wages (Table 5) increased by 5.70% (much more than labour productivity!).
Hourly wages increased in all industries, from 2.51% in agriculture to 13.11% in real estate activities. Average monthly wages (Table 6) increased by 1.58%, with industry changes ranging from -3.87% in trade, transportation, accommodation and food services to +12.90% in real estate. The results comply with the economic situation. Table 7 brings the results of the decomposition. We can explain the total increase in labour productivity (+0.5%) by the structural changes (+0.6%); the effect of changes in specific rates of productivity was marginal. The opposite effects occurred in average monthly wages: the levels effect contributes 1.3% to the total increase of 1.6%. Finally, the total change in hourly wages and salaries (+5.7%) is explained by the change in levels of +5.2%, while structural change contributes 0.5%.

industry        A      B+C+D+E   F      G+H+I   J      K      L
contribution    0.09   -1.87     -0.18  -1.94   0.08   -0.10  -0.26

industry        M+N    O+P+Q   R+S+T+U   GVA    Taxes on products   Subsidies on products   GDP
contribution    -0.62  0.19    -0.23     -4.85  -0.70               0.05                    -5.60

Table 3 Contribution of industries to variation in Gross value added

year/industry   A     B+C+D+E   F     G+H+I   J     K       L       M+N   O+P+Q   R+S+T+U   Total
2019            358   570       285   389     934   1,489   1,939   415   370     296       490
2020            381   570       279   383     917   1,545   1,922   408   377     299       492

Table 4 Hourly labour productivity (GVA per hour worked by total employment), CZK

year/industry   A     B+C+D+E   F     G+H+I   J     K     L     M+N   O+P+Q   R+S+T+U   Total
2019            190   241       199   206     399   426   204   244   270     191       241
2020            195   249       204   213     424   470   231   251   295     204       255

Table 5 Average hourly wage of employees, CZK

year/industry   A        B+C+D+E   F        G+H+I    J        K        L        M+N      O+P+Q    R+S+T+U   Total
2019            29,080   33,890    30,435   30,931   59,892   61,738   30,485   35,291   38,671   28,193    35,046
2020            29,572   33,616    30,547   29,735   63,772   65,537   34,417   35,732   41,360   27,479    35,599

Table 6 Average monthly wages, CZK

                                             Labour productivity   Monthly wages and salaries   Hourly wages and salaries
Changes in specific rates (levels effect)    -0.1                  1.3                          5.2
Structural change (substitution effect)       0.6                  0.2                          0.4
Total change                                  0.5                  1.6                          5.7

Table 7 Index decomposition (%)

We present the very first flash estimate of the impact of structural changes on key economic indicators. No data from annual sources are available for this early analysis, so the interpretation strength is limited. Interpretation obstacles also arise from measurement constraints: the short-term statistics and the quarterly national accounts are based on some crucial assumptions. One of the basic assumptions is the stability of the economy's structure over a short period, and the validity of this assumption is open to question.

4 Conclusion

The covid-19 pandemic brought challenges for economic analysts and for data measurement as well. This paper tries to use preliminary data for preliminary estimates of the impact of structural changes on some ratio indicators like labour productivity and the average wage. In 2020, the first pandemic year, labour productivity increased by 0.5% year-on-year, fully explained by the structural changes. Average monthly wages increased by 1.6%, almost entirely explained by the levels changes. Average hourly wages, despite the pandemic, increased by 5.7%; structural changes contributed to the growth by just a tenth (+0.5%), while wages within industries increased by 5.2%. We plan to improve and refine the analysis using additional data for 2020, which we expect to become available in the following months and years.
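For reference, the decomposition of Section 2.2 into the levels and substitution effects can be reproduced with a few lines of code. The sketch below uses illustrative two-industry figures, not the published data.

```python
# A minimal sketch of the decomposition of the change in aggregate labour
# productivity into the levels effect and the substitution effect.
import numpy as np

lp_2019 = np.array([358.0, 570.0])    # productivity by industry, CZK per hour
lp_2020 = np.array([381.0, 570.0])
hw_2019 = np.array([310.0, 2660.0])   # hours worked (millions), illustrative
hw_2020 = np.array([300.0, 2580.0])

def aggregate(lp, hw):
    """Aggregate productivity = total GVA / total hours worked."""
    return (lp * hw).sum() / hw.sum()

i_total = aggregate(lp_2020, hw_2020) / aggregate(lp_2019, hw_2019)
i_levels = aggregate(lp_2020, hw_2019) / aggregate(lp_2019, hw_2019)
i_subst = aggregate(lp_2020, hw_2020) / aggregate(lp_2020, hw_2019)

# By construction, the total index factors exactly into the two effects.
assert np.isclose(i_total, i_levels * i_subst)
print(f"levels effect: {i_levels:.4f}, substitution effect: {i_subst:.4f}, "
      f"total: {i_total:.4f}")
```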
In addition, we plan to analyse some national accounts data like gross output, G D P , G N I and taxes and examine changes in their mutual relations. References [1] Bloom, N . et al. (2020). The impact of C O V I D - 1 9 on productivity. NBER Working Paper Series. N B E R , Cambridge, December 2020. [2] Fischer, J., Flusková, H . & Vltavská, K . (2020). At-Risk-of-Poverty Rate or Social Exclusion in Visegrad Countries 2005-2017: Impact of Changes in Households' Structure. Štatistika, 4, 351-364. [3] Fischer, J., Vltavská, K . , Doucek, P. &Hančlová, J. (2012). The Influence of Information and Communication Technologies on Labour Productivity and Total Factor Productivity in the Czech Republic. Politická ekonómie, 5, 653-674. [4] Kotz, H . H . , Mischke, J., Smit, S. (2021). Pathways for productivity and growth after the C O V I D - 1 9 crisis. VOX, CEPR Policy Portal. M a y 2021. https://voxeu.org/article/pathways-productivity-and-growth-after- covid-19-crisis [5] Lippe, P. (2007). Index Theory and Price Statistics. Essen: Peter Lang. [6] Mischke, J. et al. (2021). W i l l productivity and growth return after the C O V I D - 1 9 crisis? Special Report. McKinsey, March 2021. https://www.mckinsey.com/industries/public-and-social-sector/our-insights/will- productivity-and-growth-return-after-the-covid-19-crisis. [7] Shorrocks, A . F . (2013). Decomposition procedures for distributional analysis: a unified framework based on the Shapley value. J Econ Inequal, 11, 99-126. 99 Productivity analysis in the Mexican food industry Martin Flegl1 , Carlos Alberto Jimenez Bandala2 , Isaac Sanchez-Jua- rez3 , Edgar Matus4 Abstract. Food industry represents an important part in the Mexican economy, including more than 400,000 companies and a Gross Domestic Product of around 16 billion of Mexican pesos in 2019. For that reason, this paper has the objective to analyze its productivity using data from the 2014 and 2019 Economic Censuses related to 2013 and 2018 economic indicators. The paper presents results of a productivity analysis o f 1,672 municipalities from 32 Mexicans states grouped i n eight regions using Data Envelopment Analysis. The results indicate significant differences between regions and, also, an important growth of the productivity i n 2018. Keywords: Data Envelopment Analysis, Food Industry, Regional Development, Economic Asymmetries, Regional Polarization. J E L Classification: C44, E23, L 6 6 A M S Classification: 90-08, 90C05 1 Introduction The food industry i n Mexico represents 4.6% of the national economy [9]. In the last trimester of 2020, the food industry generated 4.35 billion of Mexican pesos (1 M X N is approximately .05 U S D , thereafter pesos) i n Gross Domestic Product (GDP), representing a growth of 5.88% compared to the same period of the previous year. A s Figure 1 shows, the G D P of the Food industry has been constantly increasing during the last almost 20 years, reaching its highest value in 2019 with 16.9 billion of pesos. In 2019, the whole industry included 433,370 economic units (companies), where the highest number of the economic units were registered in Estado de Mexico (27,070), Oaxaca (21,493) and Puebla (17,958). The biggest gross production is reported i n Jalisco (2.25 billion of pesos), followed by Estado de Mexico (1.92 billion of pesos) and Guanajuato (1,43 billion of pesos). 
Finally, the industry employs 1.9 million of workers (47.4% of males and 52.6% of females), with an average monthly salary of 4,370 pesos [4], ^ -=f in r-- « c\ o —' f-l f i -J >r, '•O r- » G\ O — n C i T ' ^ ' C r - W - O ' i O ^ G i O ^ ^ C i O i C O O O O O O O O O O — — — — — — — — — — o-l Figure 1 Evolution of the G D P i n billions of pesos in the food industry i n Mexico. Constant 2013 prices (own elaboration based on data from [9]). The Mexican economy is characterized by the fact that many of the companies are micro, small and medium companies (MSMEs). The size of the companies is defined by a combination of the number of workers and total sales; M S M E s have less than 250 workers and have annual sales of less than 250 million Mexican pesos [8]. In 1 Tecnologico de Monterrey, School of Engineering and Sciences, Calle Puente 222, Coapa, Arboledas del Sur, Tlalpan, 14380, Mexico City, Mexico, martin.flegl(g>.tec.mx. ORCID: 0000-0002-9944-8475 2 Universidad La Salle Mexico, Facultad de Negocios, Benjamin Franklin 47, Col. Condesa, 06140, Mexico City, Mexico, carlos jimenez(g)lasalle.mx. ORCID: 0000-0003-4431-0054 3 Universidad Autonoma de Ciudad Juarez, Department of Social Sciences, Avenida Universidad S/N, Zona Chamizal, Ciudad Juarez, 32310, Chihuahua, Mexico, isaac.sanchez(g>uaci.mx. ORCID: 0000-0002-1975-5185 4 Universidad La Salle Mexico, Facultad de Negocios, Benjamin Franklin 47, Col. Condesa, 06140, Mexico City, Mexico, ma- caedOO(g)gmail.com. 100 the food industry, there were 420,862 (97.11%) companies with 0-10 employees, 9,312 (2.15%) companies with 11-50 employees, 1,134 (.26%) companies with 51-100 employees and 2,062 (.48%) companies with 101+ employees [4]. It should be noted that the Mexican economy is one of the most unequal in the American continent [5] [6]. The North of the country represents a more developed industry, while the development i n the South lags behind. This has been explained historically by a lack of investment resources i n the South, but also by the existence of internal colonialist structures, in which the South transfers value to the North [10]. In this case, there are two hypotheses of such difference. The first hypothesis is linked to a duality that is defined by socio-cultural elements that suppose a low labor performance i n the South and a lack of industrial productive vocations. This duality has justified the non-intervention of the State to encourage productive chains i n the South of the country, which is why it has been assigned an eminent primary economic task with the intention of sending agricultural products to the North for processing [12]. The second hypothesis assumes that Southern industries being more labor intensive, transferring value to Northern industries. However, companies from the South would be just as productive as those from the North and, therefore, require support and investment to grow just like those from the North [14]. In this sense, the objective of the paper is to analyze the productivity in the Mexican food industry with an application of the Data Envelopment Analysis model. The secondary objective is to verify whether we can observe difference between Northern and Southern regions of the country. The analysis includes two periods 2014 and 2019 to observe how the productivity in the industry changed within a time. 
We selected the food industry because it is the branch of manufacturing that is closest to the primary sector, which is the most developed sector in less advanced regions, and it is therefore a branch with a lower demand for capital than other branches such as metallurgy, chemicals, or electronics.

2 Materials and methods

2.1 Data Envelopment Analysis

Data Envelopment Analysis (DEA) makes it possible to evaluate several decision-making units (DMUs) with respect to their ability to convert multiple inputs into multiple outputs [3]. Each DMU can use m different input quantities to produce different outputs. If the model assumes variable returns to scale, the so-called BCC model can be used [2]. The BCC output-oriented model for DMU_o is formulated as follows:

\text{Maximize} \quad q = \sum_{r=1}^{s} u_r y_{ro} - u_0   (1)

subject to

\sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} - u_0 \le 0, \quad j = 1, 2, ..., n,   (2)

\sum_{i=1}^{m} v_i x_{io} = 1,

u_r, v_i \ge \varepsilon, \quad \varepsilon > 0, \quad u_0 \text{ free in sign},

where x_{ij} is the quantity of input i of DMU_j, y_{rj} is the amount of output r of DMU_j, u_r and v_i are the weights of the outputs and inputs, i = 1, 2, ..., m, j = 1, 2, ..., n, r = 1, 2, ..., s, and ε is the so-called non-Archimedean element necessary to eliminate zero weights of the inputs and outputs. A DMU is 100% efficient if q = 1, i.e., there is no other DMU that produces more outputs with the same combination of inputs; otherwise, the DMU is inefficient.

2.2 Data

For the analysis we used the economic indicators related to the Mexican food industry from the 2014 and 2019 Economic Censuses carried out by the Instituto Nacional de Estadística y Geografía [7] [8]. Each census includes information linked to manufacturing, commercial and service activities of companies operating in Mexico. The 2014 Economic Census refers to data for 2013 and the 2019 Economic Census refers to data for 2018. We included information related to the following subsectors of the food industry: agriculture-related services; preparation of animal feed; grinding grains and seeds and obtaining oils and fat; manufacture of sugars, chocolates, sweets and the like; preservation of fruits, vegetables and prepared foods; manufacture of dairy products; slaughter, packing and processing of meat from cattle, poultry and other edible animals; preparation and packaging of fish and shellfish; preparation of bakery products and tortillas; other food industries; and branches grouped by the principle of confidentiality. This information is linked to the Mexican municipalities, as it is not possible to identify companies due to the confidentiality of the Economic Censuses. Moreover, to be able to compare the productivity between 2013 and 2018, we only included municipalities that appear in both Economic Censuses. In the end, the analysis includes 1,672 municipalities, which represent 67.91% of the whole of Mexico. Table 1 displays the division of the municipalities among the 32 Mexican states5. These 1,672 municipalities include information from 164,558 economic units in 2013 and from 189,590 economic units in 2018. Moreover, the results from both Censuses are comparable as we used constant prices of 2018 for the 2014 Economic Census.

State - # of municipalities included (total) - share
Aguascalientes 9 (11) - 81.82%
Morelos 28 (33) - 84.85%
Baja California 5 (5) - 100.00%
Nayarit 19 (20) - 95.00%
Baja California Sur 5 (5) - 100.00%
Nuevo Leon 31 (51) - 60.78%
Campeche 11 (11) - 100.00%
Oaxaca 218 (570) - 38.25%
Chiapas 74 (122) - 60.66%
Puebla 156 (217) - 71.89%
Chihuahua 34 (67) - 50.75%
Queretaro 16 (18) - 88.89%
Ciudad de Mexico 16 (16) - 100.00%
Quintana Roo 7 (11) - 63.64%
Coahuila 25 (38) - 65.79%
San Luis Potosi 42 (58) - 72.41%
Colima 10 (10) - 100.00%
Sinaloa 16 (18) - 88.89%
Durango 30 (39) - 76.92%
Sonora 33 (72) - 45.83%
Estado de Mexico 114 (125) - 91.20%
Tabasco 17 (17) - 100.00%
Guanajuato 41 (46) - 89.13%
Tamaulipas 26 (43) - 60.47%
Guerrero 67 (81) - 82.72%
Tlaxcala 49 (60) - 81.67%
Hidalgo 67 (84) - 79.76%
Veracruz 159 (212) - 75.00%
Jalisco 108 (125) - 86.40%
Yucatan 75 (106) - 70.75%
Michoacan 104 (113) - 92.04%
Zacatecas 42 (58) - 72.41%

Table 1 Division of the municipalities among the Mexican states.

2.3 Structure of the model

The input part of the DEA model summarizes the resources of each municipality in the food industry:
• Personnel: Hours worked by the personnel in thousands of hours (HWP). This variable represents the labor factor of production; therefore, a greater number of hours worked by the personnel for the same level of production would indicate lower productivity.
• Material: Raw materials and materials in millions of pesos (RMM), Number of economic units (NEU). These variables indicate the material inputs of production necessary for the transformation. Higher productivity is associated with less use of materials and economic units.
• Finance: Total expenditures in millions of pesos (TE), Total personnel remunerations in millions of pesos (TPR). These variables represent the salary expenses of the industry and all expenses used in production. Therefore, it is implicit that the higher the expenditure for the same level of production, the lower the productivity.

The output part of the DEA model includes:
• Total gross production in millions of pesos (TGP). This variable measures the economic results of each municipality in terms of volume.

The selection of the inputs and outputs follows the common structure of DEA models in agricultural productivity analysis [1] [11] [15]. We used the BCC output-oriented model as the intention is to analyze the productivity level of each municipality in relation to its economic results. The BCC model is used as we consider direct competition in the food industry. Finally, we used MaxDEA 7 Ultra software for all the calculations. The importance of the inputs and outputs with ε = .5, which best balanced the model, is as follows: HWP 2.59%, RMM 6.67%, NEU 3.96%, TE 83.60%, TPR 3.18%, and TGP 100% in the case of the 2014 model, and HWP 4.46%, RMM 19.11%, NEU 5.74%, TE 63.69%, TPR 6.99%, and TGP 100% in the case of the 2019 model.

5 Mexico is divided into 2,446 municipalities and Mexico City (Ciudad de Mexico) is divided into 16 parts.

3 Results

The average productivity of the Mexican municipalities in 2013 is .3498 with a standard deviation of .156, and 30 municipalities reached a productivity of 1.0 (representing 1.79% of the analyzed sample). This result indicates very low productivity in the food industry in Mexico. As Figure 2 in the Appendix shows, we cannot identify a region (state) with very high productivity. The municipalities with the 1.0 productivity are spread across Mexico.
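A small, self-contained sketch of the BCC multiplier model (1)-(2) described in Section 2.1 is given below. The data are illustrative (four hypothetical municipalities, two inputs, one output) and the availability of scipy is assumed; the actual computations in the paper were carried out in MaxDEA.

```python
# A minimal sketch of the BCC multiplier model with variable returns to scale,
# solved for each DMU by linear programming (illustrative data only).
import numpy as np
from scipy.optimize import linprog

X = np.array([[4.0, 3.0],    # inputs, e.g. hours worked and total expenditures
              [6.0, 2.0],
              [5.0, 5.0],
              [8.0, 4.0]])
Y = np.array([[10.0],        # output, e.g. total gross production
              [9.0],
              [12.0],
              [11.0]])
n, m = X.shape
s = Y.shape[1]
eps = 1e-6                   # non-Archimedean lower bound on the weights

def bcc_score(o: int) -> float:
    # decision vector z = [u_1..u_s, v_1..v_m, u_0]
    c = np.concatenate([-Y[o], np.zeros(m), [1.0]])       # maximise u'y_o - u_0
    A_ub = np.hstack([Y, -X, -np.ones((n, 1))])           # u'y_j - v'x_j - u_0 <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o], [0.0]]).reshape(1, -1)  # v'x_o = 1
    b_eq = np.array([1.0])
    bounds = [(eps, None)] * (s + m) + [(None, None)]      # u_0 free in sign
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return -res.fun

for o in range(n):
    print(f"municipality {o + 1}: score = {bcc_score(o):.3f}")
```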
To understand a little bit more the obtained results, we divide the municipalities according to their geographical dependence6 . Table 2 shows that the highest average productivity i n the food industry in 2013 is reported in the Southeast region (.4003), one of the biggest considering the number of municipalities, but it is also a region with the highest variability measured by the standard deviation (.1895). What is more, the Southeast region is the only one evaluated above the country average. O n the other hand, the West region reported the lowest average productivity in Mexico (.3179) with the lowest variability (. 1090). Applying the Games-Howell test7 , the differences in productivity between the regions are statistically significant (p < .001). More specifically, the average productivity of the municipalities i n the Southeast region is statistically higher compared to the rest of the regions (considering the confidence level of 99%). The rest of the differences are not statistically significance, except the difference between the East region and the Center North (p = .083) and West (p = .012) regions. Regions Mean Std. Deviation JV Center North .3238 .1464 150 Center South .3324 .1305 158 East .3490 .1528 431 Northeast .3248 .1390 82 Northwest .3360 .1540 123 Southeast .4003 .1895 377 Southwest .3440 .1475 110 West .3179 .1090 241 Average .3498 .1557 1,672 Table 2 Average productivity by geographical regions in 2013 In 2018, the average productivity of the Mexican municipalities increased by +.2343 up to .5841 with a standard deviation of .1163. In this case, 36 municipalities reached the 1.0 productivity (representing 2.15% of the analyzed sample). A s Figure 3 in the Appendix illustrates the improvement i n the productivity of the industry can be seen all around the country, which resulted that the difference between 2013 and 2018 is statistically significant (p < .001). The best evaluated region is now Northeast (.6342) whose average productivity increased by +.3094, followed by the Northwest region (.6223, +.2863). The Southeast region that was evaluated as the best region i n 2013 is evaluated as the 3rd worst region in 2018, with the average productivity of .5729, because its productivity improved i n the smallest proportion (+. 1726). The worst evaluated region is the East region with the average productivity of .5586, with and improvement of +.2096 compared to 2013. Five out of the eight regions are evaluated above the Mexican average. The Games-Howell test indicates statistically significant differences between the regions. For example, the Northeast and Northwest regions compared to the rest of the regions (p < .001), and West and Center South regions compared to the East, Southeast and Southwest regions (confidence level of 95%) in almost all cases (only few exceptions can be observed). The changes in the productivity of the periods can be explained by the economic structure of Mexico itself. The Southern regions, eminently agricultural, send their largest production to the Northern regions for processing. In 2013-2014, the international oil prices increased the gasoline prices and, as the major transportation of goods i n Mexico is done by roads, it is a reason why the Northern regions, further away from agricultural production, were less productive than those in the South closer to the agricultural centers. With the above we can point out that developing industrial centers, particularly food centers, in agricultural areas would have positive results in productivity. 
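The pairwise Games-Howell comparisons used above can be reproduced, for example, with the pingouin package (assumed to be installed); the sketch below uses synthetic efficiency scores rather than the census data, and the group means and sample sizes are only loosely inspired by Table 3.

```python
# A minimal sketch of Levene's test and Games-Howell pairwise comparisons
# on synthetic productivity scores (pingouin assumed to be available).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
params = {"Northeast": (0.63, 0.13, 82),
          "Northwest": (0.62, 0.12, 123),
          "Southeast": (0.57, 0.14, 377),
          "West":      (0.60, 0.10, 241)}
frames = [pd.DataFrame({"region": r, "score": rng.normal(mu, sd, n)})
          for r, (mu, sd, n) in params.items()]
scores = pd.concat(frames, ignore_index=True)

print(pg.homoscedasticity(scores, dv="score", group="region"))      # Levene's test
print(pg.pairwise_gameshowell(data=scores, dv="score", between="region"))
```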
This, together with an active industrial development policy, would have a positive impact on regional economic development and would combat regional asymmetries [13]. 6 Mexico is divided into eight geographical regions: Northwest (Baja California, Baja California Sur, Chihuahua, Durango, Sinaloa and Sonora), Northeast (Coahuila, Nuevo Leon and Tamaulipas), West (Colima, Jalisco, Michoacan and Nayarit), East (Hidalgo, Puebla, Tlaxcala and Veracruz), Center North (Aguascalientes, Guanajuato, Queretaro, San Luis Potosi and Zacatecas), Center South (Ciudad de Mexico, Estado de Mexico and Morelos), Southeast (Chiapas, Guerrero and Oaxaca) and Southwest (Campeche, Quintana Roo, Tabasco and Yucatan). 7 Games-Howell is a nonparametric test that does not assume equal variances and the same sample size. In our case, there are significant differences regarding the number of municipalities between the regions and the variances are different (Levene's test p < .001). All the statistical tests presented in this article are based this test. 103 Regions Mean Std. Deviation N Difference 2018-2013 Center North .5880 .1022 150 +.2642 Center South .6003 .0877 158 +.2679 East .5586 .1038 431 +.2096 Northeast .6342 .1318 82 +.3094 Northwest .6223 .1231 123 +.2863 Southeast .5729 .1410 377 +.1726 Southwest .5701 .1105 110 +.2261 West .6038 .0974 241 +.2859 Average .5841 .1163 1,672 +.2343 Table 3 Average productivity by geographical regions in 2018 4 Conclusions The objective of the paper was to analyze the productivity in the Mexican food industry using data from the 2014 and 2019 Economic Censuses. The results revealed that i n 2013, the highest productivity was reported i n the municipalities of the South regions of Mexico, which did not confirm the historical observation of a lower development of these regions i n Mexico. However, this could have been caused by the rise of the 2013-2014 international oil prices that negatively affected the transportation of goods in Mexico. Regarding the obtained results i n 2018, we can observe significant improvements i n the productivity i n the food industry across the whole country. But the improvements were higher in the municipalities of the Northern regions of Mexico, which correspond to the historical development of Mexican economy, where the Northern regions concentrate more developed industry. The further research could extend the analysis including data from level of investments i n the industry. This piece of information would explain whether higher productivity leads to higher investments in the Mexican food indus­ try. 5 Acknowledgements This research was carried out within the framework of the project "Patterns of success and failure in the economic evolution of businesses identified from data mining and artificial neural networks", A3-S-129311, of the C O N A C Y T - I N E G I Sector Fund. 6 References [1] Arita, S., and Leung, P.S.: A Technical Efficiency Analysis of Hawaii's Aquaculture Industry. Journal of the WorldAquaculture Society 45(3) (2014), 312-321. https://dx.doi.org/10.1111/jwas. 12124 [2] Banker, R. D . , Charnes, A . , and Cooper, W . W . : Some Models for Estimating Technical and Scale Inefficiencies i n Data Envelopment Analysis. Management Science 30(9) (1984), 1078-1092. http://dx.doi.org/10.1287/mnsc.30.9.1078 [3] Cooper, W . , Seiford, L., and Zhu, J.: Handbook on data envelopment analysis. Nueva York: Springer, 2011. [4] D a t a M E X I C O : Industria alimentaria [Food industry]. 
Secretaria de Economia [Ministry of Economy], 2021, [Online], available: https://datamexico.org/es/profile/industry/industria-alimentaria7vearSelectorGdp=timeOption0 [1 June 2021]. [5] Dávila, E., Kessel, G., and Levy, S.: E l sur también existe: un ensayo sobre el desarrollo regional de Mexico. Economia Mexičana. Nueva Epoca 11(2) (2002), 205-260. [6] Garcia-Almada, R.: Liberalization comercial, descentralización territorialy polarization económica en Mexico. Ciudad Juárez: Universidad Autonoma de Ciudad Juarez, 2012. [7] INEGI: Mexico - Censos Económicos 2014. Institute Nacionál de Estadística y Geografia [National Institute of Statistics and Geography], 2014, [online], available: https://www.inegi.org.rnx/prograrnas/ce/2014/ [1 June 2021]. [8] INEGI: Mexico - Censos Económicos 2019. Institute Nacionál de Estadística y Geografia [National Institute of Statistics and Geography], 2019, [online], available: httos://www.inegi.org.mx/rnrn/index.php/catalog/547 [1 June 2021]. [9] INEGI: Mexico - Sistema de Cuentas Nacionales. Institute Nacionál de Estadística y Geografia [National Institute of Statistics and Geography], 2020, [online], available: https://www.inegi.org.mx/sistemas/bie/ [1 June 2021]. 104 [10] Jimenez-Bandala, C. A . : Development i n Southern Mexico: Empirical Verification of the "Seven Erroneous Theses about Latin America". Latin American Perspectives 45(2) (2018), 129-141. https://dx.doi.org/10.1177/0094582X17736036 [11] Marcikic Horvat, A . , Mafkovski, B . , Zekic, S., and Radovanov, B . : Technical efficiency of agriculture i n Western Balkan countries undergoing the process of E U integration. Agricultural Economics 66(2) (2020), 65-73. https://doi.org/10.17221/224/2019-AGRICECON [12] Myrdal, G.: Economic Theory and Under-developed Regions. London: Gerald Duckworth, 1957. [13] Revilla, D., Garcia-Andres, A., and Sanchez-Juarez, I.: Identification of key productive sectors in the Mexican Economy. Expert Journal of Economics 3(1) (2015), 22-39. [14] Stavenhagen, R.: Seven erroneous theses about Latin America. New University Thought 4(4) (1965), 25-37. [15] Toma, E., Dobre, C , Dona, I., and Cofas, E.: D E A Applicability i n Assessment of Agriculture Efficiency on Areas with Similar Geographically Patterns. Agriculture and Agricultural Science Procedia 6 (2015), 704- 711. https://doi.Org/10.1016/i.aaspro.2015.08.127 7 Appendix Figure 2 Productivity in the food industry by municipalities in 2013 (own elaboration using GeoNames, M i crosoft tool). Figure 3 Productivity in the food industry by municipalities in 2018 (own elaboration using GeoNames, M i crosoft tool). 105 Evaluation and testing of non-nested specifications of spatial econometric models Tomáš Formánek1 Abstract. Spatial econometric models have generally non-nested specifications if they are based on different spatial setups (connectivity and weight matrices). For maximum likelihood estimators and non-nested models, the usual tests (likelihood ratio, Wald, et.c) gnerally cannot be used for model selection and/or testing. This article provides a structured discussion on estimation and evaluation (selection and/or testing) of nonnested spatial models. The distinction between model selection and model testing is important. While model selection algorithms approach all models symmetrically, the null and alternative models are treated differently for testing purposes. The empirical part of this paper provides an illustrative application of the evaluation methods discussed. 
Emphasis is given to models estimated by maximum likelihood approaches and to Vuong's test, which is derived from the Kullback-Leibler information criterion. Keywords: spatial model, model selection, non-nested models. J E L Classification: C23, C31, C52, E66 A M S Classification: 91B72 1 Introduction Spatial econometrics models address the existence of spatial dependency in observed data, as well as some general and theoretically defined interactions among variables. Should spatial effects be ignored or left without proper treatment, model estimation would lead to biased and inconsistent results. Typically, one would estimate parameters for a relevant and explicitly defined (prior) spatial dependency pattern - along with the "usual" model coefficients that describe macroeconomic dependencies [1]. In principle, spatial dependency is quite similar to autoregressive processes in time series. However, spatial dependency is neither "one-dimensional" (on a time axis) nor unidirectional (current observations being dependent on past values). In most economic applications, spatial units are interdependent and the strength of their interactions is defined in terms of distances, rather than oriented distances (although analyses with focus on core-periphery behavior and similar topics exist, where oriented distances are relevant). Generally speaking, spatial dependency is often described as a non-continuous function, decreasing with distance between spatial units. The possibilities of measuring distances and distinguishing neighbors (close, interacting, spatially dependent units) from distant (non-interacting units) are numerous, with a multitude of possible approaches and their combinations. For example, one may evaluate distances given as shortest connections between two nearest point of given regions (i.e. polygons on a map) or use distances between geographical center-points. Similarly, either L\ or L2 norms may be used for measurements and neighbors can be cast through the application of a common border rule (contiguity-based neighborhood definition), etc. Nevertheless, the true pattern (functional form) of spatial dependency is not known in most empirical cases. Hence, researches would typically have to consider several spatial structures (neighborhood definitions) in order to evaluate and select a "good" spatial setup for their analyses and/or predictions. Unfortunately, if two or more spatial econometric models are based on different spatial setups, they are generally non-nested. For the widely used likelihood-based estimators, the non-nested nature of spatial models based on alternative neighborhood definitions implies that most common statistics cannot be used for model selection and/or testing [8, 9]. This article provides a structured discussion on the specifics of estimation and testing of non-nested spatial models. The remainder of this paper is structured as follows: Section two describes key features and aspects of spatial model estimation and testing, along with references to fundamental literature. Section three provides an illustrative application based on unemployment dynamics in selected E U countries for the time period 2014 - 2019. Section four and the list of references conclude this contribution. 1 VŠE Praha, nám. W. Churchilla 4 Praha 3, Prague 130 67, Czech Republic, formanek@vse.cz 106 2 Estimation, evaluation and testing of spatial models To formalize and estimate spatial models, researches start by defining the underlying spatial structure. 
To classify any two spatial units as either neighboring or distant, one would typically use a connectivity matrix C. For example, individual elements of the connectivity matrix may be cast as follows:

c_{ij} = \begin{cases} 0 & \text{if } i = j, \\ 0 & \text{if } d_{ij} > \tau, \\ 1 & \text{if } d_{ij} \le \tau \text{ and } i \ne j, \end{cases}   (1)

where d_{ij} = d_{ji} is the distance between two spatial units. For c_{ij} = 1, units i and j are neighbors, i.e. they are sufficiently close to interact and influence each other mutually, and vice versa. Here, symmetry of the (N x N) matrix C (i.e. c_{ij} = c_{ji}) is implied by the use of distances d_{ij}. Zeros on the main diagonal of C indicate that spatial units are not neighbors to themselves by definition. The parameter τ is a heuristically selected maximum distance (threshold) between two neighbors. The choice of and changes in τ can have significant impacts on the results of spatial model estimators, which use the spatial structure as prior information. As we deal with spatial units of non-zero surface area, we usually use representative points - centroids - to measure distances between regions. Alternatively, expression (1) can be evaluated using closest border-to-border aerial distances, distances based on transport infrastructure, etc.

Spatial econometric models are based on a transformation of the connectivity matrix C from (1): a spatial weights matrix W is calculated by row-wise standardization of C. The transformation rule is relatively simple, each element of W is calculated as w_{ij} = c_{ij} / \sum_{s=1}^{N} c_{is}, so that all row sums in W equal one: \sum_{j=1}^{N} w_{ij} = 1 for all i. Please note that W matrices are no longer symmetric. Besides the maximum neighbor distance as outlined in expression (1), different plausible approaches exist for the construction of both matrices - C (common border rule, k-nearest neighbors approach, etc.) and W (multiple standardization schemes can be applied). A detailed technical discussion of this topic is provided by [1].

As we gather available prior information for the construction of a spatial regression model, the actual neighborhood structure is largely unknown. While there may be quite a few useful leads that analysts can use, the choice of the τ-parameter in (1) is arbitrary and different approaches and combinations of C and W construction are possible. LeSage and Pace [5] show that theoretically well-defined models can be estimated and tested with adequate precision, even if the spatial structure used for estimation is somewhat inaccurate (i.e. if it slightly deviates from the actual yet unknown neighborhood definition). To summarize their findings, one can start from a cross-sectional model with a spatially dependent endogenous variable:

y = \lambda W y + X\beta + u,   (2)

where y is the N x 1 dependent variable vector, X is the usual N x K matrix of regressors (including the intercept element), u is the random element and W is the spatial weights matrix. The model parameter λ is a scalar describing the strength of spatial dependency and β is a vector of parameters (say, describing economic dynamics). Besides the commonly used specification (2) with spatial dependency in y, other types of spatial models exist - both in the literature and in empirical applications [1, 5]. In most types of spatial models, the generalization from cross-sectional to panel data is relatively straightforward, very similar to the case of non-spatial models. Finally, it should be noted here that the β-parameters are not the marginal effects. However, using W and the estimates of λ and β, direct and spillover (indirect) marginal effects can be calculated easily [3].
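To make the construction above concrete, the following minimal base-R sketch builds a distance-based connectivity matrix C as in (1) and row-standardizes it into W. The coordinates and the threshold value are illustrative placeholders, not the data used in the empirical part of the paper.

# Minimal sketch: connectivity matrix C from (1) and its row-standardization into W.
# Coordinates and the threshold tau are illustrative placeholders only.
set.seed(1)
N      <- 6
coords <- cbind(lon = runif(N, 12, 19), lat = runif(N, 46, 51))  # region centroids
tau    <- 2.5                                 # heuristic neighbor threshold

D <- as.matrix(dist(coords))                  # pairwise distances d_ij
C <- (D <= tau) * 1                           # c_ij = 1 if d_ij <= tau, 0 otherwise
diag(C) <- 0                                  # units are not neighbors to themselves

rs <- rowSums(C)                              # row-wise standardization:
W  <- sweep(C, 1, ifelse(rs > 0, rs, 1), "/") # w_ij = c_ij / sum_s c_is

rowSums(W)        # equals 1 for every unit with at least one neighbor
isSymmetric(W)    # generally FALSE, unlike C

The same W could be produced with the spdep package used later in the paper; the base-R version is kept here only to mirror the formulas step by step.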
Under very general conditions [3], the maximum likelihood (ML) estimator may be used to produce estimates of all parameters of model (2): β, λ and the random element variance σ². Assuming a normal distribution of the error elements, the log-likelihood function for equation (2) can be cast as

LL(\theta) = -\tfrac{N}{2}\log(2\pi\sigma^2) + \log|I_N - \lambda W| - \tfrac{1}{2\sigma^2}\, u'u,   (3)

where θ = (β, λ, σ²), u = y - λWy - Xβ and det(∂u/∂y) = |I_N - λW| is the Jacobian. Using the eigenvalues κ of matrix W, the condition λ ∈ (min(κ)^{-1}, max(κ)^{-1}) must be fulfilled to ensure model stability [3]. In theory, maximization of (3) is based on the first order conditions

\partial LL/\partial\beta = X'(I_N - \lambda W)y - X'X\beta = 0,   (4)
\partial LL/\partial\lambda = -\mathrm{tr}\!\left[(I_N - \lambda W)^{-1} W\right] + \tfrac{1}{\sigma^2}\, u'Wy = 0,   (5)
\partial LL/\partial\sigma^2 = -\tfrac{N}{2\sigma^2} + \tfrac{1}{2\sigma^4}\, u'u = 0.   (6)

The equation system (4) - (6) has no convenient analytical solution. However, an accessible estimation algorithm can be used to produce coefficient estimates - shown next in expressions (8) through (14). Regularity conditions holding, the ML estimate is asymptotically efficient and reaches the Cramér-Rao lower bound for variance, given by the inverse of the information matrix:

[I(\theta)]^{-1} = -E\!\left[\partial^2 LL(\theta)/\partial\theta\,\partial\theta'\right]^{-1}.   (7)

To estimate the parameters of equation (2), we start by OLS estimation of the auxiliary regressions

y = X\beta_0 + u_0,   (8)
Wy = X\beta_d + u_d,   (9)

so that

\hat{\beta}_0 = (X'X)^{-1} X'y, \quad \hat{\beta}_d = (X'X)^{-1} X'Wy, \quad e_0 = y - X\hat{\beta}_0, \quad e_d = Wy - X\hat{\beta}_d,

where \hat{\beta} and e denote estimated parameters and residuals respectively. By means of the auxiliary regressions (8) and (9), we can re-write the residuals of model (2) as

e(\lambda) = e_0 - \lambda\, e_d.   (10)

Using e(λ)'e(λ) = e_0'e_0 - 2λ e_0'e_d + λ² e_d'e_d, we can formulate a concentrated version of the log-likelihood function (3) that depends only on the λ parameter:

LL(\lambda) = c + \log|I_N - \lambda W| - \tfrac{N}{2}\log\!\left[e(\lambda)'e(\lambda)\right],   (11)

where c is a constant, independent of λ. To maximize the ML function (11), the interval λ ∈ (min(κ)^{-1}, max(κ)^{-1}) is split into multiple "tiny" intervals along values λ_1 < λ_2 < ... < λ_q and we evaluate (11) at each λ. Arbitrarily accurate estimates of the λ that maximizes (11) can be produced by additional interval splitting and evaluation of (11). Subsequently, \hat{\lambda} is used to produce the remaining parameters of (2) as follows:

\hat{\beta} = \hat{\beta}_0 - \hat{\lambda}\,\hat{\beta}_d,   (12)
\hat{\sigma}^2 = N^{-1}\, e(\hat{\lambda})'e(\hat{\lambda}),   (13)
\widehat{\mathrm{var}}(u) = \hat{\Omega} = \hat{\sigma}^2\left[(I_N - \hat{\lambda}W)'(I_N - \hat{\lambda}W)\right]^{-1}.   (14)

Besides the approach described here, technical discussion and references to alternative estimators for different model generalizations are available from [1, 3].

2.1 Model evaluation and testing

Figure 1 illustrates a situation where we have two alternative spatial structures C that are (generally) plausible, yet distinct. If such structures are used to cast spatial models in the general form of equation (2), the resulting models are non-nested, which follows from the nature of the RHS element λWy - even if the Xβ element of (2) does not change. After estimating such model specifications, the next logical step consists of evaluating and comparing the resulting econometric models. Evaluation and comparison of models can be approached using two distinct paradigms: model selection and model testing. These are two relevant yet conceptually different tasks. Generally speaking, model selection treats all models symmetrically, while testing approaches the null and alternative models differently. When model selection is performed, a definitive output is generated - a model is selected.
However, hypothesis testing does not seek nor does it provide such an outcome: rejecting the null hypothesis (model specification) does not imply acceptance of the alternative specification. Importantly, the choice of the null hypothesis may be arbitrary and it may also greatly affect the interpretation of the test outcome [8]. Also, there is a firm empirical motivation for the distinction between model selection and hypothesis/model testing. The selection paradigm is more pertinent as a decision-making tool. Hypothesis/model testing is mostly used for inferential applications (e.g. for assessing the validity of theoretically determined predictions).

Figure 1 Alternative neighborhood structures used for model estimation: distance-based with maximum distance between neighbors set at 250 km (left) and contiguity-based (right)

Both the hypothesis testing and model selection approaches are valid and relevant. Pesaran [8] points out that model selection algorithms are mainly based on statistics of model fit to the data (maximized log-likelihoods, minimized information criteria or sums of squared residuals). The empirical (illustrative) section of this article features both model evaluation approaches: Vuong's test (16) is used for model testing and model selection is based on maximized log-likelihood values.

In general, various statistics can be used for comparing non-nested models. The J-test, JA-test, N-test and NT-test [8] can be listed among the most important and widely used statistics for non-nested models. However, those are suitable for OLS-estimated models. Their application extends to instrumental variable regression, but not towards ML-based estimators. Vuong [9] provides a feasible method (statistic) for testing non-nested ML-estimated spatial models. His approach is based on the Kullback-Leibler information criterion (KLIC). The KLIC reflects the difference in maximized log-likelihood values between an ML estimator applied to a misspecified model, say g(y | Z, β), and to the true model h(y | Z, α). Formally, the KLIC can be cast as

KLIC = E\!\left[\log h(y_i \mid z_i, \alpha) \mid h \text{ is true}\right] - E\!\left[\log g(y_i \mid z_i, \beta) \mid h \text{ is true}\right],   (15)

where z_i is the full set of regressors, i = 1, ..., N identifies individual observations, and α and β are model parameters. Even if the true model specification h(·) is unknown, the KLIC can be used for testing. Using Vuong's test (16), we can compare two alternative functions - say g_0 and g_1 - and determine whether they are equivalent or whether one of the functions is closer to the true (unobserved) specification. If expression (15) is produced for both g_0 and g_1, then taking the difference of the KLIC values for the two functions g_1 and g_0 effectively eliminates the likelihood function of the true specification h. Hence, Vuong's test statistic V can be expressed as

V = \sqrt{N}\,\left(\bar{m}/s_m\right), \qquad m_i = \log L_{i,1} - \log L_{i,0},   (16)

where L_{i,1} and L_{i,0} are the likelihood functions of g_1 and g_0 evaluated at a given observation i. The elements \bar{m} and s_m refer to the sample mean and standard deviation of m_i. Under the null hypothesis of both tested models being equally good (i.e. equally distant from the true model h), V asymptotically follows the standard Normal distribution. Interestingly, Vuong's test has a directional interpretation: if g_1 is substantially better than g_0 (i.e. closer to h), V diverges and plim V = +∞ (and vice versa). Additional technical discussion covering (among other topics) nested and partially nested g_1 and g_0 specifications is provided by [8, 9].
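Once per-observation log-likelihood contributions of the two competing (non-nested) fits are available, statistic (16) is straightforward to compute. The following base-R sketch illustrates the calculation; the vectors ll1 and ll0 are simulated placeholders, not output of the models estimated in Section 3.

# Hedged sketch of Vuong's statistic (16) from per-observation log-likelihoods.
# ll1 and ll0 hold log L_{i,1} and log L_{i,0}; here they are simulated for illustration.
set.seed(2)
N   <- 110
ll1 <- rnorm(N, mean = -7.8, sd = 1.0)
ll0 <- rnorm(N, mean = -8.1, sd = 1.0)

m <- ll1 - ll0                    # m_i = log L_{i,1} - log L_{i,0}
V <- sqrt(N) * mean(m) / sd(m)    # Vuong's statistic, asymptotically N(0,1) under H0
p <- 2 * pnorm(-abs(V))           # two-sided p-value: both models equally close to h

c(V = V, p.value = p)
# A large positive V favours g1, a large negative V favours g0;
# swapping the roles of the two models only flips the sign of V.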
3 Empirical illustration: unemployment dynamics

The above discussion is illustrated using a spatial econometric model focused on labor market dynamics. For estimation purposes, panel data were collected for the following economies: Austria, Belgium, Czechia, Denmark, Germany, Hungary, Luxembourg, the Netherlands, Poland, Slovakia and Slovenia (110 NUTS2 regions in total), with annual observations covering the period 2014 - 2019. All variables and geographic information were downloaded from Eurostat into R for processing and estimation [2, 7]. Unemployment is modelled using the regional competitiveness paradigm [4]. Hence, unemployment is given as a function of GDP growth, the prominence of industrial production on the labor market, energy consumption and country-level effects (besides the NUTS2 unobserved effects that are intrinsic to panel data models). The generalization from the cross-sectional model (2) to panel data is discussed in detail by [3] and the regression equation used for modelling unemployment dynamics may be cast as follows:

y = \lambda\,(I_T \otimes W)\,y + X\beta + u, \qquad u = (\iota_T \otimes I_N)\,\mu + \varepsilon,   (17)

where y is the vector of NT observations of the dependent variable and X is an NT x K matrix of regressors. Elements I_T and I_N are identity matrices with dimensions given by their subscripts, \iota_T is a vector of ones and \otimes denotes the Kronecker product. The elements W, λ and β follow from equation (2), μ denotes the unobserved individual effects and ε is the remaining random error term of the model. Variables for our illustrative application are as follows: individual observations y_it of the dependent variable reflect unemployment rates in percentage points (drawn from Eurostat's "lfst_r_lfu3rt" dataset) and matrix X contains the following regressors: log(GDP_it) is the log-transformed real GDP per capita (Eurostat's "nama_10r_2gdp" is used for GDP and "ei_cphi_m" for inflation adjustment), RE_B_E_it is the relative employment in industrial sectors (NACE rev. 2 sectors B to E, drawn from the "lfst_r_lfe2en2" dataset), variable log(Engy_it) is the log-transformed total energy consumption (measurements available only at the NUTS0 level) and NUTS0_j is a vector of country-specific dummy variables (Austria is left out and serves as a reference/base unit).

Table 1 Comparison of model estimates for alternative spatial structures

Impact / λ                   Estimate    Std. Error    z-value       Pr(>|z|)
                                         (simulated)   (simulated)   (simulated)
Distance-based spatial structure used, τ = 250 km:
log GDP_Direct_Imp           -3.164      0.376         -9.597        0.000
log GDP_Indirect_Imp         -10.381     2.609         -4.100        0.000
RE_B_E_Direct_Imp            -11.519     2.444         -4.697        0.000
RE_B_E_Indirect_Imp          -33.078     10.514        -3.229        0.001
λ                            0.776       0.031         25.365        0.000
Log likelihood (LL)          -863.908
Contiguity-based spatial structure used:
log GDP_Direct_Imp           -4.360      0.401         -10.864       0.000
log GDP_Indirect_Imp         -7.206      1.165         -6.210        0.000
RE_B_E_Direct_Imp            -9.555      2.693         -3.539        0.000
RE_B_E_Indirect_Imp          -15.791     5.011         -3.164        0.002
λ                            0.670       0.030         22.626        0.000
Log likelihood (LL)          -891.724

The model was estimated in R by means of the splm package [6], using two different spatial setups, as shown in Figure 1. The first spatial structure shown in Figure 1 (left) follows from expression (1) with the maximum distance between neighbors (interacting units) set at τ = 250 km. The alternative spatial structure is constructed along contiguity (common border) rules - two regions are considered as neighbours if they share a common border.
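The following sketch outlines how the two spatial setups and the spatial lag panel model (17) can be put together in R, assuming the interfaces of the spdep and splm packages [2, 6]. The object names (centroids, regions, panel) and the variable names in the formula are placeholders standing in for the Eurostat data described above; this is not the author's actual estimation code.

# Hedged sketch: two alternative spatial structures and the spatial lag panel model (17).
# 'centroids' (NUTS2 centroid coordinates), 'regions' (NUTS2 polygons) and 'panel'
# (region-year data) are assumed placeholder objects.
library(spdep)
library(splm)

# Distance-based neighbors: units interacting within 250 km of each other
nb_dist <- dnearneigh(centroids, d1 = 0, d2 = 250, longlat = TRUE)
lw_dist <- nb2listw(nb_dist, style = "W", zero.policy = TRUE)

# Contiguity-based neighbors: regions sharing a common border
nb_cont <- poly2nb(regions)
lw_cont <- nb2listw(nb_cont, style = "W", zero.policy = TRUE)

# Placeholder variable names for unemployment rate, GDP, industrial employment share,
# energy consumption and country dummies
f <- unem ~ log(GDP) + RE_B_E + log(Engy) + factor(NUTS0)

# Spatial lag panel model with fixed (within) individual effects, one fit per setup
fit_dist <- spml(f, data = panel, index = c("region", "year"), listw = lw_dist,
                 model = "within", lag = TRUE, spatial.error = "none")
fit_cont <- spml(f, data = panel, index = c("region", "year"), listw = lw_cont,
                 model = "within", lag = TRUE, spatial.error = "none")

# The maximized log-likelihoods reported in the summaries are the values compared
# in Table 1 and fed into Vuong's test (16)
summary(fit_dist)
summary(fit_cont)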
As discussed in previous sections, spatial models with two different spatial structures - and W matrices - are non-nested. Table 1 shows estimation results for the two non-nested spatial panel models based on equation (17) and differing in their spatial prior information only. Table 1 includes the estimated A coefficients, along with direct and indirect 110 (spillover) effects of the main regressors observed at N U T S 2 level. Due to space constraints for this contribution and given their limited interpretation, state-level (NUTSO) regressors are omitted from Table 1. However, all the estimation output is available from the author upon request, along with corresponding R code and data. From Table 1, we may clearly identify prominent differences among the estimated marginal effects. B y simply comparing the maximized log-likelihood values, one would slightly prefer and select the specification that features distance-based neighbors. However, Vuong's test based on expression (16) clearly favours the contiguity based structure, with V - -30.435 i f contiguity based model is used as the base specification. Reversing the null and alternative "roles" for the two spatial setups just flips the sign of Vuong's test with no consequence in terms of test interpretation. Based on the Vuong's test, we conclude that contiguity-based setup is significantly closer to the true specification (this conclusion holds at any reasonable significance level). 4 Conclusions In most empirical applications of spatial econometrics, spatial prior information is used to distinguish close units (neighbors) from distant units that are independent. However, spatial econometric models always face uncertainty with respect to the true yet unobservable spatial structure. This article provides a structured and relatively simple approach towards estimation and testing of non-nested spatial models that are based on alternative spatial structures. The discussion provided covers both model estimation methodology and Vuong's test for non-nested specifications, derived from the Kullback-Leibler information criterion. Acknowledgements Institutional research support provided by Faculty of Informatics and Statistics, University of Economics, Prague. Geo-data source: GISCO-Eurostat (European Commission), Administrative boundaries: © EuroGeographics. References [1] Anselin, L . (1988). Spatial econometrics: methods and models. Dordrecht: Kluwer. [2] Bivand, R. & Piras, G . (2015). Comparing implementations of estimation methods for spatial econometrics. Journal of Statistical Software, 63(18), 1-36. [3] Elhorst, J. P. (2014). Spatial Econometrics: From Cross-sectional Data to Spatial Panels. New York: Springer. [4] Formanek, T , and Husek, R. (2016). On the stability of spatial econometric models: Application to the Czech Republic and its neighbors. In A . Kocourek & M . Vavrousek (Eds.), Mathematical Methods in Economics (pp 213-218). Liberec: T U Liberec. [5] LeSage, J. P., & Pace, R. K . (2014). The biggest myth in spatial econometrics. Econometrics, 2(4), 217-249. [6] Millo, G., & Piras, G . (2012). splm: Spatial panel data models in R. Journal of Statistical Software 47(1), 1—38. [7] Pebesma, E . (2018). Simple features for R: standardized support for spatial vector data. The R Journal, 10(1), 439-446. [8] Pesaran, M . H . : Time Series and Panel Data Econometrics. Oxford University Press, Oxford, 2015. [9] Vuong, Q.H. (1989). Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica, 57(2), 307-333. 
I l l The link between DEA efficiency and return to assets Lukáš Frýd1 , Ondřej Sokol2 Abstract. The data envelopment analysis (DEA) is a standard tool in the analysis of determinants of the efficiency of economic agents. A two-stage efficiency analysis consisting of estimating D E A efficiency and then using it as dependent variable in regression with various determinants, is often used for this purpose. In this article, we focus on the relevance of this approach using empirical agricultural data. In particular, we show that the lagged D E A efficiency of agricultural companies is not correlated with return on assets indicator, and, similarly, lagged return on assets are not correlated with D E A efficiency. Low cross-correlation values indicate that a two-stage analysis using the D E A method can lead to misleading results. Keywords: data envelopment analysis, agricultural, cross-correlation J E L Classification: C50 A M S Classification: 90C90 1 Introduction The data envelopment analysis (DEA) is one of the most popular efficiency estimator. Due to its non-parametric approach and the possibility of multiple outputs, the D E A is widely used in analysis of efficiency determinants, see overview of D E A models by Emrouznejad and Yang [3]. The resulting efficiency estimates are often used in various two-stage efficiency analysis. In the first stage, the efficiency is estimated. In the second stage, the statistical significance of the variables affecting the estimated efficiency in the first stage is studied. The ability to estimate efficiency is demonstrated in simulation studies where efficiency is simulated using non-linear production functions. However, differently specified production functions lead to different efficiency estimates. Furthermore, D E A efficiency estimates are sensitive to the presence of outliers and measurement errors, etc., [2, 7]. The possibilities of correcting these shortcomings are studied with respect to the distribution of the overall D E A efficiency. The methods do not take into account changes in the order of effective units and even small differences in the definition of inputs have a significant impact on the final ranking of D E A units. In this case, the two-stage method can provide considerably misleading results. In this article, we focus on linking D E A efficiency and economic performance of agricultural companies - farms. We stem from the premise that efficient companies tend to follow better economics indicators trajectory in the long run than inefficient companies. If the D E A method is able to estimate the efficiency of a given farm, then there should be a positive relationship between efficiency at time t and economic results of the company at time t + h. At the same time, it can be assumed that economic results at time t - h can positively affect efficiency at time t. Specifically, we are examining the relationship between the lagged production efficiency of farms at present and their return on assets (ROA) and conversely between lagged ROA and D E A efficiency. The analysis is performed on the data of Czech agricultural enterprises in the period between 2008 and 2015 and the results suggest that there is an unexpectedly weak correlation. The structure of the paper is as follows. In Section 2, we describe the used dataset and the methodology used for obtaining efficiency estimates and correlation estimates. In Section 3, we discuss the estimated correlation between D E A efficiency and ROA. 
1 Department of Econometrics, Prague University of Economics and Business, Winston Churchill Square 4, 13067 Prague, Czech Republic, lukas.fryd@vse.cz 2 Department of Econometrics, Prague University of Economics and Business, Winston Churchill Square 4, 13067 Prague, Czech Republic, ondrej. sokol @vse.cz 112 2 Data and methodology 2.1 DEA method Let «1 is the number of inputs, «2 the number of outputs and m + 1 the number of decision making units ( D M U ) . Consider • IQ € R " 1 is the input nonnegative vector for D M U o , • Oo € R " 2 is the output nonnegative vector for D M U o , • / e R m x " i is the input nonnegative matrix for the other D M U s , • O € R m x " 2 j s m e output nonnegative matrix for the other D M U s . As we need to estimate the efficiency of each unit, we run D E A optimization model. In particular, we use the linear approximation of the model [5] to compute the efficiency of given DMUo- Hladfk's model uses the efficiency scale from 0 to 2. Similarly to common D E A approaches, the higher score of production unit means, that the unit remains efficient for larger variation of all data. Equally, the lower score means that unit would be inefficient for larger variation of all data. The borderline of efficient units is equal to 1. The score is based on the largest allowable variation of all input and output data such that unit remains efficient or the smallest variation of data to become efficient in case of inefficient unit [5]. While the model was published only recently, it was already used in several empirical studies, see [4, 6]. The linear model is as follows 6* - max 6 U,V s.t. O Q « > 1 + 6, lf}v 0, where r — 1 + 6* is the resulting efficiency score of chosen D M U o - A s stated above, i f r e (0,1) then the D M U o is inefficient and i f r e [1,2) indicates that D M U o is efficient. In order to extract vectors of input and output weight, we can compute u := w/(l - 8) and v := v/ (1 - 5) with u and v represent the vectors of input and output weights, respectively. 2.2 Data We use a data set of 220 Czech farms in the period 2008 to 2015. The data set is balanced panel. The data come from Farm Accountancy Data Network (FADN). F A D N is an agriculture database, maintained under auspices of the European Commission. F A D N participation is voluntary for farms. Data from a sample of farms are sent to a national branch of F A D N (so-called liaison agent), which transmit the data to the global F A D N database and is responsible for the international comparability of the data, i.e., the methodology of the collecting the data shall be the same in every E U country. The survey does not cover all the farms in the Union but only those which due to their size could be considered commercial. In total, the F A D N sample consists of about 80 000 holdings and represent about 5 million farms using about 90 percent of the total utilized agricultural area and producing about 90 percent of agricultural marketable output. The database consists of ~ 5000 economic and other variables on an annual basis. We use 4 inputs and 3 outputs for the D E A estimation of the technical efficiency based on similar recent studies (see for example [1]). We also used this approach in [4]. Our list of inputs consist of 1. Total labor input in annual work units (AWU), 2. Total utilized agricultural area in ha, 3. Depreciation and interest paid in Euro, 4. Total intermediate consumption in Euro, and the outputs are following 1. Total output crops and crop production in Euro, 2. 
Total output livestock and livestock products in Euro, 3. Other output in Euro. Other output is calculated as the sum of all other outputs.

2.3 Cross-correlation of DEA efficiency and ROA

Once we have the estimated efficiency for each farm and each year, we calculate the sample correlation between DEA farm efficiency and return on assets (ROA). Denote r_{i,t} the DEA efficiency of farm i at time t and ROA_{i,t} the return on assets of farm i at time t. Then

C_{i,h} = \frac{\sum_t \left(r_{i,t} - \bar{r}_i\right)\left(ROA_{i,t+h} - \overline{ROA}_{i,h}\right)}{(n-1)\, s_{r_i}\, s_{ROA_{i,h}}},   (2)

where \bar{r}_i and \overline{ROA}_{i,h} are the estimated means of r_{i,t} and ROA_{i,t+h} respectively, s_{r_i} and s_{ROA_{i,h}} are the sample standard deviations of r_{i,t} and ROA_{i,t+h} respectively, and n is the number of valid observations with respect to the given lag h. We consider all possible lags ranging from -6 to 6, i.e. we compute C_{i,h} for all i and h = -6, ..., 6. Naturally, for h approaching zero, we have significantly more observations than for very low or very high values of h.

3 Results

Table 1 shows summary statistics of the cross-correlations at individual lags. The interquartile range (0.25 - 0.75) for lag = 1 is from -0.01 to 0.44. A similar result is obtained for lag = -1. The interquartile ranges for the other lags lie approximately between -0.2 and 0.1. The results show that efficiency and ROA are strongly contemporaneously correlated. Conversely, in the case of non-zero lags, the correlation is very weak.

Table 1 Cross-correlation summary statistics

Lag (h)    Min.     1st Qu.   Median   Mean     3rd Qu.   Max.
-6         -0.66    -0.26     -0.09    -0.08    0.07      0.63
-5         -0.74    -0.31     -0.12    -0.12    0.04      0.49
-4         -0.60    -0.24     -0.06    -0.07    0.10      0.62
-3         -0.73    -0.23     -0.01    -0.04    0.14      0.60
-2         -0.80    -0.21     -0.01    0.00     0.24      0.74
-1         -0.77    -0.13     0.18     0.16     0.45      0.88
0          -0.91    0.42      0.67     0.57     0.82      0.98
1          -0.79    -0.01     0.26     0.21     0.44      0.82
2          -0.70    -0.20     0.01     -0.02    0.16      0.64
3          -0.70    -0.22     -0.04    -0.07    0.07      0.72
4          -0.74    -0.28     -0.10    -0.10    0.08      0.46
5          -0.50    -0.27     -0.10    -0.11    0.03      0.43
6          -0.54    -0.23     -0.10    -0.08    0.03      0.63

We show the estimated density of the correlation between DEA efficiency estimates and return on assets for individual farms for h = -6, ..., 0, ..., 6 in Figure 1. Here, h < 0 represents the distribution of the correlation between DEA efficiency at time t and ROA in the past, at time t + h. Conversely, h > 0 represents the distribution of the correlation between DEA efficiency at time t and future ROA at time t + h. With zero lag, h = 0, a clear positive correlation between the two variables can be seen. Most farms achieve a correlation in the range from 0.5 to 1. This result is expected, because the inputs are strongly correlated with ROA. Hence, DEA efficiency should be strongly correlated with ROA. However, in the case of any non-zero lag, h ≠ 0, we can see a significant drop in the estimated correlation. A positive, but still rather minor, correlation can be seen only for lags -1 and 1. For higher absolute values of h, the cross-correlation estimates lie in the range from -0.5 to 0.5 and the interquartile range from -0.3 to 0.2. The mean correlation is negative for |h| > 2, although we cannot reject the null hypothesis that it equals 0. This is indeed a surprising result, as we would expect a positive correlation.

Figure 1 Estimated density of the correlation between DEA efficiency estimates and ROA for various lags h

4 Conclusion

The topic of this work is the suitability of the DEA method as an estimator of efficiency, with regard to its frequent use in the two-stage method.
We start from the premise that more efficient companies should achieve better economic results in the medium and long term than inefficient companies. We specifically analyze the correlation between D E A efficiency of Czech farms and the ROA indicator. The results show that the correlation between D E A efficiency and ROA reaches a correlation above 0.5 only contemporaneously. In contrast, the cross-correlation for the delay h + 0 is very low. Acknowledgements The work was supported by the Czech Science Foundation under grant 20-17529S and by the Internal Grant Agency of Prague University of Economics and Business under Grant F4/34/2020. References [1] Davidova, S., and Latruffe, L.: Relationships between technical efficiency and financial management for czech republic farms. Journal of Agricultural Economics 5 8 (2007), 269-288. [2] Dyson, R. G., Allen, R., Camanho, A . S., Podinovski, V. V., Sarrico, C. S., and Shale, E . A . : Pitfalls and protocols in dea. European Journal of operational research 132 (2001), 245-259. [3] Emrouznejad, A . , and Yang, G.-L: A survey and analysis of the first 40 years of scholarly literature in dea: 1978-2016. Socio-Economic Planning Sciences 61 (2018), 4—8. [4] Fryd, L., and Sokol, O.: Relationships between technical efficiency and subsidies for czech farms: A two-stage robust approach. Socio-Economic Planning Sciences (2021), 101059. [5] Hladfk, M . : Universal Efficiency Scores in Data Envelopment Analysis Based on a Robust Approach. Expert Systems with Applications 122 (2019), 242-252. [6] Holy, V.: The impact of operating environment on efficiency of public libraries. Central European Journal of Operations Research (2020), 1-20. [7] Simar, L . , and Wilson, P. W.: A general methodology for bootstrapping in non-parametric frontier models. Journal of applied statistics 27 (2000), 779-802. 116 The geography of most cited scientific publications: Mixed Geographically Weighted Regression approach Andrea Furkova1 Abstract. The paper analyses spatialheterogeneity of top-level scientific publications of the European regions and try to answer the question which regions or groups of regions are the most innovative in this sense.We supposed that the responseof innovation output (most cited scientific publications) to a change on innovation inputs (R&D expenditure and human capital) might be not homogeneous across allEuropean regions. In addition, we hypothesize that there is still gap between post-socialist and "western" countries in terms of elite publications. Mixed geographically weighted regression (MGWR) model was used as a main tool for examining our research questions. M G W R model can produce parameter estimations that have global character and other parameters that have local character in accordance with observation location. W e found out that the both innovation input parameters vary significantly across the European area and the gap between post-socialist and "western" countries in terms of elite publications was confirmed. Keywords: Mixed geographically weighted regression, spatial heterogeneity, scientific publications, innovation JEL Classification: 031, R12 AMS Classification: 91B72 1 Introduction It is evident that the main objective of Research & Development (R&D) policy is to increase innovation outcomes. However, the problem arise when we want to measure the level of innovation activities and technological progress. 
Following the concept of the Regional Knowledge Production Function (RKPF) model (see e.g., [8]), two types of indicators are usually considered, i.e., technological innovation inputs and technological innovation outputs. Traditionally, R & D expenditure and human capital are recognized as significant innovation determinants. On the other hand, the number of patent applications, number of scientific publications and citations are accepted as innovation outputs. In this paper, we raise a different approach to evaluation of scientific activities. W e turn our attentionto the "elite publications", i.e., scientific publications that are among the top 10% most cited publications worldwide and we will consider it as a proxy for an innovation output of the region. These top-level publications are considered as a measure for the efficiency of the research systemas highly cited publications are assumed to be of higher quality. This indicator is also the part of a composite indicator, the Regional Innovation Index -RII (see [5]) which is oneof the few options forthe comparative assessment of the performance of European innovation systems at the regional level. This paper will try to analyse which European regions or groups of regions have the largest share in the production of the top-level scientific publications and therefore are the most innovative regions in this sense.Wesupposethat the responseof innovation output (most cited scientific publications) to a change on innovation inputs (R&D expenditure and human capital) might be not homogeneous across allEuropean regions. In addition, we hypothesize that there is still gap between post-socialist and "western" countries in terms of elite publications. Mixed geographically weighted regression (MGWR) model seems to be a suitable tool for eramining our hypotheses. This model is a combination of linear regression model and geographically weighted regression (GWR) model; therefore, M G W R model could produce parameter estimation that had global parameter estimation, and otherpara meter that had local parameter in accordance with its observation location. The structure of the paper is as follows: section 2 provides data and study area descriptions and brief theoretical backgrounds of the study; empirical results are presented and interpreted in section 3. Main concluding remarks contain section4 and the paper closes with references. 1 University of Economics in Bratislava. Faculty of Economic Informatics, Department of Operations Research and Econometrics, Dolnozemská 1/b, 852 35 Bratislava, andreafurkova@euba.sk. 117 2 Methodology The first part of this section provides an overview of a study area and description of the data. The second part of this section briefly introduces M G W R model relevant for the subsequent empirical analysis. 2.1 Data description and study area The real distribution of most cited scientific publications of European regions in 2019 is presented in Figure 1. The map shows 238 regions of 23 the E U member states, Norway, Serbia2 and Switzerland at different NUTS (Nomenclature of territorial units for statistics)levels, i.e., at NUTS 1 or NUTS 2 levels. Our innovation analysis will include 220 observations because of isolated observations (island regions); the data set reduction was done. According to RII, the regions presented in Figure 1, have been classified into four performance groups, i.e., groups of high, strong, moderate and low performers. Figure 1 indicates strong geographical performance differences. 
Scientific publications among the top-10 % most cited seems to be less spread within countries but more across countries. Many regions in Northern, Western and Central European countries such as Denmark, Norway, Sweden, Finland, Ireland, France, Germany, Belgium, and Austria are ranked as strong performers. Elite scientific publications are produced by the United Kingdom, Switzerlandand the Netherlands, where the majority of regions consist of high performers. While, there might be a relatively small variety among regions in many European countries, for instance Greek regions show the highest level of variety with regard to the top 10% most cited publications. There can be found a high performer region and also alow performer regions. It is interesting to mention a Portuguese autonomous region Madeira which is the only Southern European region represented in the top 10 group of European regions. Different trend can be seen as for the rest of Southern European regions and Eastern European regions. Theseregions are usually classified as low ormoderate performers. Figure 1 The distribution of scientific publications among the top 10% most cited publications worldwide Source: author's elaboration based on the RIS 2019 [5] It is obvious that the geographicalposition of the region and properties of the neighbourhood occupying akey role for creation of innovation in given areas. The geographical aspect also plays an important role in case of production of scientific publications, especially with regard to the most valued scientific publications. W e hypothesize that the response of innovation output (most cited scientific publications) to a change on innovation inputs (R&D expenditure andhuman capital) might be not homogeneous across all European regions and we assume that there is still a gap between post-socialist and "western" countries in terms of elite publications. Thus, the problem of s patial heterogeneity as one of the spatial effects may be present in connection with the modelling of regional innovation activities. For this reason, we decided to apply M G W R model, which provides local parameters estimations while some parameters may be global. The next section briefly introduces this model. 2 For Serbia, official NUTS codes are not yet available and therefore unofficial codes will be used (see [5]). 118 Table 1 gives a description of all variables (innovation output and innovation inputs) used in ourempirical analysis. The selection of the data was influenced by the fact that at regional levels, the relevant data are limited. A brief reasoning for inclusion of the variables under consideration is provided in Table 1. Data related to variables PUB and EXP are obtained from RIS 2019 [5]. Variable HRST comes from regional Euros tat statistics database [9]. Category Definition Form Abbr. Dependent rariaMe: Scientific publications among the top 10% most The number of scientific publi- Normalized3 PUB cited publications worldwide cations among the top 10 % most cited publication worldwide per total number of scientific publications. Explanatory rariables: R & D expenditure in the public sector A l l R & D expenditures in the Normalized EXP Reasoning: government sector and the R & D expenditure represents one of the major driv- higher education sector as perers of economic growth in a knowledge-based centage of GDP. economy. R & D spending is essential for making the transition to a knowledge-based economy as well as for improving production technologies and stimulating growth. 
Human resources in science and technology HRST as percentage of active Normalized5 HRST (HRST4 ) population. Reasoning: For scientific activities, human resources represent a knowledge base, which is a source of ideas. Regions of post-socialist countries (PSOC) DUMPS0C =1, if region belongs Binary Reasoning: t 0 p s o c o t h e r s = 0 Former political system of the country may influence innovative activities of the regions. Table 1 Model variables description 2.2 Mixed Geographically Weighted Regression Model First, let us pay a brief attention to the Geographically Weighted Regression (GWR) model. The goal of G W R methodology is to obtain local linear regression estimates for each point in the space, i.e., for each observation i e {1,...,N} we deal with different vector of local parameters ^ ( w ^ v j . Coordinates (w,,v; ) represents the longitude and latitude of observation i. G W R method requires the spatial kernel function and its bandwidth selection. Next, N dimensional diagonal weight matrix W,is constructed such that W, = K (d,., h), where K( ) is a spatial kernel function, d, is a distance vectorbetween the central point and all neighbours, and his a bandwidth or decay parameter (seee.g., [2]).The G W R model can be expressed as: y. = f3v (ui,vi;h)Xv +et, V7 e {l,...,N), (1) where y is a vector of dependent variable, h is a bandwidth parameter that allows to define the local subsampb around the coordinates of each point (w,,v; ) using a given distance kernel K( ), Xv represents ky explanatory variables with spatially varying coefficients (/?v ) and s is an error term. The parameters of the G W R model are estimated by the weighted least squares approach and the estimation of the parameters in each location i is given by (see, e.g., [6]; [4]; [1]): 3 The data pro videdby RIS 2019 are already normalized T he minmax procedure was used and the maximum normalised score is equal to 1 and the minimum normalised score is equal to 0. For more details regarding normalising RII data see [5]. 4 HRST are people who fulfil one or other of the following conditions: (1) have successfully completed a tertiary level education; (2) not formally qualified as above but employed in a scientific and technical occupation where the above qualifications ate normally required (see [10]). 5 The minmax procedure was used. DUMPS0C 119 j3(ul,vl) = (xT WlX)' (xT W,y) (2) G W R model defined by formula (2) seems to be not sufficient for socioeconomic variables that have global effects and are independent from individual location. In addition, it appears inadequate for local categorical variables, since spatially varying parameters associated with such variables may have no meaning. For such situations, mixed G W R (MGWR) model was developed. This model can be formulated as follows [4]: where Xc represents kc explanatory variables with constant coefficients (/3C ), and Xv represents ky explanatory variables with spatially varying parameters ( fiv). It should be noted that k = kc +ky . A l l remaining terms of model (3) were defined above. Already Fotheringham et al. [3] dealt with the issue of M G W R estimation defined in (3). They proposed seven-step estimation; however, this approach has appeared somewhat intensive in terms of computation. In [7], less demanding, a two steps methodology based on partial linear models can be found. Geniaux and Martinetti also used this methodology and we will follow this approach in our empirical analysis. For more details, see [4]. 
The distribution of innovation processes (represented by most cited scientific publications worldwide) across the European regions suggests that, the strength of the influence of particular determinants of innovation may vary in given locations. Thus, the problem of spatial heterogeneity should be considered. Next, we will assume that the expected value of most cited scientific publications is a function of R & D expenditure in the public sector and Human resources in science and technology. In addition, we hypothesize that there is still a gap between postsocialist and "western" countries in terms of elite publications. For this reason, the model includes global dummy variable reflecting political history of the region and global interactive dummy variable with human resources variable. The process of estimating of a M G W R model starts with weighting scheme selection. W e decided for Gaussian weighting scheme with fixed6 bandwidth parameter h (see Table 2) calibrated by cross-validation optimization procedure. The selected results of M G W R estimation (minimum, lower quartile, median, mean, upper quartile, maximum) in comparison to OLS estimation are presented in Table 2. The evidence for spatial heterogeneity is already supported by basic statistics. For instance, parameter estimate of HRST (Human resources in science and technology) is varying even from negative values -0.0361 up to 0.6970 with median value 0.2460, while the global OLS parameter estimate is 0.3320. The minimum and maximum values of estimated M G W R parameters indicate how varied the influence of a given innovation input may be in a particular region. A s for global OLS estimation, we can see that all parameters are statistically significant except the parameter associated with EXP variable. This was unexpected but at the same time, we can see based on the local regressions that this factor was significant in up to 54.55% of cases. Consequently, R & D spending in public sector seems to be essential for producing and improving regional scientific activities. The M G W R estimation results reveal that the most important determinant of most cited publications is HRST variable. Its significance was confirmed in 97.27% of cases. Based on the M G W R model, we also examined the differences between post-socialist and "western" countries in terms of elite publications and the question whetherthe effect of human resources varies by political history of the region. The results verified our assumptions that the political history of the region still matter and that the effect of human resources varies by political history of the region. Both global dummy variables are statistically significant and they have expected negative signs. The overview of the estimated local parameters is presented in the form of box and significance maps in Figure 2 and Figure 3. 6 According to preliminary estimates and analyses, we concluded to prefer the model with fixed weighting scheme to the model with adaptive one. yi = PCXC + J3V (ui,vi;h)Xv +ui, Vie {!,..., N} (3) 3 Empirical Results 120 M G W R (Gaussian kernel functions with fixed bandwidth, A=3.793) Global (OLS)Min. First Quart ile Median Mean Third Quartile Max. 
Percent of significant cases at 95 % Global (OLS) 0.0353 0.3218 0.3413 0.3310 0.3563 0.4704 100% 0.3556 (0.0000) ft (EXP) -0.1363 -0.0034 0.0855 0.0713 0.1220 0.4171 54.55% 0.0001 (0.9976) (32 (HRST) -0.0361 0.1442 0.2460 0.2704 0.3762 0.6970 97.27% 0.3320 (0.0000) &(DUMPSOC) -0.1465 (0.0260) - -0.1504 (0.0000) Pt (DUMPSOCHRST > -0.0848 (0.5332) - -0.2157 (0.0032) A I C -476.852 -395.826 R2 - 0.6701 Table 2 Summary of M G W R estimation Note: /^-values in parenthesis. Source: own calculations in RStudio Figure 2 Human resources in science and technology: box map of local parameter estimates (left) and significance map (right) Figure 3 R & D expenditure in the public sector: box map of local parameter estimates (left) and significance map (right) 121 4 Conclusion In this paper, M G W R approach has been exploited as a tool for analysing spatial heterogeneity of top -level scientific publications of the European regions. Based on the M G W R estimation, we found out that the both innovation input parameters vary significantly across the European area. The greatest effect of R & D expenditure (highest parameter values, see Figure 3 (left)) is evident for regions that are classified as low and moderate performers in terms of most cited publications (see Figure 1). These are mainly regions with post-socialistic history, regions of Scandinavian countries, and regions of Spain, Greece and south Italy. Almost opposite situation applies to the human resources variable as we recorded low parameter values (see Figure 2 (left)) for already mentioned regions and high parameter values are evident for high and strong performers in terms of most cited publications. In addition, we found out that the effect of human resources varies by political history of the region. Despite the fact that the countries of Central and Eastern European countries have undergone a difficult process of post -socialist transformation, theseregions are still lagging. Our results invoke local R & D policy implication notregion wide policy implication. Acknowledgements This work was supported by the Grant Agency of Slovak Republic - V E G A 1/0193/20 "Impact of spatial spillover effects on innovation activities and development o f E U regions" and VEGA 1/0211/21 "Econometric Analysis of Macroeconornic Impacts of Pandemics in the World with Emphasis on the Development of E U Economies and Especially the Slovak Economy". References [1] Anselin,L. & Rey, S.J. (2014). Modern Spatial Econometrics in Practice. Chicago: GeoDa Press LLC. [2] Chocholata, M . (2020). Spatial Variations in the Educational Performance in Slovak Districts. Statistics and Economy Journal, 100(2), 193-203. [3] Fotheringham, A.S., Brunsdon, C , & Charlton, M . (1999). Some notes on parametric significance tests for geographically weighted regression. Journal of Regional Science, 39(3), 497-524. [4] Geniaux, G. & Martinetti, D. (2018). A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics, 12, 74-85. [5] Hollanders, H , Es-Sadki, N . & Mekelbach, I. (2019). Regional Innovation Scoreboard 2019. [online] Available at: https://ec.europa.eu/growth/sites/growth/files/ris2019.pdf [Accessed 1 Mar. 2020]. [6] LeSage, J. P. (1999). The theory and practice of spatial econometrics. Available at: http://www.spatial-econometrics.com/html/sbook.pdf. [Accessed 20 Feb. 2018]. [7] Mei, C , He, S. & Fang, K. (2004). A note on the mixed geographically weighted regression model. 
Journal of Regional Science,44(\), 143-157. [8] Moreno, R., Paci, R. & Usai, S. (2005). Spatial Spillovers and Innovation Activity in European Regions. Environment and Planning A: Economy and Space, 37(10), 1793-1812. [9] http://ec.europa.eu/eurostat/. [Accessed 1 Mar. 2020]. [10] https://ec.europa.eu/eurostat/cache/metadata/en/hrst_esms.htm.fAccessed 1 Mar. 2020]. 1 2 2 Bilevel Linear Programming under Interval Uncertainty Elif Garajova1 , Miroslav Rada2 , Milan Hladik3 Abstract. Bilevel linear programming provides a suitable mathematical model for many practical optimization problems. Since the real-world data are often inaccurate or uncertain, we consider the model under interval uncertainty, in which only the lower and upper bounds on the input data are available and we assume that the uncertain coefficients can be perturbed independently within the given intervals. Building on the theory of interval optimization and bilevel linear programming, we study the basic properties of bilevel interval linear programs from both a theoretical and a computational point of view. In our study, we focus on the main problems solved in interval optimization, such as computing the range of optimal values, checking the existence of feasible and optimal solutions and testing unboundedness of a scenario in the interval program. Keywords: bilevel programming, interval uncertainty, optimality J E L Classification: C44, C61 A M S Classification: 90C70 1 Introduction Throughout the recent years, bilevel programming models [5, 2] have been successfully applied in solving a wide range of practical optimization problems. In such models, we consider a hierarchical structure of decision making consisting of two levels represented by two nested optimization problems—the leader (upper-level) problem and the follower (lower-level) problem. Mathematically, we solve a problem in the form min f{x, y) x,y s.t. ix,y)€X, y e argmin{g(x,y) s.t. (x,y) e Y}, y for the given constraint sets X, Y. In this paper, we focus on the bilevel programming models with a linear objective function and linear constraints on both levels [1, 9]. Although this is perhaps the easiest special case, it is still difficult to tackle and several decision problems related to bilevel linear programming were, in fact, proved to be NP-hard [6]. Since uncertainty is an ever-present issue in real-world optimization problems, attention has also been devoted to exploring bilevel models with inexact, vague or imprecise data. Here, we adopt the approach of interval programming, assuming that only lower and upper bounds on the inexact data are known and that the values can be perturbed independently within these bounds. While the topic of single-level linear programming with interval data is quite well-studied (see e.g. [4, 15, 10] and references therein), only a handful of works are available for bilevel interval programming problems [11, 12, 14]. We derive several results on the theoretical and computational properties of bilevel interval linear programs. First, we build on and revise the former results [3, 13] on computing the best and the worst value, which is optimal for some scenario of the interval problem (this is also known as the problem of computing the optimal value range). Then, we prove that the decision problems of checking existence of feasible and optimal solutions are NP-hard for bilevel interval linear programming. Furthermore, we also show NP-hardness of checking unboundedness of at least one scenario of the interval program. 
1 Charles University, Faculty of Mathematics and Physics, Dept. of Applied Mathematics, Malostranské nám. 25, Prague, Czech Republic; Prague University of Economics and Business, Dept. of Econometrics, nám. W. Churchilla 4, Prague, Czech Republic, elif@kam.mif.cuni.cz
2 Prague University of Economics and Business, Dept. of Econometrics & Dept. of Financial Accounting and Auditing, nám. W. Churchilla 4, Prague, Czech Republic, miroslav.rada@vse.cz
3 Charles University, Faculty of Mathematics and Physics, Dept. of Applied Mathematics, Malostranské nám. 25, Prague, Czech Republic; Prague University of Economics and Business, Dept. of Econometrics, nám. W. Churchilla 4, Prague, Czech Republic, hladik@kam.mif.cuni.cz

2 Bilevel Interval Linear Programming
Let us first introduce the essential notions and notation of interval programming and bilevel programming used throughout the paper. Note that all inequality relations on the set of matrices and vectors are understood element-wise.

Interval data. Let the symbol IR denote the set of all closed real intervals. Given two real matrices A̲, Ā ∈ R^{m×n} satisfying A̲ ≤ Ā, we define an interval matrix A ∈ IR^{m×n} as the set

    A = [A̲, Ā] = { A ∈ R^{m×n} : A̲ ≤ A ≤ Ā }.

Bilevel interval linear programs. Throughout the paper, we consider bilevel interval linear programs, i.e. sets of bilevel linear programs in the form

    min_{x,y≥0}  c^T x + d^T y
    s.t.  A1 x + B1 y ≥ b1,                                              (1)
          y ∈ argmin_{y≥0} { a^T y : B2 y ≥ b2 − A2 x },

where the coefficients of the bilevel program belong to the respective intervals (A1 ∈ [A̲1, Ā1], A2 ∈ [A̲2, Ā2], etc.). A specific bilevel linear program in the set is called a scenario. For simplicity of notation, we also write the former interval problem in the concise form

    min_{x,y≥0}  c^T x + d^T y
    s.t.  A1 x + B1 y ≥ b1,                                              (2)
          y ∈ argmin_{y≥0} { a^T y : B2 y ≥ b2 − A2 x },

where each coefficient is now understood as the corresponding interval matrix or vector.

Dependency problem. Note that the formulation (2) of a bilevel interval linear programming problem is not the most general one, since we only consider constraints expressed by inequalities with non-negative variables on both levels. When dealing with interval coefficients in the programs, it is not always possible to apply the standard transformations to convert a given problem into the desired form, and programs in different forms may have to be treated separately (see [8] for details).

Feasibility and optimality. For a given solution (x, y) ∈ R^{n1+n2} to be feasible for the bilevel program (1), we need to ensure that it satisfies both the upper-level and the lower-level constraints, i.e. that it belongs to the constraint region

    S = { (x, y) ∈ R^{n1+n2} : A1 x + B1 y ≥ b1, A2 x + B2 y ≥ b2, x, y ≥ 0 }.

Furthermore, the vector y has to be a rational response of the follower to the leader's choice x (i.e., the vector y has to be an optimal solution of the lower-level problem for the fixed x):

    M(x) = { y ∈ R^{n2} : y ∈ argmin { a^T y : B2 y ≥ b2 − A2 x, y ≥ 0 } }.

Then, the set of all bilevel feasible solutions, also known as the inducible region, is the set of all pairs (x, y) ∈ S such that y ∈ M(x). For the interval programming problem, we consider the feasible and optimal solutions in the weak sense: a given solution (x, y) ∈ R^{n1+n2} is (weakly) feasible if it is a feasible solution for at least one scenario of the bilevel interval linear program. Analogously, a given (x, y) is (weakly) optimal if it is optimal for some scenario of the BILP.

Example 1. Consider the bilevel interval linear program

    min_{x,y≥0}  −x − y
    s.t.  y ∈ argmin_{y≥0} { [−1, 1]·y : 3x − 2y ≤ 12, 2x + y ≤ 15, −3x + 5y ≤ 10 }.     (3)

The polygon forming the constraint region of problem (3) is depicted in Figure 1.
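The scenario analysis of Example 1 carried out below can also be reproduced numerically. The following minimal sketch is an illustration added here, not code from the paper: it fixes the lower-level objective coefficient at its two endpoints a = −1 and a = +1, solves the follower's problem for a grid of leader decisions x with SciPy's linprog, and picks the leader-optimal point of the induced responses; the grid resolution is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import linprog

# Lower-level (follower) constraints of Example 1:
#   3x - 2y <= 12,  2x + y <= 15,  -3x + 5y <= 10,  y >= 0
def follower_best_response(x, a):
    """Solve min a*y subject to the Example-1 constraints with the leader's x fixed."""
    A_ub = np.array([[-2.0], [1.0], [5.0]])                 # coefficients of y
    b_ub = np.array([12.0 - 3.0 * x, 15.0 - 2.0 * x, 10.0 + 3.0 * x])
    res = linprog(c=[a], A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)], method="highs")
    return res.x[0] if res.success else None                # None if the polygon is empty for this x

for a in (-1.0, 1.0):                                       # the two extreme scenarios of [-1, 1]
    candidates = []
    for x in np.linspace(0.0, 7.5, 1501):                   # leader decisions keeping the polygon non-empty
        y = follower_best_response(x, a)
        if y is not None:
            candidates.append((-x - y, x, y))               # leader objective -x - y
    val, x_opt, y_opt = min(candidates)                     # best leader value over the inducible region
    print(f"a = {a:+.0f}: leader optimum {val:.2f} at (x, y) = ({x_opt:.2f}, {y_opt:.2f})")
# Expected output (up to grid rounding): -10 at (5, 5) for a = -1 and -9 at (6, 3) for a = +1.
```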
To find the weakly optimal solutions, we need to examine the possible scenarios of the interval program. In this case, there is only a single interval coefficient in the lower-level objective a·y = [−1, 1]·y. We can observe that for any choice of the objective coefficient a ∈ [−1, 0), the optimal solution of the follower's problem will lie on the upper boundary of the polygon. Similarly, for the values a ∈ (0, 1], the optimal solutions (and thus also the points of the inducible region) lie on the lower boundary. For a = 0, the entire polygon is optimal. The corresponding optimal solutions from the inducible regions in the three cases are (5, 5), (6, 3) and (5, 5), respectively. Therefore, the weakly optimal solution set of BILP (3) is {(5, 5), (6, 3)}, with the best optimal value −10 and the worst optimal value −9.

Figure 1 The constraint region of bilevel interval linear program (3). The bold lines depict the inducible region of the program for the lower-level objective a < 0 (left) and a > 0 (right); the square vertices represent the optimal solutions.

3 Properties of Bilevel Interval Linear Programs
Optimal value range. One of the main problems solved in all areas of interval programming is the so-called optimal value range problem, whose goal is to compute the best and the worst possible value that is optimal for some scenario of the given interval program. In the previous works [13], the authors proposed to compute the optimal values for a restriction of scenarios, in which some of the choices of interval coefficients are fixed:

    min_{x,y≥0}  c̲^T x + d̲^T y
    s.t.  Ā1 x + B̄1 y ≥ b̲1,                                              (4)
          y ∈ argmin_{y≥0} { a^T y : B̄2 y ≥ b̲2 − Ā2 x }

as the best-value program, and

    min_{x,y≥0}  c̄^T x + d̄^T y
    s.t.  A̲1 x + B̲1 y ≥ b̄1,                                              (5)
          y ∈ argmin_{y≥0} { a^T y : B̲2 y ≥ b̄2 − A̲2 x }

as the worst-value program. The idea is similar to the computation used in single-level interval linear programming [4] and exploits non-negativity of the variables to find the extremal values. It should, however, be noted that these scenarios do not, in general, yield the best possible and the worst possible optimal value of the BILP. This discrepancy is caused by the fact that expanding or reducing the lower-level feasible set may be beneficial or detrimental to the leader's objective value, depending on the specific objective function. We illustrate the issue through the following example.

Example 2. Consider two instances of the bilevel interval linear program (note that both x and y are one-dimensional)

    min_{x,y≥0}  d·y
    s.t.  x ≤ 2,                                                          (6)
          y ∈ argmin_{y≥0} { −y : x − y ≥ −2, −2x − y ≥ [−8, −5] },

where the coefficient d in the upper-level objective function is either d = 1 or d = −1. The two extremal scenarios of the interval program (with right-hand sides −8 and −5) are depicted in Figure 2. Let us first examine the instance with d = −1, in which the upper-level objective is the same as the lower-level objective. In this case, the optimal solutions for the two extremal scenarios are (2, 4) and (1, 3), respectively, with the optimal values −4 and −3. The best optimal value was achieved with the largest constraint set and the worst optimal value with the smallest constraint set, as proposed in (4) and (5). However, we will see that this is not always the case. Consider now the instance with d = 1, where the upper-level and lower-level objectives are opposite. Here, the optimal solutions for the two extremal scenarios are (0, 2) and (2, 1), respectively.
The best optimal value 1 was achieved for the smallest constraint set, while the worst optimal value 2 was achieved for the largest constraint set.

Figure 2 The two extremal scenarios of BILP (6) with the largest constraint set (left) and the smallest constraint set (right). The inducible regions are highlighted by the thick black line.

Example 2 shows that the choice of the lower-level coefficients in computing the best or the worst optimal value cannot be pre-determined, since it depends on the specific objective function considered in the problem. However, at the upper level, the choices can be made as proposed in the former work.

Proposition 1. The best optimal value of BILP (2) can be computed as the best optimal value of the bilevel interval linear program

    min_{x,y≥0}  c̲^T x + d̲^T y
    s.t.  Ā1 x + B̄1 y ≥ b̲1,                                              (7)
          y ∈ argmin_{y≥0} { a^T y : B2 y ≥ b2 − A2 x },

in which the lower-level data a, A2, B2, b2 remain interval.

Proposition 2. The worst optimal value of BILP (2) can be computed as the worst optimal value of the bilevel interval linear program

    min_{x,y≥0}  c̄^T x + d̄^T y
    s.t.  A̲1 x + B̲1 y ≥ b̄1,                                              (8)
          y ∈ argmin_{y≥0} { a^T y : B2 y ≥ b2 − A2 x }.

Both results can be proved in the same way as for single-level interval linear programming problems (see e.g. [4]), using non-negativity of the variables.

Computational complexity. Let us now examine the computational complexity of some decision problems related to bilevel interval linear programming. It is interesting to note that the form of a BILP considered in this paper (inequality constraints with non-negative variables) turned out to be the easiest one in single-level interval linear programming, in the sense that several of the generally NP-hard problems are easily solvable for programs in this special form. However, since bilevel linear programming is difficult even with real coefficients and no uncertainty present in the model, there is little hope that the considered interval problems would be easy to solve. Indeed, we show that this is the case.

First of all, let us consider the feasible and optimal solutions. A natural question to ask is whether a given BILP even has a feasible (or optimal) solution for at least one scenario. It can be observed that this question leads to an NP-hard problem in both cases, because checking the existence of an optimal solution is already NP-hard for single-level interval linear programs [7]. Thus, we can prove NP-hardness of the considered decision problems simply by taking a BILP in the form

    min_{x,y≥0}  0^T x + 0^T y
    s.t.  y ∈ argmin_{y≥0} { a^T y : B2 y ≥ b2 − 0x }.                    (9)

Now, BILP (9) has a feasible solution if and only if the lower-level interval linear programming problem has an optimal solution. This yields the following result:

Proposition 3. Checking whether there exists a weakly feasible solution (or a weakly optimal solution) of a bilevel interval linear program is an NP-hard problem.

Furthermore, we can also show that checking whether some feasible scenario of the BILP has an unbounded objective value is an NP-hard problem as well. Again, we utilize a reduction from the problem of checking the existence of an optimal solution to a single-level interval linear program. In this case, we consider a BILP in the form

    min_{x,y≥0}  −x
    s.t.  y ∈ argmin_{y≥0} { a^T y : B2 y ≥ b2 − 0x },                    (10)

which is unbounded if and only if it is feasible, i.e. if and only if the lower-level program has an optimal solution. Thus, we obtain the desired result.

Proposition 4.
Checking whether there exists a feasible scenario, in which the value of the upper-level objective function is unbounded, is an NP-hard problem. Although all three of the considered decision problems are NP-hard in the general case, there still may be special classes of BILPs (that are not fully interval), for which at least some of the problems can be efficiently solved. 4 Conclusion We examined some of the basic properties of the bilevel linear programming problem with interval data. For the optimal value range problem, we have shown through a counterexample that the formerly proposed method does not always consider the scenarios yielding the best and the worst optimal values and we have revised the derived results accordingly. From a complexity-theoretical point of view, we have proved that three of the decision problems connected to the properties of bilevel interval linear programs are NP-hard, namely the problem of checking the existence of a weakly feasible or a weakly optimal solution and the problem of checking unboundedness of at least one scenario of the program. Acknowledgements The authors were supported by the Czech Science Foundation under Grant P403-20-17529S. E . Garajová and M . Hladík were also supported by the Charles University project G A U K No. 180420. 127 References [1] Bard, J. F.: A n Efficient Point Algorithm for a Linear Two-Stage Optimization Problem. Operations Research 31 (1983), 670-684. [2] Bialas, W., and Karwan, M . : On two-level optimization. IEEE Transactions on Automatic Control 27 (1982), 211-214. [3] Calvete, H . I., and Galé, C.: Linear bilevel programming with interval coefficients. Journal of Computational and Applied Mathematics 236 (2012), 3751-3762. [4] Chinneck, J. W., and Ramadan, K.: Linear programming with interval coefficients. J Oper Res Soc 51 (2000), 209-220. [5] Dempe, S., Kalashnikov, V., Pérez-Valdés, G . A . , and Kalashnykova, N . : Linear Bilevel Optimization Problem. In: Bilevel Programming Problems: Theory, Algorithms and Applications to Energy Networks (Dempe, S., Kalashnikov, V., Pérez-Valdés, G . A . , and Kalashnykova, N . , eds.), Energy Systems. Springer, Berlin, Heidelberg, 2015, 21-39. [6] Deng, X . : Complexity Issues in Bilevel Linear Programming. In: Multilevel Optimization: Algorithms and Applications (Migdalas, A . , Pardalos, P. M . , and Värbrand, P., eds.), Nonconvex Optimization and Its Applications. Springer U S , Boston, M A , 1998, 149-164. [7] Garajová, E., and Hladík, M . : Checking weak optimality and strong boundedness in interval linear programming. Soft Computing 23 (2019), 2937-2945. [8] Garajová, E., Hladík, M . , and Rada, M . : Interval linear programming under transformations: Optimal solutions and optimal value range. Cent Eur J Oper Res 27 (2019), 601-614. [9] Hansen, P., Jaumard, B . , and Savard, G . : New Branch-and-Bound Rules for Linear Bilevel Programming. SIAM Journal on Scientific and Statistical Computing 13 (1992), 1194-1217. [10] Hladík, M . : Optimal value range in interval linear programming. Fuzzy Optim Decis Making 8 (2009), 283-294. [11] L i , H . , and Fang, L.: A n Efficient Genetic Algorithm for Interval Linear Bilevel Programming Problems. In: Ninth International Conference on Computational Intelligence and Security. 41—44. [12] L i , H . , and Fang, L . : A n Evolutionary Algorithm Using Duality-Base-Enumerating Scheme for Interval Linear Bilevel Programming Problems. Mathematical Problems in Engineering 2014 (2014), e737515. [13] Mishmast Nehi, H . 
, and Hamidi, F : Upper and lower bounds for the optimal values of the interval bilevel linear programming problem. Applied Mathematical Modelling 39 (2015), 1650-1664. [14] Ren, A . , and Wang, Y . : A cutting plane method for bilevel linear programming with interval coefficients. Annals of Operations Research 223 (2014), 355-378. [15] Röhn, J.: Interval linear programming. In: Linear Optimization Problems with Inexact Data (Fiedler, M . , Nedoma, J., Ramík, J., Röhn, J., and Zimmermann, K , eds.). Springer U S , Boston, M A , 2006, 79-100. 128 An Efficiency Comparison of the Life Insurance Industry in the Selected OECD Countries with Three-Stage DEA Model Biwei Guan 1 Abstract. In this paper, we use the three-stage data envelopment analysis model to evaluate the efficiency score of 12 life insurance markets from O E C D and make a comparison of them. In the first stage, we used the basic D E A model, and in the second stage, we use stochastic frontier analysis slack regression to remove the impact of environmental effects and statistical noise on the efficiency score. After the adjustment according to the second stage, we recalculate the efficiency score of each market. We find the environmental factors have little effect on the German, Ireland and Italy life insurance market, they perform great. But the environmental factors have a heavy effect on Belgium, Greece and Hungary. After removing the influence of environmental factors, the technical efficiency of Belgium, Greece, and Hungary decreased sig­ nificantly. Keywords: Panel data, Three-stage D E A model, Life-insurance, O C E D countries J E L Classification: C67, G15, G22 AMS Classification: 60H99 1 Introduction and Literature Review In recent years, efficiency measurement related to the insurance industry was popular and attracted many regulators' and investors' attention. Eling and Luhnen [7] mentioned from 2000 to 2010, more than 90 studies focused on the efficiency measurement of insurance industry. Kaffash et al. [13] pointed out that from 1993 to 2018, 132 studies on the application of D E A in insurance industry were published. Nowadays, the number of research on this topic has continued to grow. For the measurement of efficiency, there are two main methods: stochastic frontier analysis (SFA) and data envelopment analysis (DEA). In the beginning, Farrell [9] introduced the basic D E A model to evaluate the efficiency of modern companies; on this basis, Charnes et al. [2] and Banker et al. [1] introduced the C C R model and B C C model respectively. Later, some researchers pointed out that the traditional D E A model ignored the influence of environmental effects and statistical noise on decision-making units (DMUs). Fried et al. [10] proposed a three-stage D E A model to eliminate the influence of the above two factors. The purpose of this paper is to compare the efficiency of 12 O E C D life insurance industries from 2013 to 2019 through the three-stage D E A model. In the second stage, the stochastic frontier analysis (SFA) slack regression is helped to eliminate the influence of environmental effects and statistical noise on decision-making units, to obtain more accurate efficiency score. This paper is divided into four sections. Section 2 is data and methodology, here we will introduce the specific information of the three-stage D E A model, and explain the source of the data and why it was chosen. 
Section 3 is the empirical results, in this part, the efficiency score of each insurance markets will be shown, and the comparison of these results. The last part is the conclusion, which includes the main contribution and key findings of this paper. In previous efficiency studies, the types and numbers of insurance markets selected are different. Diacon [4] applied the D E A model to analyze the technical efficiency (TE) of the 6 O E C D general insurance markets; Diacon et al. [5] studied the pure technical efficiency (PTE) and scale efficiency (SE) of 15 O E C D life insurance markets through the D E A model; Davutyan and Klumpes [3] studied the T E , P T E and S E of 7 O E C D life insurance markets and non-life insurance markets under the D E A model. Huang and Eling [12] pointed out that the relationship between solvency and efficiency of insurance firms is positive; Hardwick et al. [11] mentioned that the cost efficiency (CE) of the life insurance companies is directly proportional to the existence of audit committees and inversely proportional to the existence of external directors; Yakob et al. [15] concluded that risk management is significantly related to investment management efficiency (IME). Therefore, efficiency measurement plays an important role in the analysis of the insurance industry and is the basis of all deeper analyses. In this paper, we will calculate the value of TE, PTE, and S E of 12 selected O E C D life insurance industries. In the second stage of the D E A model, in addition to the inputs and outputs variables required in the traditional model, we also need to 1 VŠB - Technical University of Ostrava, Department of Finance, Sokolská tř. 33, Ostrava 70200, Czech Republic, biwei.guan@vsb.cz 129 select environmental variables. Huang and Eling [12] also selected the ratio of shareholder equity to assets, liabilities to liquid assets ratio, and premiums to surplus ratio as indicators of the insurance market regulatory environment. In the existing research, the choice of input variables and output variables are also very different. Kaffash et al. [13] found that as output variables, "premiums" accounted for 50.82%, "losses and incurred losses" accounted for 22.13%, and "investment income" accounted for 21.31%, while as input variables, "the number of employees", "capital debt", "equity capital" and "materials and business services" accounted for 60.72%, 49.18%, 37.7%, and 32.79% respectively. The variables we selected are slightly different from the above variables, which we will explain in detail in the next chapter. 2 Data and Methodology 2.1 Data This paper selects the relevant data of 12 O E C D life insurance markets from 2013 to 2019, the data mainly from O E C D (2014-2020). The input variables generally used in the previous studies are mainly divided into three categories, namely labor input, capital input, and other material input. (Eling and Jia [6]; Eling and Schaper [8]; and Eling and Luhnen, [7]). However, we did not find the number of employees only in the life insurance market, thus, in this paper, the number of companies, debt capital and equity capital are chosen as the input indicators. When considering output variables, we can consider the social functions of insurance industry. One of the very important function undertaken by insurance is risk protection. In this paper, we choose the sum of net income plus gross technical provisions and the total investments as the output variables. 
In the selection of environmental variables, we consider the macroeconomic environment of the whole market and the industry environment of the life insurance market itself. We choose the growth of GDP, insurance density and market share as related variables. In the three-stage DEA model, we will use SFA slack regression in the second stage to remove the impact of environmental effects and to adjust the selected DMUs to the same external environment. Table 1 presents the sample summary statistics. From Table 1, we can see that among the input variables the dispersion of debt capital is very large; among the output variables the dispersion of both is very large; and among the environmental variables the dispersion of insurance density is the largest. Through the observation of the results in Table 1, it is not difficult to find that the maximum values of most indicators are far higher than their average values. If we look at the original data, we find that this is because the input and output values of the German life insurance market are much higher than those of the other markets, which is also the main reason for the high dispersion of the individual indicators. At the same time, we also find that equity capital is much lower than debt capital, while the two output variables are close.

Variable                              Unit                  Min      Mean     Max       Std. dev.
Panel A: Input variables
Number of companies                   -                     12       35.10    93        24.61
Debt capital                          Million US dollars    2761     215989   1264290   329010
Equity capital                        Million US dollars    167      6927.70  25254     7321.54
Panel B: Output variables
Total investments                     Million US dollars    1407     197433   1427913   341112
Net income + technical provisions     Million US dollars    2225     194739   1206903   304032
Panel C: Environmental variables
Insurance density                     US dollars            141      4953.74  48768     10536
Market share                          %                     0.10     1.41     8.40      1.74
Growth of GDP                         %                     -3.241   2.556    25.163    3.203
Number of observations: 672
Table 1 Summary of Sample Statistics of 12 OECD Life Insurance Industries

2.2 Three-Stage DEA Model
Data envelopment analysis is suitable for the evaluation of complex multi-output and multi-input problems. We can use the DEA model to calculate many kinds of efficiency scores. In this paper, we mainly focus on the TE, PTE and SE of the life insurance industry. Table 2 describes these three kinds of efficiency in detail.

Term                        Description                                                                          Decomposition
Technical Efficiency        TE reflects the ability of a producer to maximize output under a given input;       TE = SE x PTE
                            returns to scale are fixed.
Pure Technical Efficiency   PTE reflects the production efficiency of the inputs of the DMU at the optimal      PTE = TE / SE
                            scale; returns to scale can vary.
Scale Efficiency            SE reflects the gap between the actual scale and the optimal production scale.      SE = TE / PTE
Table 2 DEA Efficiency Terms

The First Stage: Calculate Efficiency Using Unadjusted Input or Output Variables
In this paper, we select the input-oriented BCC model (Banker et al. [1]) and the input-oriented CCR model (Charnes et al. [2]) to calculate the required efficiencies. The assumption of the CCR model is that in the production process the returns to scale are constant: when the inputs change in proportion, the outputs should also change in proportion. The input-oriented CCR model can be written as

    min θ                                                                 (1)
    s.t.  Σ_{j=1}^{n} λ_j x_ij ≤ θ x_i0,   i = 1, 2, ..., m,              (2)
          Σ_{j=1}^{n} λ_j y_rj ≥ y_r0,     r = 1, 2, ..., s,              (3)
          λ_j ≥ 0,                          j = 1, 2, ..., n,              (4)

where x_ij represents the i-th input and y_rj the r-th output of the j-th DMU (here there are three inputs, two outputs and twelve DMUs); λ = (λ_1, ..., λ_n) is the vector of intensity variables and θ is the input radial measure of technical efficiency. The optimal solution is θ*; 1 − θ* represents the maximum proportion by which the inputs can be reduced without reducing the output level at the current technical level. A larger θ* means that a smaller amount of input can be reduced, which means higher efficiency. When θ* = 1, the DMU is currently in a technically efficient state.

The BCC model has almost the same constraints as the CCR model. The only difference is that the BCC model adds a constraint on λ, which basically ensures that producers are compared with producers of a similar size rather than with producers of a very different scale. The constraint is as follows:

    Σ_{j=1}^{n} λ_j = 1.                                                  (5)

The Second Stage: Adjusting Input or Output Variables with SFA Slack Regression
When using SFA slack regression to regress the slack variables from the first stage, we need to consider whether to adjust the input and output variables at the same time or to adjust only one of them. Fried et al. [10] proposed that this depends on the orientation chosen in the first stage. In this paper we choose the input orientation, so in the second stage we only adjust the input variables. In addition, Fried et al. [10] mentioned that we should perform a separate regression for each slack variable, which allows the environmental variables to have different effects on different slack variables. We can construct the following SFA slack regression functions:

    S_ni = f(Z_i; β_n) + v_ni + μ_ni,                                     (6)
    i = 1, 2, ..., I;  n = 1, 2, ..., N,                                  (7)

where S_ni is the slack of the n-th input of the i-th DMU; Z_i represents the environmental variables and β_n the coefficients of the environmental variables; v_ni represents the statistical noise and μ_ni the managerial inefficiency. v ~ N(0, σ_v²) is the random error term and captures the influence of statistical noise on the input slacks; μ ~ N⁺(0, σ_μ²) captures the influence of managerial inefficiency on the input slacks. As mentioned earlier, SFA slack regression helps to eliminate the influence of statistical noise and environmental effects. Therefore, we adjust the input variables as follows:

    X^A_ni = X_ni + [ max_i f(Z_i; β_n) − f(Z_i; β_n) ] + [ max_i(v_ni) − v_ni ],     (8)
    i = 1, 2, ..., I;  n = 1, 2, ..., N,                                              (9)

where X^A_ni represents the adjusted input variables and X_ni the original input variables; [max_i f(Z_i; β_n) − f(Z_i; β_n)] is the adjustment of the input variables for the environmental effects and [max_i(v_ni) − v_ni] the adjustment for the statistical noise. To separate the statistical noise from the managerial inefficiency, we have the following formulas:

    E(μ_ni | ε_ni) = σ* [ φ(λ ε_ni / σ) / Φ(λ ε_ni / σ) + λ ε_ni / σ ],               (10)
    σ* = σ_μ σ_v / σ,                                                                  (11)
    σ = √(σ_μ² + σ_v²),                                                                (12)
    λ = σ_μ / σ_v,                                                                     (13)
    E[v_ni | v_ni + μ_ni] = S_ni − f(Z_i; β_n) − E[μ_ni | v_ni + μ_ni],                (14)

where ε_ni = v_ni + μ_ni is the composed error of regression (6) and φ(·) and Φ(·) denote the standard normal density and distribution functions.

The Third Stage: Calculate Efficiency Using Adjusted Input or Output Variables
In this stage, the adjusted input variables are re-applied to the DEA model of the first stage to obtain a new efficiency score. Compared with the results of the first stage, the adjusted results are more accurate, because all DMUs have been adjusted to the same external environment and the impact of statistical noise has been removed.
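As a concrete illustration of the first-stage computation, the sketch below solves the input-oriented CCR envelopment model (1)-(4) for every DMU with a generic LP solver, and adding the convexity constraint (5) turns it into the BCC model. The paper itself uses DEAP 2.1, so this Python version with invented data (three inputs, two outputs, four DMUs) is only a minimal stand-in showing the structure of the linear programs.

```python
import numpy as np
from scipy.optimize import linprog

def input_oriented_efficiency(X, Y, vrs=False):
    """Radial input-oriented efficiency theta* for every DMU.
    X: (n_dmu, n_inputs), Y: (n_dmu, n_outputs); vrs=True adds sum(lambda)=1 (BCC)."""
    n, m = X.shape
    _, s = Y.shape
    theta = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]                        # minimize theta
        A_ub = np.zeros((m + s, n + 1))
        b_ub = np.zeros(m + s)
        A_ub[:m, 0] = -X[o]                                # sum_j lambda_j x_ij - theta x_io <= 0
        A_ub[:m, 1:] = X.T
        A_ub[m:, 1:] = -Y.T                                # -sum_j lambda_j y_rj <= -y_ro
        b_ub[m:] = -Y[o]
        A_eq = b_eq = None
        if vrs:
            A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)   # convexity constraint (5)
            b_eq = [1.0]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (n + 1), method="highs")
        theta[o] = res.fun
    return theta

# Hypothetical data: rows are DMUs (markets), columns are inputs / outputs.
X = np.array([[30.0, 2000.0, 150.0], [45.0, 5200.0, 300.0], [12.0, 900.0, 80.0], [60.0, 8000.0, 420.0]])
Y = np.array([[1800.0, 1700.0], [5000.0, 4800.0], [950.0, 900.0], [7600.0, 7300.0]])
te = input_oriented_efficiency(X, Y)                       # CCR scores (TE)
pte = input_oriented_efficiency(X, Y, vrs=True)            # BCC scores (PTE)
print("TE:", te.round(3), "PTE:", pte.round(3), "SE:", (te / pte).round(3))
```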
3 Empirical Results
In the first stage, DEAP 2.1 is used to obtain the initial efficiency scores of TE, PTE and SE; in the second stage, Frontier 4.1 is used to adjust the input variables (we use the input orientation and select the cost function); in the third stage, the adjusted input variables are fed back into DEAP 2.1 to recalculate the adjusted efficiency scores. There are 12 life insurance industries as the DMUs, and the period is seven years, from 2013 to 2019. When we use DEAP 2.1 to calculate the efficiency scores, we first need to choose the specific model. We selected 12 decision-making units for seven years. Unlike the direct calculation for cross-section data, when we use panel data we have two options: we can split the panel data into cross-section data, calculate the efficiency scores separately and summarize them, or we can use the Malmquist model and work with the panel data directly. The Malmquist model can be used to measure productivity change, which can be decomposed into technical change and technical efficiency change. Through the Malmquist model we can also obtain the TE and PTE of each market for every year and then calculate the SE. But there is a problem: in the second stage of the three-stage DEA model, when we adjust the input variables, we need the input slacks, and the Malmquist model does not provide them. Thus, we choose the first way of handling the panel data in DEAP 2.1. Table 3 (a) presents the initial efficiency scores of the DMUs.

              (a) Initial efficiency score from stage 1      (b) Adjusted efficiency score from stage 3
              TE      PTE     SE                              TE      PTE     SE
Belgium       0.961   0.992   0.969                           0.527   0.999   0.528
Denmark       0.952   0.977   0.974                           0.932   0.987   0.944
Finland       0.993   0.996   0.997                           0.908   0.997   0.911
Germany       1       1       1                               1       1       1
Greece        0.832   0.977   0.844                           0.291   0.996   0.292
Hungary       0.978   1       0.978                           0.232   1.000   0.232
Ireland       0.995   0.997   0.997                           0.996   0.998   0.999
Italy         0.971   0.978   0.993                           0.968   0.979   0.988
Luxembourg    0.982   0.985   0.997                           0.946   0.995   0.951
Poland        0.983   0.984   0.998                           0.753   0.989   0.762
Portugal      0.954   0.956   0.998                           0.761   0.970   0.784
Spain         0.933   0.959   0.973                           0.823   0.968   0.850
Table 3 Summary of Efficiency Scores from Stage 1 and Stage 3

From Table 3 (a) we can see that all the initial efficiency scores of Germany are 1, which shows that in the German life insurance market input resources are not wasted and all inputs are completely and effectively converted into outputs. Greece has the lowest TE and SE, and Portugal has the lowest PTE; PTE has the lowest dispersion, indicating that the PTE of the individual industries does not differ much. One of the reasons is that when calculating PTE, returns to scale are variable. In the second stage, we use SFA slack regression to analyse the three input variables separately and calculate the new input variables after eliminating the influence of environmental effects and statistical noise. We compare the original input variables with the adjusted input variables; the results are shown in the three graphs (a, b, c) in Figure 1.
[Figure 1: three bar-chart panels (a, b, c), each comparing original and adjusted input variables on percentage scales.]

Table 4 Verification of classical assumptions and cointegration relations of estimated models.
Tabulated are p-values of the performed tests. Finally, the estimated models are presented graphically in Figure 5. For the Czech Republic, the critical increase of wages in 2017-2019 is described relatively well. For the Slovak Republic, it is visible that both models produce very similar estimates up to the year 2008; from this year on, the estimate based on investment in machinery shows visibly better results. Nevertheless, the critical wage increase that happened in 2004-2006 is described in an acceptable way by both models. Another possible model was suggested by a reviewer of this paper: when we add real GDP per capita to model (2), this variable appears as significant. For both countries, Felling remains significant with a positive parameter. Spurious regression was ruled out in these models. Further, we made an attempt to model the relationship between forest workers' wages and the volume of total felling by a vector autoregression model. Because of the nonstationarity of the original time series, we differenced them and estimated the VAR model on these differences. For the Czech Republic we estimated an acceptable VAR(1) model with three significant parameters. With an F-test p-value of 0.028 we verified Granger causality from Felling to Wages with a positive impact. For the Slovak Republic we did not find a VAR model with significant parameters; we also tried to employ the Investment variable as a third time series, with the same result. This can be caused by the relatively short time series, unsuitable for the VAR model. Vice versa, the detected Granger causality in the case of the Czech Republic should not be overrated for the same reason.

Figure 5 Original and estimated average wages expressed in 2015 prices for total forestry in the Czech Republic (left graph) and for the state forestry sector in the Slovak Republic (right graph).

4 Conclusions
Based on the achieved results, it can be summarized that we have empirically demonstrated the relation between the volume of logging and the wages of forest workers in the Czech Republic. An analogous analysis for the Slovak Republic at least partially confirmed this relationship and, in addition, pointed out the importance of including investment in machinery for a sound explanation of the development of forest workers' wages. This partial discrepancy is caused, among other things, by the different onset and consequences of calamities in the forestry of both countries. In conclusion, we can state that we have estimated a significant mechanism that will allow a more accurate simulation of the impact of major disasters on economic indicators of forestry.

Acknowledgements
This paper presents the results of the research supported by Czech Science Foundation grant GA18-08078S.

References
[1] Bartoš, L., Máchal, R. & Skoupý, A. (2009). Possibilities of using price analysis in decision making on the use of harvester technology in forestry. Acta univ. agric. et silvic. Mendel. Brun., LVII(4), 31-36.
[2] Brooks, C. (2009). Introductory Econometrics for Finance. Cambridge: Cambridge University Press.
[3] Di Fulvio, F., Abbas, D., Spinelli, R., Acuna, M., Ackerman, P. & Lindroos, O. (2017). Benchmarking technical and cost factors in forest felling and processing operations in different global regions during the period 2013-2014. International Journal of Forest Engineering, 28(2), 94-105.
[4] Erber, A. (2018). Causes of low wages in forestry [in Czech]. Zemědělec 17/2018.
[5] Fanta, A. & Šišák, L. (2014).
Analysis of structure development of employment in the Czech forestry sector from the 1950s to the present. Zprávy lesnického výzkumu, 59(3), 160-166. [6] Greene, W. H . (2020) Econometric analysis. Eighth edition, Global edition. Harlow: Pearson. [7] Hampel, D . & Janova, J. (2014). Simulation of data for reforestation system. In Conference Proceedings of 32nd International Conference Mathematical Methods in Economics (pp. 251-256). U P v Olomouci. [8] Hampel, D., Janova, J. & Kadlec, J. (2015). Estimation of Cost and Revenue Functions for Reforestation System in Drahanska Highlands. In Proceedings of the International Conference on Numerical Analysis and Applied Mathematics 2014. Melville: American Institute of Physics (AIP). [9] Hampel, D . & Viskotová, L . (2020). Actual Revision of Cost and Revenue Functions for Reforestation System in Drahanska Highlands. In MME 2020: Proceedings (pp. 148-153). M E N D E L U v Brně. [10] Janova, J. & Hampel, D . (2016). Optimal managing of forest structure using data simulated optimal control. Central European Journal of Operations Research, 24(2), 297-307. [11] Kmenta, J. (2011) Elements of econometrics. Second edition. A n n Arbor: University of Michigan Press. [12] Lutkepohl, H . (2005). New introduction to multiple time series analysis. Berlin: New York. 140 Efficiency evaluation of the health care system during COVID-19 pandemic in districts of the Czech Republic Jana Hanclova1 , Lucie Chytilova2 Abstract. The article deals with the assessment and evaluation of the performance of the health care system in managing the C O V I D - 1 9 pandemic, which reflects the situation in 77 districts of the Czech Republic at the time of the 3rd peak (March 6, 2021) with 9130 confirmed positive cases. Data envelopment analysis (DEA) is used for this research with 4 inputs (population, incidence in the previous 14 days, the incidence in the previous 14 days at age 65+, capacity of test facilities) and desired output (number of recovered patients) and undesirable output (number of deaths). The D E A model includes non-radial measures and non-proportional changes in output variables. The results document that it is most appropriate to use the D E A model M 3 with the desired reduction in deaths and an increase in the number of recoveries. For this model, 35% of districts in the Czech Republic has been effective. The main problem with failure to be effective was the high number of COVID-19-related deaths. Keywords: COVID-19, Data envelopment analysis, districts, efficiency, health care system, undesirable output. J E L Classification: C61,100, C44 AMS Classification: 90B90 1 Introduction The onset of the COVID-19 epidemic was December 31, 2019 in China. The degree of infection varies from country to country, from district to district. The first case appeared in the Czech Republic (CR) on March 1, 2020. COVID-19 has since resulted in a high number of deaths and the confirmed cases in the Czech Republic. Figure 1 presents the daily increments of confirmed infected cases from August 27, 2020 to M a y 25, 2020. Figure 1 also presents the development of the epidemiological situation as a seven-day moving average (red dashed line) with 3 peaks. The analysis of this article will focus on the latest peak around March 3, 2021. Measuring the C R ' s C O V I D - 1 9 response performance is an extremely important challenge for health care policymakers. Also, people and governments in the Czech Republic have been challenged by COVID-19 and its consequences. 
Social distancing and personal protective measures became the primary means of controlling the spread of COVID-19. There is a number of research questions that researchers are looking for answers to. C O V I D - 1 9 Figure 1 Development of the number of infected in the Czech Republic [https://onemocneni- aktualne.mzcr.cz/api/v2/covid-19] 1 VSB-Technical University of Ostrava, Faculty of Economics, Department of Systems Engineering, Sokolska tr. 2416/33, 702 000 Ostrava, Czech Republic, iana.hanclova@vsb.cz. 2 VSB-Technical University of Ostrava, Faculty of Economics, Department of Systems Engineering, Sokolska tr. 2416/33, 702 000 Ostrava, Czech Republic, lucie.chvtilova@vsb.cz. 141 In this study, we will focus on assessing and evaluating the efficiency of COVID-19 response performance in the districts in the Czech Republic during the 3r d peak period on March 6, 2021 with around 9130 infected confirmed cases. What are the factors and the structure of the health management system of the COVID-19 pandemic with a focus on the use of data envelopment analysis (DEA)? Hamzan, Y u and See in [4] examined the relative efficiency level of managing COVID-19 in Malaysia using network data envelope analysis. A network process consists of 3 subprocesses - community surveillance, medical care I (health care associated with detected positive people) and medical care II (care associated with severe patients requiring intensive care). Dlouhy in [3] used a simple D E A to evaluate the health system efficiency in O E C D countries. The researched system included 3 inputs (physicians, nurses, hospital beds) and 2 outputs (population, life expectancy). The authors documented that health resources have to be mobilized in a short time. In conclusion, most of the articles that examine the efficiency of systems during the COVID-19 pandemic are at the country level, respectively. States (USA), focus mainly on efficiency in hospitals and the structure corresponds to simple and network D E A systems. This article aims to examine the efficiency of COVID-19 response performance in the 77 districts in the Czech Republic during March 2021. The results of this study fill gaps in the literature in terms of assessing the performance of the health system in managing COVID-19 in small districts using D E A with non-radial measures and non-proportional changes. The rest of the paper is organized as follows. Section 2 introduces the D E A basic methodology, including nonradial measures with non-proportional changes. Section 3 describes the data and Section 4 presents the results. Section 5 includes conclusions and options for future research. 2 Data Envelopment Analysis For modelling the district system (j = 1,2...., N), we will consider a general conceptual D E A model with R desirable outputs y and S undesirable outputs b. We also assume the set of / multiple inputs x. The multiple-output production technology with emphasis on input-specific technology can be described as Tl(x) = {(y ,b):x can produce (y ,b)}. To assess the efficiency of COVID-19 response performance, we will use the distance directional function (DDF), which Chung et al in [1] introduced as the joint production of desirable output y and undesirable output b: DT(x,y,b,g\gh ) = mp{/3\(y,b) + {3(g\gb )GTl(x)}, ( " where the nonzero vector (gy ,gb )is the direction vector and /?expresses the intensity of the increase in the desired production while reducing unwanted production and is referred to as the scaling factor. 
At the same time, Toloo and Hanclova in [6] set the condition: D_T(x, y, b; g^y, g^b) ≥ 0 if and only if (y, b) ∈ T_l(x). This DDF moves the joint production (y, b) along the direction (g^y, g^b) to place it on the production frontier. Zhou et al. showed in [7] that the radial method for measuring efficiency can be overestimated if there are non-zero slack variables, and therefore introduced non-radial measures. Toloo et al. in [5] and Chytilova and Hanclova in [2] used the DEA model with non-radial measures, where the direction vector is g = (g^x, g^y, g^b)' = (−x_0, y_0, −b_0)'. In our paper, we will use output-oriented DEA models with non-proportional changes in outputs. Using a general scaling vector β' = (β^x, β^y, β^b), we can formulate a DEA model (2) to determine the efficiency of COVID-19 response performance in general:

    z = D_T(x, y, b; g^x, g^y, g^b) = max { w^x' β^x + w^y' β^y + w^b' β^b }
    s.t.  Σ_{j=1}^{N} λ_j x_ij ≤ (1 − β_i^x) x_i0,   i = 1, ..., I,
          Σ_{j=1}^{N} λ_j y_rj ≥ (1 + β_r^y) y_r0,   r = 1, ..., R,                     (2)
          Σ_{j=1}^{N} λ_j b_sj ≤ (1 − β_s^b) b_s0,   s = 1, ..., S,
          VRS: Σ_{j=1}^{N} λ_j = 1,
          λ_j ≥ 0,  β_i^x ≥ 0,  β_r^y ≥ 0,  β_s^b ≥ 0   for all i, r, s,

where the vector w' = (w^x, w^y, w^b) is the normalized weight vector and we assume that the weights of the inputs, the desirable outputs and the undesirable outputs are each 1/3. Model (3) is similar to the additive DEA model in the sense that both attempt to identify the potential slacks in inputs and outputs as much as possible. The non-radial directional distance function is based on the technology T:

    D_T(x, y, b; g^x, g^y, g^b) = sup { w' β : (x, y, b) + g · diag(β) ∈ T }.           (3)

In the empirical study, we will focus on the comparison of three variants of the DEA model (2):
• Model M1: g = (g^x, g^y, g^b)' = (0, 0, −b_0)' and w = (w^x, w^y, w^b)' = (0, 0, 1)';
• Model M2: g = (g^x, g^y, g^b)' = (0, y_0, 0)' and w = (w^x, w^y, w^b)' = (0, 1, 0)';
• Model M3: g = (g^x, g^y, g^b)' = (0, y_0, −b_0)' and w = (w^x, w^y, w^b)' = (0, 1/2, 1/2)'.

All models M1-M3 use non-radial measures and non-proportional changes. Model M1 is a DEA model oriented only towards decreasing the undesirable output, model M2 is oriented only towards increasing the desirable output, and model M3 is oriented towards simultaneously increasing the desirable output and decreasing the undesirable output. To evaluate the efficiency, we will have a total beta index and sub-indices β^x, β^y, β^b. The index β = 0 indicates an efficient unit, and the higher the beta, the less efficient the unit (district). The evaluation using the y-b performance index YBPI can generally be determined on the basis of relation (4) according to the publication of Zhou et al. in [7] in the case of the output-oriented model with undesirable output:

    YBPI = ( 1 − (1/S) Σ_{s=1}^{S} β_s^b ) / ( 1 + (1/R) Σ_{r=1}^{R} β_r^y ).           (4)

The numerator in (4) is determined by the average proportion by which the undesirable output can be reduced, while the denominator is driven by the degree to which the desired output can be increased. The YBPI index is standardized between 0 and 1; YBPI = 1 means that the district is located at the frontier of best practice.

3 Data
To determine the efficiency of COVID-19 response performance, we will use the DEA model (2) in the three mentioned variants M1-M3. The DEA model includes 4 inputs (population, total incidence in the previous 14 days, the incidence of people over 65 in the previous 14 days, and the testing capacity of the facilities in the district), 1 desirable output (number of recovered patients) and 1 undesirable output (number of deaths associated with COVID-19). Table 1 summarizes the description of the individual variables in the DEA model. The source of the POP variable is the Czech Statistical Office; the source of the data for the incidences IN14 and IN65 is closed data sets for predictive modelling, which were provided to us after sending an official request for access to the Ministry of Health of the Czech
The source of the P O P variable is the Czech Statistical Office, the source of data for indications IN14 and IN65 is closed data sets for predictive modelling, which were provided to us after sending an official request for access to the Ministry of Health of the Czech 143 Republic. The C A P data source is the open data set Přehled odběrových míst3 in the testing section. The source of data for both outputs Y and B are open data sets Anti-epidemic System of the Czech Republic (PES)4 . This study is devoted to the analysis of 3 variants of D E A models for 77 districts of the Czech Republic as of March 6t h , 2021. For inputs IN14 and IN65, the sum of daily cases for the previous 14 days is cumulated, i.e. from February 20, 2021 to March 6, 2021. Conversely, for both outputs Y and B , the sum of the number of persons for 14 days later, i.e. from March 6, 2021 to March 20, 2021. I/O ID Variable Description Unit per district POP population number of persons number of persons inputs IN14 incidence in the last 14 days number of confirmed positive cases in the last 14 days number of persons IN65 incidence in the last 14 days for 65+ number of confirmed positive cases at the age of 65+ in the last 14 days number of persons C A P testing capacity total maximum capacity of testing facilities number of persons desirable output Y recovered patients number of patients recovered over the next 14 days number of persons undesirable output B deaths number of deaths in the next 14 days number of persons Table 1 Description of variables in the D E A model [Source: ČSÚ3 , M Z Č R4 ] 4 Results A l l three variants of the model (3) were optimized using the G A M S software. The first part of the analysis is devoted to the comparison of the values of the average value B E T A _ M 1 , B E T A _ M 2 and B E T A _ M 3 for individual models M l , M 2 and M 3 . A zero-beta value means that the district is located at the frontier of best practice, i.e. it is efficiency. — BETA_M1 ^ — B E T A M 2 — 8ETA_M3 Benesov Náchod Figure 2 Comparison of levels B E T A _ M 1 to B E T A _ M 3 [own calculation] 3 https://www.czso.cz/csu/czso/pocet-obvvatel-v-obcich-k-112021 4 https://onemocneni-aktualne.mzcr.cz/api/v2/covid-19 144 For the variant of the M l model, where only a reduction in the number of deaths is considered, 27 efficiency units (35%) where indicated. For the variance of M 2 , where an increase in the number of recovered patients was allowed, there were 28 efficiency units (36%) and in comparison, with M l , the district of Hradec Králové was also effective. In the case of the M 3 model, where it was desirable to increase the number of cured patients and reduce the number of deaths, 27 districts (35%) lie on the efficiency frontier. A t the same time, 27 districts were efficient for all examined models M l to M 3 . Figure 2 shows the level of inefficiency (beta) of districts according to the models M l to M 3 . For the M 3 model, the districts of Cheb (with a beta value of 0.532), Znojmo (0.382), České Budějovice (0.379), Hodonín (0.374), Trutnov (0.344), Most (0.338), Svitavy (0.330), Louny (0.327), Česká Lípa (0.327), Sokolov (0.324) and Ostrava město (0.313) belonged to the quartile with the worst efficiency in managing the C O V I D - 1 9 pandemic (at the time of the third peak in the number of infected persons in the Czech Republic around March 6, 2020). The main reason for not reaching the efficiency limit was the problem of the high number of deaths of patients compared to efficiency units. 
Here, in the future, it will be necessary to make further detailed analysis and look for the facts that caused this. Further efficiency analysis was performed based on the y-b performance of the IBPI index from Equation (4). This index expresses the ratio of the average decrease in the number of deaths to the average increase in the number of recovered patients. ^ — y b p i M J ^—ybp<_M2 ^— ybpi_M3 Benešov Náchod Figure 3 Comparison of Y B P I _ M 1 to Y B P I _ M 3 [own calculation] Figure 3 shows the level of this index for inefficiency districts for the 3r d of the COVID-19 pandemic. The IBPI index is almost identical in terms of level M l and M 3 , which is documented by a statistically significant correlation of 0.999. The number of effective districts corresponds to the previous analysis with beta coefficients. Furthermore, we will again focus on the least efficiency quartile according to the Y B P I for the M 3 model, which prefers a reduction in death and an increase in the number of recovered patients. The least efficiency districts managing the COVID-19 pandemic were confirmed by the districts of Cheb (0.252), České Budějovice (0.270), Hodonín (0.284), Znojmo (0.288), Chomutov (0.336), Most (0.229), Česká Lípa (0.0.356), Svitavy (0.365), Louny (0.387), Ostrava-město (0.393), Trutnov (0.398), Karviná (0.403) and Břeclav (0.419). In comparison with the evaluation of the least efficiency quartile according to beta, the districts of Karviná and Břeclav were included in the place of the district of Sokolov, which is also related to the calculation of quartile boundaries. 5 Conclusion The presented paper is devoted to assessing and evaluating the management of the pandemic situation COVID-19 at the time of the peak (according to the number of infected persons) in the districts in the Czech Republic. A n analysis of the data envelope with models of output-oriented, non-radial measures and non-proportional changes 145 of output variables were used for this research. The obtained results document that the proposed models can be used in practice with the intention of prevention in other pandemic situations. The results showed that of the three proposed variants of the models, the most suitable model is the M 3 model, which aims to reduce the number of patients who have died, but also to increase the number of recovered patients. In the evaluation of the health care system at the time of the peak of the COVID-19 pandemic (March 6, 2021), 35-36% of effective districts were demonstrated for all variants of the M l to M 3 models. The average beta inefficiency was rated 0.158 for the M 3 model. The analysis of the results of the M 3 model further showed that in order to improve the efficiency of the health care system, the main problem is to reduce the number of deaths, which is confirmed by the average inefficiency. The need to reduce undesirable output is significant compared to the average inefficiency of increasing the number of recovered patients. The empirical study also pointed to a group of districts in the "worst" quartile, i.e. with the worst level of efficiency, where policymakers, the Ministry of Health and other institutions need to pay attention to the prevention of possible similar pandemics. The obtained results also have their limits. This is mainly an analysis at the district level in the Czech Republic at the time of the peak of the COVID-19 pandemic and the possibility of obtaining relevant data files. 
Further research in this area will focus on the possibility of extending the proposed D E A models to the D E A network group, spatially orienting to the state level due to more accessible data, examining the homogeneity of service units, reallocation of resources and development of health care infrastructure. Acknowledgements This research was supported by the Czech Science Foundation within the project G A 19-13946S and the Student Grant Competition (SGS) within the project SP2021/51. References [1] Chung, Y . H., Faro, R. and Grosskopf, S. (1997). 'Productivity and undesirable outputs: A directional distance function approach'. Journal of Environmental Management, 51(3), 229-240. [2] Chytilova, L . and Hanclova J. (2019). 'Estimating the environmental efficiency of European cross-countries using a non-radial general directional distance function'. Proceedings of 37h International Conference on Mathematical Methods in Economics (MME 2019), České Budějovice, Czech Republic, pp. 541-546. [3] Dlouhý, M . (2020). 'Health System Efficiency and the COVID-19 Pandemic'. Proceedings of 38th International Conference on Mathematical Methods in Economics (MME 2020), Brno, Czech Republic, pp. 80-84. [4] Hamzan, N . M . , Y u , M . M . , See, K . F . (2021). 'Assessing the efficiency of Malaysia health system in C O V I D - 19 prevention and treatment response'. Health Care Management Science, 24, pp. 273-285. [5] Toloo, M . , Allahyar, M . and Hanclova, J. (2018). ' A non-radial directional distance method on classifying inputs and outputs in D E A : Application to banking industry'. Experts Systems with Applications, 92, pp. 495- 506. [6] Toloo, M . and Hanclova, J. (2020). 'Multi-valued measures in D E A in the presence of undesirable outputs'. Omega International Journal of Management Science, 94, pp. 1-11. [7] Zhou, P., A n g , B.W., Wang, H . (2012). 'Energy and C 0 2 emission performance in electricity generation: A non-radial directional distance function approach'. European Journal of Operational Research, 221 (3), pp. 625-635. 146 Analysis of uneven distribution of diseases COVID - 19 in the Czech Republic Jakub Hanousek1 Abstract. The C O V I D - 19 pandemic affected more, or less each of us. The goal of this article is to measure regional uneven distribution in 77 districts in the Czech Republic. The data cover a period since March 2020 to March 2021 and comes from Institute of Health Information and Statistics of the Czech Republic. The regional variations are measured by dual D E A models with unwanted outputs. Data envelopment analysis is a method based on a linear programming that measure efficiency of production units. The advantage of this method is an optimalization weights of each criteria (inputs, outputs) to maximize a score from each unit. The production unit in this paper is a one region in the Czech Republic. There are 77 units. In order to stop the spread of disease, the Government of the Czech Republic applied national prohibitions. The results of this study show us that the regional variation of spread of diseases are huge. The regional prohibition probably would be preferable and more effective in this situation. Keywords: C O V I D - 19, D E A , unwanted outputs J E L Classification: C44 AMS Classification: 90C15 1 Introduction Disease C O V I D - 19 is a new disease. This disease is caused by a new type of coronavirus S A R S - C o V - 2 . Coronavirus S A R S - C o V - 2 was first detected at the end of the year 2019 in the chinease city Wuhan. 
SARS-CoV-2 became widespread throughout the world during the year 2020. The Czech Republic is one of the countries most affected by SARS-CoV-2 in the world. The goal of this article is to measure the inequalities in the spread of SARS-CoV-2 in the 77 regions of the Czech Republic. The inequalities are measured by data envelopment analysis (DEA) [1, 2]. DEA is based on the theory of linear programming and estimates the production frontier as the piecewise linear envelopment of the data. The production units which lie on the production frontier are efficient; production units which lie under the production frontier are inefficient. Efficient production units have a score of 1. Inefficient production units in an output-oriented model have a score greater than 1; the score expresses by how much the outputs must be proportionally increased for the production unit to become efficient [4]. The input is the population of the region. The outputs are the number of people infected with SARS-CoV-2 and the number of people who died with SARS-CoV-2. Efficient production units in this article are therefore the most affected regions of the Czech Republic, while inefficient production units are regions with a better epidemic situation. The score of an inefficient production unit indicates how much its outputs (infected and deaths) would have to increase proportionally. The production frontier in this article thus represents the situation in which all regions were the same as the most affected regions.

2 Methods
DEA was developed by Charnes, Cooper and Rhodes in 1978 [3]; it constructs the production frontier and evaluates the technical efficiency of production units. A production unit uses a number of inputs to produce outputs. The technical efficiency of a production unit is defined as the ratio of its total weighted output to its total weighted input or, vice versa, as the ratio of its total weighted input to its total weighted output. The DEA model permits each production unit to choose its input and output weights so as to maximize its technical efficiency score. A technically efficient production unit is able to find such weights that it lies on the production frontier [5]. The production frontier represents the maximum amount of output that can be produced by given amounts of input (the output maximization DEA model) or, alternatively, the minimum amounts of inputs required to produce the given amount of output (the input minimization DEA model). This article works with two DEA models: the first is an output-oriented model with constant returns to scale, the second an output-oriented model with variable returns to scale [4]. The efficient production units in this article represent the worst districts, i.e. the districts most affected by the virus.

1 University of Economics and Business, Prague, Faculty of Informatics and Statistics, Department of Econometrics, Winston Churchill Sq., Prague, Czech Republic, e-mail: xhanj52@vse.cz

2.1 Output-oriented model with constant returns to scale

    Maximize    φ_q + ε (e^T s^+ + e^T s^−)
    subject to  Xλ + s^− = x_q,
                Yλ − s^+ = φ_q y_q,                                       (1)
                λ, s^+, s^− ≥ 0.

φ_q is a variable which represents the efficiency rate of the evaluated production unit. ε is an infinitesimal constant; here ε = 10^−8. e^T = (1, 1, ..., 1). s^+ and s^− are vectors of slack (additional) variables. X is the matrix of inputs and Y is the matrix of outputs. λ = (λ_1, λ_2, ..., λ_n) is a vector of weights assigned to the production units; these weights are the variables of the model [4].
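A minimal sketch of how the output-oriented CRS score φ_q of model (1) can be computed with a generic LP solver is given below; the ε-weighted slack term is omitted for brevity, which is a simplification of the model above. The four districts used as data are taken from Table 1 of this paper (Praha, Praha-západ, Cheb, Trutnov), but scores computed on such a small subset need not match the full 77-district results reported later in Table 2.

```python
import numpy as np
from scipy.optimize import linprog

def output_oriented_crs(X, Y):
    """Output-oriented CRS score phi_q for every unit (epsilon-slack term of model (1) omitted).
    X: (n, n_inputs), Y: (n, n_outputs). phi_q = 1 marks the most affected ('efficient')
    districts; phi_q > 1 is the proportional output increase needed to reach the frontier."""
    n, m = X.shape
    _, s = Y.shape
    phi = np.empty(n)
    for q in range(n):
        # Decision variables: [phi, lambda_1, ..., lambda_n]
        c = np.r_[-1.0, np.zeros(n)]                 # maximize phi
        A_ub = np.zeros((m + s, n + 1))
        b_ub = np.zeros(m + s)
        A_ub[:m, 1:] = X.T                           # sum_j lambda_j x_ij <= x_iq
        b_ub[:m] = X[q]
        A_ub[m:, 0] = Y[q]                           # phi * y_rq - sum_j lambda_j y_rj <= 0
        A_ub[m:, 1:] = -Y.T
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (n + 1), method="highs")
        phi[q] = -res.fun
    return phi

# Four districts from Table 1: input = population, outputs = infected, deaths.
X = np.array([[1268796.0], [131231.0], [90188.0], [118174.0]])
Y = np.array([[159523.0, 2220.0], [18920.0, 159.0], [13714.0, 536.0], [24662.0, 461.0]])
print(output_oriented_crs(X, Y).round(3))
```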
2.2 Output oriented model with variable constant of scale Maximize: (pq + £ ( e T S + + e V ) , Xk + s~ = x q , q w m Y ^ " S + = ( I W (2) Subject to: v ' eT X = 1, X,s+ ,s" > 0 . (pq is a variable which represents efficiency rate of a production unit. £ is an infinitesimal constant. The infinitesimal constant £=10 8 . eT = (1,1,1...,1). S + a n d s'are vectors of additional variables. X is a matrix of inputs. Yis a matrix of outputs. 'k = (Al,A1,...,An) is a vector of weights which are assign to productions units. Weights are the variables in a model. eT X = 1 is a condition of convexity [4]. 148 3 Aplication The application of the described methods is illustrated on the data of the Czech Republic. The data comes from Institute of Health Information and Statistics of the Czech Republic. The Czech Republic is divided into 77 regions and in 2021 it had 10.44 mil. Inhabitants [6]. The number of infected people with virus S A R S - C o V - 2 since March 2020 to March 2021 was 1,403,809 in the Czech Republic. The number of death people with virus S A R S - C o V - 2 since March 2020 to March 2021 was 24,331 in the Czech Republic [7]. The data are presented in Table 1. District Population Number of in­ Number fected death Praha 1,268,796 159,523 2,220 Praha západ 131,231 18,920 159 Příbram 112,816 16,700 153 Rakovník 54,993 7,965 143 Benešov 95,459 16,923 327 Beroun 86,160 12,532 124 Kladno 158,799 23,540 347 Kolín 96,001 17,020 254 Kutná hora 73,404 10,911 186 Mělník 104,659 15,917 280 Mladá Boleslav 123,659 19,591 331 Nymburk 94,884 15,479 255 Praha východ 157,146 24,896 227 České Budějovice 186,462 22,556 359 Český Krumlov 60,516 6,434 137 Jindřichův Hradec 90,604 13,374 264 Písek 69,843 10,412 143 Prachatice 50,010 5,754 145 Strakonice 69,786 9,208 193 Tábor 101,115 13,434 323 Domažlice 59,926 8,588 170 Klatovy 85,726 12,580 249 Plzeň město 188,045 28,199 422 Plzeň jih 62,389 9,891 194 Plzeň sever 74,940 12,554 179 Rokycany 47,458 7,678 166 Tachov 51,917 8,449 190 Cheb 90,188 13,714 536 Karlovy Vary 115,446 15,002 416 Sokolov 89,961 14,234 382 Děčín 128,834 17,717 329 Chomutov 122,157 13,438 320 Litoměřice 117,278 16,158 275 Louny 85,191 10,903 202 Most 111,775 12,982 300 Teplice 125,498 14,693 233 Ústí nad Labem 118,228 15,532 222 Česká Lípa 100,756 14,093 277 Jablonec nad Nisou 88,200 15,490 251 Liberec 169,878 29,266 364 149 Semily 73,605 11,968 156 Hradec Králové 162,661 29,631 356 Jičín 79,702 12,969 189 Náchod 109,550 20,685 412 Rychnov nad Kněžnou 77,829 13,769 170 Trutnov 118,174 24,662 461 Chrudim 103,199 17,359 235 Pardubice 168,423 28,584 362 Svitavy 103,245 14,081 246 Ústí nad Orlicí 136,760 21,223 297 Havlíčkův Brod 94,217 14,413 277 Jihlava 110,522 13,833 230 Pelhřimov 71,914 10,087 182 Třebíč 111,693 12,525 209 Ždár nad Sázavou 117,219 15,501 202 Blansko 105,708 12,495 317 Brno mesto 385,913 39,935 692 Brno venkov 206,300 24,718 372 Břeclav 112,828 12,292 297 Hodonín 153,225 18,755 392 Vyškov 88,154 11,313 276 Znojmo 111,380 12,433 300 Jeseník 38,779 4,185 115 Olomouc 230,408 29,768 539 Prostějov 107,859 14,102 201 Přerov 130,082 18,050 257 Šumperk 121,299 13,035 263 Kroměříž 105,569 14,008 264 Uherské Hradiště 141,467 18,577 261 Vsetín 142,420 18,259 308 Zlín 190,488 26,310 457 Bruntál 92,693 10,404 263 Frýdek Místek 207,756 27,305 525 Karviná 256,394 30,945 598 Nový Jičín 148,074 18,281 270 Opava 174,899 27,279 398 Ostrava mesto 326,018 37,820 735 Table 1 Data The results from model 1 and model 2 are in table 2. The most affected regions have score 1. 
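Before turning to the results, a brief sketch of how the variable-returns-to-scale variant (2) and the projected outputs discussed below could be computed. It reuses the structure of the previous sketch; the only change is the convexity constraint e^T λ = 1, and the ε-weighted slacks are again omitted.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_vrs(X, Y, q):
    """Output-oriented VRS DEA score: the CRS model plus e^T lambda = 1."""
    n, m = X.shape
    k = Y.shape[1]
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub = np.vstack([np.hstack([X.T, np.zeros((m, 1))]),
                      np.hstack([-Y.T, Y[q].reshape(-1, 1)])])
    b_ub = np.concatenate([X[q], np.zeros(k)])
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])   # convexity condition
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[-1]

# Projected outputs of district q (the "virtual outputs" reported in Table 2)
# are then simply phi_q * Y[q], e.g. phi_q = dea_output_vrs(X, Y, q).
```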
The virtual outputs represent how much will increase outputs to become the regions to same level as the most affected regions. For example, region Praha has score 1.66. Region Prague have 159,523 infected people and 2,220 death people with COVID-19. If we work with constant return of scale region Praha will be increase to 264,788 infected and to 4,950 death people with COVID-19. 150 Eff. Score con­ Eff. Score variastant return of virtual out­ virtual out- ble return of virtual out­ virtual ou scale put 1 put2 scale put 1 put2 Praha 1.66 264,788 4,950 1.00 159,523 2,220 Praha západ 1.45 27,387 512 1.38 26,192 481 Příbram 1.41 23,544 440 1.40 23,375 439 Rakovník 1.44 11,477 215 1.19 9,488 197 Benešov 1.16 19,695 381 1.13 19,131 370 Beroun 1.43 17,981 336 1.35 16,973 327 Kladno 1.41 33,140 619 1.25 29,424 523 Kolín 1.18 20,035 375 1.14 19,337 369 Kutná hora 1.40 15,319 286 1.27 13,910 274 Mělník 1.37 21,842 408 1.35 21,416 405 Mladá Boleslav 1.32 25,807 482 1.29 25,305 469 Nymburk 1.28 19,802 370 1.23 19,068 364 Praha východ 1.32 32,795 613 1.17 29,230 521 České Budějovice 1.73 38,913 727 1.45 32,666 565 Český Krumlov 1.87 12,057 257 1.66 10,667 227 Jindřichův Hradec 1.39 18,554 366 1.34 17,885 353 Písek 1.40 14,576 272 1.25 13,054 259 Prachatice 1.62 9,328 235 1.33 7,635 192 Strakonice 1.52 13,985 293 1.39 12,827 269 Tábor 1.43 19,223 462 1.42 19,058 458 Domažlice 1.43 12,259 243 1.24 10,672 218 Klatovy 1.39 17,538 347 1.33 16,742 331 Plzeň město 1.39 39,244 734 1.16 32,851 568 Plzeň jih 1.29 12,805 251 1.14 11,264 228 Plzeň sever 1.25 15,639 292 1.14 14,278 281 Rokycany 1.22 9,401 203 LOO 7,678 166 Tachov 1.20 10,132 228 1.01 8,570 193 Cheb LOO 13,714 536 LOO 13,714 536 Karlovy Vary 1.38 20,680 573 1.29 19,279 535 Sokolov 1.15 16,343 439 1.12 15,993 429 Děčín 1.52 26,887 503 1.46 25,797 479 Chomutov 1.73 23,313 555 1.62 21,777 519 Litoměřice 1.51 24,475 458 1.51 24,447 457 Louny 1.63 17,779 332 1.54 16,740 323 Most 1.66 21,586 499 1.64 21,277 492 Teplice 1.78 26,190 490 1.74 25,520 472 Ústí nad Labem 1.59 24,673 461 1.59 24,668 461 Česká Lípa 1.47 20,664 406 1.44 20,261 398 Jablonec nad Nisou 1.19 18,407 344 1.13 17,463 336 Liberec 1.21 35,452 663 1.05 30,722 540 Semily 1.28 15,361 287 1.17 13,958 275 Hradec Králové 1.15 33,946 635 1.01 29,876 529 Jičín 1.28 16,633 311 1.19 15,422 301 Náchod 1.08 22,362 445 1.07 22,209 442 151 Rychnov nad Kněžnou 1.18 16,242 304 1.09 14,972 293 Trutnov 1.00 24.662 461 1.00 24,662 461 Chrudim 1.24 21,537 403 1.21 21,065 399 Pardubice 1.23 35,149 657 1.07 30,552 538 Svitavy 1.53 21,546 403 1.50 21,076 399 Ústí nad Orlicí 1.34 28,541 534 1.26 26,840 489 Havlíčkův Brod 1.35 19,475 374 1.31 18,872 363 Jihlava 1.67 23,065 431 1.65 22,824 429 Pelhřimov 1.49 15,008 281 1.34 13,552 268 Třebíč 1.86 23,309 436 1.84 23,105 434 Ždár nad Sázavou 1.58 24,463 457 1.58 24,433 457 Blansko 1.57 19,662 499 1.57 19,600 497 Brno město 2.02 80,537 1,505 1.33 52,959 918 Brno venkov 1.74 43,053 805 1.42 34,991 596 Břeclav 1.74 21,408 517 1.69 20,799 503 Hodonín 1.64 30,737 642 1.41 26,378 551 Vyškov 1.47 16,661 406 1.43 16,162 394 Znojmo 1.70 21,145 510 1.66 20,694 499 Jeseník 1.67 6,974 192 1.00 4,185 115 Olomouc 1.62 48,084 899 1.22 36,258 657 Prostějov 1.60 22,509 421 1.57 22,185 418 Přerov 1.50 27,147 507 1.44 26,058 479 Šumperk 1.89 24,646 497 1.84 23,925 483 Kroměříž 1.57 21,970 414 1.54 21,635 408 Uherské Hradiště 1.59 29,523 552 1.47 27,392 497 Vsetín 1.63 29,722 556 1.51 27,504 498 Zlín 1.51 39,753 743 1.25 33,014 573 Bruntál 1.66 17,267 436 1.63 16,928 428 Frýdek Místek 
1.57 42,938 826 1.21 32,903 633 Karviná 1.71 52,897 1,022 1.21 37,479 724 Nový Jičín 1.69 30,902 578 1.54 28,166 507 Opava 1.34 36,500 682 1.15 31,311 548 Ostrava město 1.77 67,128 1,305 1.16 44,023 856
Table 2 Results of model 1 and model 2

The total number of infected people in our period is 1,433,809 and the total number of deaths is 24,331. The variations between the regions are large. The regions with the worst results in the model with constant returns to scale are Trutnov and Cheb. The three regions with the best results in the model with constant returns to scale are Brno-město, Šumperk and Třebíč. The regions Trutnov and Cheb lie on the production frontier. If we moved all regions to the production frontier in model 1, the total number of infected people would increase to 2,139,809 and the total number of deaths would increase to 42,089. The regions with the worst results in the model with variable returns to scale are Praha, Rokycany, Cheb, Trutnov and Jeseník. The three regions with the best results in the model with variable returns to scale are Šumperk, Třebíč and Břeclav. If we moved all regions to the production frontier in model 2, the total number of infected people would increase to 1,833,845 and the total number of deaths would increase to 35,102. The results from model 1 are shown in Figure 1.

Figure 1 Map with results of model 1
The regions most affected by COVID-19 are shown in red on the map of the Czech Republic; the regions with the smallest incidence of COVID-19 are shown in green.

4 Conclusions
This paper focuses on measuring the regional variation in the spread of the disease COVID-19 among the 77 regions of the Czech Republic. The variations were measured by a production frontier model based on data envelopment analysis. The models in this paper work with one input and two outputs. The input is the population of region q. The first output is the number of people infected with COVID-19 in region q and the second output is the number of people who died with COVID-19 in region q. The regions which lie on the production frontier were the regions most affected by COVID-19. Variations were measured with two output-oriented models. The first model works with constant returns to scale and the second model works with variable returns to scale. The second model shows lower differences between the regions and lower total numbers of infected and deceased people. If we moved all regions to the production frontier calculated with model 1, the total number of infected people would increase from 1,433,809 to 2,139,809 and the total number of deaths would increase from 24,331 to 42,089. If we moved all regions to the production frontier calculated with model 2, the total number of infected people would increase from 1,433,809 to 1,833,845 and the total number of deaths would increase from 24,331 to 35,102. The regional variations are large. Regional restrictions in this situation could therefore be more effective than national restrictions.

Acknowledgements
This work was supported by the project no. F4/42/2021 of the Internal Grant Agency, Faculty of Informatics and Statistics, University of Economics, Prague.

References
[1] Dlouhý, M. and Hanousek, J. (2019). An Assessment of Regional Variations: An Application to Polish Regions. In: 37th International Conference on Mathematical Methods in Economics, České Budějovice: University of South Bohemia in České Budějovice, Faculty of Economics, pp. 281-286.
[2] Dlouhý, M. (2018). Measuring Geographic Inequalities: Dealing with Multiple Health Resources by Data Envelopment Analysis. Frontiers in Public Health, 6(53), pp. 1-6.
[3] Charnes, A., Cooper, W.W. and Rhodes, E. (1978). Measuring the Efficiency of Decision Making Units. European Journal of Operational Research, 2, 429-444. doi: 10.1016/0377-2217(78)90138-8.
[4] Jablonský, J. and Dlouhý, M. (2004). Modely hodnocení efektivnosti produkčních jednotek. Praha: Professional Publishing, pp. 183.
[5] Kumbhakar, S.C. and Lovell, C.A.K. (2000). Stochastic Frontier Analysis. Cambridge: Cambridge University Press, pp. 355.
[6] Český statistický úřad. [online] Available at: https://www.czso.cz/csu/czso/pocet-obyvatel-v-obcich-k-112019 [Accessed 28.3.2021].
[7] Ministerstvo zdravotnictví České republiky. [online] Available at: https://onemocneni-aktualne.mzcr.cz/covid-19/kraie [Accessed 31.3.2021].

Determinants of company indebtedness in the construction industry
Jana Heckenbergerová1, Irena Honková2, Alena Kladivová3

Abstract. The aim of this paper is to reveal the determinants of indebtedness of companies in the construction industry. The construction industry is a specific sector where payment morale is generally poor, which gradually has a negative effect on other companies in the following sectors. Finding the essential determinants of corporate indebtedness can prevent liquidity problems. Based on a literature review, the following determinants were selected for the analyses: share of fixed assets, interest rate, return on assets, size of the company and its age. Correlation analysis and multiple linear regression analysis were chosen to determine the influence of the determinants within the years 2016-2019. It was found that the generally recommended fixed asset share determinant was not an appropriate determinant, and its possible effect on indebtedness was also proven to be insignificant. Surprisingly, interest rates have also been classified as insignificant. The significant determinants negatively affecting the indebtedness of construction companies were enterprise size and duration. The most important determinant was the return on assets, with a negative influence.

Keywords: indebtedness, capital structure, return on asset, construction industry

JEL Classification: M1
AMS Classification: 62, 91

1 Introduction
The construction industry is one of the key sectors of the economy. The share of the construction industry in the gross value added of the whole economy has been between 5% and 7% [11]. Therefore, it is considered one of the important indicators of the development of the economy. This industrial sector was deeply affected by the last economic crisis in 2009 and 2010, as evidenced by the proportion of failed loans of up to 28%, the highest of all branches of industry [6]. Construction sales accelerated significantly year-on-year in 2018, but still did not reach the level of 2008. The return on equity (ROE) was 16.47% in 2018, which is still less than the 22.57% from the pre-crisis period of 2008 [11]. The consequences of this crisis are linked with indebtedness and liquidity in this sector, which gradually negatively affects other enterprises in the following branches. Identifying and analyzing the factors affecting the indebtedness of construction companies could help with the prediction of upcoming liquidity problems. Searching for mutual relations can confirm or deny the significance of the analyzed determinants. Knowledge of the significant factors affecting the indebtedness of companies can help creditors to evaluate the company rating.
This eliminates further problems with the repayment of liabilities and secondary insolvency, and therefore it contributes to a healthy business environment. 1 University of Pardubice/Faculty of Economics and Administration, Institute of Mathematics and Quantitative Methods, Studentská 95, Pardubice, iana.heckenbergerova@ upce.cz 2 University of Pardubice/Faculty of Economics and Administration, Institute of Business Economics and Management, Studentská 95, Pardubice, irena.honkova@upce.cz 3 University of Pardubice/Faculty of Economics and Administration, Studentská 95, Pardubice, alena.kladivova@ student.upce.cz 155 2 Literature review and Problem statement As the essential determinant of the capital structure it is usually mentioned the tax costs and the tax shield. Other factors are based on sector standards and various costs, for example the weight average cost of capital ( W A C C ) and costs of financial distress. Another significant determinants are including, according to Křivská [9], profitability and stability of the company, the asset structure of the enterprise, the business sector, the management of the enterprise and its approach to risk, the structure of ownership and control over the enterprise, financial freedom, the amount of investment, the size of the enterprise, the goodwill and history of the enterprise, the requirements of the credit rating agencies. Marks [10] deals with the factors of the capital structure in their publication as well. They consider that the approach of shareholders or owners, their requirements for the dividend payout ratio, their relationship to credit and risk, corporate philosophy and the sector, the business life phase, have a major influence on the capital structure. Ručková [14] argues that the capital structure is mainly influenced by the focus of the company's business. She summarizes other factors in four areas: business risk, corporate tax position, financial flexibility, and managerial conservatism and aggressiveness. Singh [15] and Chen & Chen [4] in their researches confirm the importance of the profitability, size and volatility of the enterprise. Oztekin [12] observes a context between indebtedness and company size, tangible assets, and profitability. He states that the capital structure reflects the institutional environment in which it operates. Aulová and Hlavsa [2] focused on specific sector of Czech farms in their work. The size and asset collateral were identified as the most important determinants. Long-term indebtedness was most affected by size, asset collateral, tax shield and retained earnings. On the contrary, Viviani, J. [16] found out that there is no statistically significant dependence between indebtedness and the size of the enterprise, the structure of assets, the profitability of assets and the tax shield. Prášilová [13] found out in her research that the age of the company has a positive effect on the total indebtedness of Czech companies, and she observed a negative relationship with the profitability of assets. Only the share of fixed assets affected long-term indebtedness. In the I C T sector, it was found a negative relationship of total debt to the size of the enterprise and a positive relationship with the volume of retained earnings. Křivská [9] considers that larger enterprises generally show higher profits and that a higher level of liquid assets is less risky for investors. 
However, external influences, such as the level of the capital market, legislative processes, the economic policy itself and the above-mentioned economic cycle or tax shield [7,8], affect total indebtedness as well. In the previously mentioned studies, a significant relationship to the capital structure was proven only for some determinants. Obviously, a few of the described characteristics overlap and complement each other. Since most of the analyses were sector-specific, it is problematic to generalize the results, as each sector has its own specificities. In our study, the main goal is to determine the direct effects on construction industry indebtedness. Therefore, based on the above discussion, we selected the most common internal determinants: fixed asset share (SFA), interest rate (IR), return on assets (ROA), enterprise size (S) and duration (D). We neglected many other determinants because only one sector is investigated. External influences were excluded as well due to their general effect.

3 Source data and Methods
The source dataset, available in the public register [1], consists of unconsolidated financial statements of fifty companies in the construction industry in the Czech Republic within the years 2016-2019. A crucial criterion for the companies' selection is that they had not been liquidated by 31st August 2020. Other entrance criteria, such as the legal form, size and duration, were not applied. Our research starts with a technical financial analysis, which evaluates the main sample characteristics of the total indebtedness. Correlation analysis and multiple linear regression analysis are utilized to determine the effects and significance of the individual determinants. The software tool Statistica 12 was used for the analyses, with the following abbreviations: Total Indebtedness (TI), Share of Fixed Assets (SFA), Interest Rate (IR), Return on Assets (ROA), Duration (D) and Size (S). In all presented results, normality is assumed and the significance level is pre-set to α = 0.05.

4 Results
Although financial data from the construction industry have not yet reached the pre-2008-crisis situation [11], it is obvious that total indebtedness is already reaching the recommended level. The sample distribution of total indebtedness, illustrated in Figures 1 and 2, is right skewed. This is confirmed by the average debt being higher than the median, as shown in Table 1. Moreover, it is quite heavy tailed because of a few companies with quadruple debt compared with the average value. The average total indebtedness in the construction industry (51%) exceeds the recommended 40% level of total indebtedness. Nevertheless, the median value of total indebtedness (41%) already corresponds to this recommended standard. As in any sector, there are companies with almost zero indebtedness and, conversely, over-indebted companies with negative equity.

Figure 1 Histogram for the Total Indebtedness
Figure 2 Boxplot of the Total Indebtedness (mean = 0.514; mean ± std = (0.141, 0.887); mean ± 1.96·std = (-0.216, 1.244))

     N    Average  Median  Minimum  Maximum  Std    Var.coef.  Skew   Kurtosis
TI   200  0.514    0.418   0        2.068    0.373  72.504     1.484  2.886
Table 1 Sample characteristics of the Total Indebtedness

At the beginning of the analyses, the correlation matrix of the individual determinants was evaluated to see whether the explaining variables for the total indebtedness were appropriate.
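The descriptive statistics of Table 1 and this correlation screening could be reproduced along the following lines. This is a sketch assuming a hypothetical CSV file with one row per firm-year and the six indicators, not the authors' Statistica 12 workflow.

```python
import pandas as pd

# Hypothetical input file: one row per firm-year with columns TI, SFA, IR, ROA, D, S.
df = pd.read_csv("construction_panel_2016_2019.csv")

# Sample characteristics of total indebtedness (cf. Table 1).
ti = df["TI"]
summary = {
    "N": ti.count(), "Average": ti.mean(), "Median": ti.median(),
    "Minimum": ti.min(), "Maximum": ti.max(), "Std": ti.std(),
    "Var.coef. (%)": 100 * ti.std() / ti.mean(),
    "Skew": ti.skew(), "Kurtosis": ti.kurtosis(),
}
print(summary)

# Pairwise Pearson correlation matrix of TI and the candidate determinants (cf. Table 2).
print(df[["TI", "SFA", "IR", "ROA", "D", "S"]].corr().round(3))
```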
Obviously, from Table 2, where significant correlations are marked in red, the appropriate determinants are the interest rate (IR), return on assets (ROA), duration (D), and size (S). These determinants do not correlate with each other significantly. However, the share of fixed assets (SFA) is not very suitable as a determinant of indebtedness, as it shows a significant correlation with the other determinants return on assets, duration and size.

       TI      SFA     IR      ROA     D       S
TI     1.000   -0.089  0.077   -0.187  -0.253  -0.225
SFA    -0.089  1.000   0.110   -0.231  0.434   0.481
IR     0.077   0.110   1.000   -0.117  0.036   0.059
ROA    -0.187  -0.231  -0.117  1.000   -0.089  0.067
D      -0.253  0.434   0.036   -0.089  1.000   0.423
S      -0.225  0.481   0.059   0.067   0.423   1.000
Table 2 Correlation matrix between the Total Indebtedness and selected determinants

Table 2 also shows the correlations between total indebtedness (TI) and the influencing variables: fixed asset share (SFA), interest rate (IR), return on assets (ROA), duration (D) and enterprise size (S). There are significant correlations between total indebtedness (TI) and return on assets (ROA), duration (D) and size (S). All of these determinants negatively affect total indebtedness. The effect of the fixed asset share (SFA) is insignificant and rather negative; the impact of the interest rate (IR) is insignificantly positive. It follows from the above that companies with a higher return on assets have lower total indebtedness. It is also true that the older and larger the company, the less indebted it is. However, the behaviour of the interest rate determinant (IR) is interesting, as only an insignificantly positive correlation between this determinant and total indebtedness has been shown. To reveal the direct correlation between TI and its determinants without added effects, partial correlations are evaluated and summarized in Table 3. Significant partial correlations are marked in red and they show similar results. They confirm the significant negative impact of return on assets (ROA) and duration (D), and the medium and statistically insignificant impact of the interest rate (IR) and size (S). It is verified again that the share of fixed assets (SFA) has almost no effect on the total indebtedness in the construction industry.

Dependent variable: TI
Explaining variable   Partial correlation
SFA                    0.019
IR                     0.072
ROA                   -0.184
D                     -0.205
S                     -0.115
Table 3 Partial correlations of the Total Indebtedness and selected determinants

The results of the multiple linear regression analysis, summarized in Table 4, correspond to the previous conclusion as well. The return on assets (ROA) and duration (D) have a significant negative impact, while the share of fixed assets (SFA) and interest rate (IR) do not significantly affect total indebtedness (TI). Nevertheless, the value of the determination index R² = 0.1246 shows that the regression model is unsuitable for further predictions. It seems that some significant explaining determinant of indebtedness is missing. Uncovering this mystery will be the goal of our upcoming research.
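The regression and the partial correlations reported in Tables 3 and 4 can be illustrated with the following sketch, which uses statsmodels OLS and computes each partial correlation by the standard residual-on-residual method. The file name refers to the same hypothetical dataset as in the previous sketch, not the authors' data.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("construction_panel_2016_2019.csv")  # hypothetical file, as above

# Multiple linear regression TI ~ SFA + IR + ROA + D + S (cf. Table 4).
X = sm.add_constant(df[["SFA", "IR", "ROA", "D", "S"]])
model = sm.OLS(df["TI"], X).fit()
print(model.params.round(3), model.pvalues.round(3), round(model.rsquared, 4))

# Partial correlation of TI with one determinant, controlling for the others:
# correlate the residuals of TI and of that determinant after regressing both
# on the remaining variables (cf. Table 3).
def partial_corr(df, y, x, controls):
    Z = sm.add_constant(df[controls])
    ry = sm.OLS(df[y], Z).fit().resid
    rx = sm.OLS(df[x], Z).fit().resid
    return ry.corr(rx)

dets = ["SFA", "IR", "ROA", "D", "S"]
print({v: round(partial_corr(df, "TI", v, [o for o in dets if o != v]), 3)
       for v in dets})
```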
b Std ofb t(194) p-value bO 0,839 0,077 10,825 0,000 S F A 0,031 0,116 0,267 0,790 IR 1,874 1,852 1,012 0,313 R O A -0,436 0,168 -2,603 0,010 D -0,011 0,004 -2,920 0,004 S -0,001 0,000 -1,610 0,109 Table 4 Regression analysis coefficients and their significance (black-significant and red-nonsignificant) 158 5 Discussion and Conclusions The following describing variables have been chosen for the construction industry in the Czech Republic: fixed asset share (SFA), interest rate (IR), return on assets (ROA), size of enterprise (S) and duration (D). Empirical research results have confirmed that the variable of fixed assets share (SFA) is not suitable as a determinant of total indebtedness (TI), as it is influenced by other determinants: return on assets (ROA), duration (D) and size (S). This determinant has been also classified as insignificant in the research. Furthermore, another insignificant determinant interest rate (IR) has been identified. The finding that the interest rate is only increasing marginally with increasing indebtedness has been novel and surprising. The most important theories about the capital structure [3] claim that as indebtedness increases, the so-called costs of financial distress begin to infiltrate companies, when creditors (most often banks) demand a higher interest rate for higher risk. The determinant of the size of the enterprise (S) has been classified as medium-significant with a negative effect on total indebtedness (TI). Correlation analysis identified this determinant as significant, while partial correlation and regression analysis showed medium significance. This is caused by significant correlation between duration (D) and size (S) itself. Return on assets ( R O A ) and duration (D) have been determined as the most important determinants of the total indebtedness of construction companies. Both have had a significant negative effect on the indebtedness regardless to analyzing method. The fact that the longer a company operates on the market, the less indebted it is, is not surprising. Long-term businesses are mostly capital stronger than the newly established companies and therefore, they already have enough capital to cover their assets. For a similar reason, the size of the enterprise (S) is the indebtedness determinant as well. A large enterprise has sufficient equity capital and does not necessarily need debt to finance its activities. These two determinants, duration (D) and size of the enterprise (S), are highly correlated with each other and it is not recommended to use them both in one regression analysis. It has been confirmed that highly profitable companies have less total indebtedness. This fact is interesting in view of the effect of the tax shield. Economic theories generally recommend the involvement of debt for higher-profited enterprises. The interest on debt is a tax-efficient expenditure and it can reduce the tax base of profitable enterprises. It has been proven that if a company can borrow at an interest rate below the return on assets ( R O A ) the involvement of this debt increases the return on equity (ROE), i.e. leverage has a positive effect. To sum up, large, long-term highly profitable companies for construction industry do not adopt external capital even if they could benefit from a tax shield. Acknowledgements This contribution was supported by the Student Grant Competition No. SGS_2021_012 of University of Pardubice in 2020. References [1] A R E S (2020). Ministerstvo financí ČR. 
Available from: http://www.info.mfcr.cz/ares/ares_es.html.cz. [2] Aulová, R. & Hlavsa, T. (2013). Capital Structure of Agricultural Businesses and its Determinants. Agris on-line Papers in Economics and Informatics, 5, 23-36. [3] Brealey, R. & Myers, S.(2014). Teorie a praxe firemních financí. Brno:BizBooks. [4] Chen, S. & Chen L . (2011). Capital structure determinants: A n empirical study in Taiwan. African Journal of Business Management, 5, 10974-10983. [5] C Z - N A C E (2020). Český statistický úřad. Available from: http://nace.cz. [6] Honková, I. (2016). Use of External Sources of Financing i n the Construction Industry. Scientific papers of the University of Pardubice, 36, 42-54. [7] Hrdý, M . (2011). Does the Debt Policy Theoretically and Practically Matter i n Concrete Firm? Český finanční a účetní časopis, 7,19-32. [8] Kislingerová, E . (2013). Sedm smrtelných hříchů podniků: úpadek a etika managementu. Praha: C . H . Beck. [9] Krivská, R. (2009). Determinants of capital structure and its optimization. Dizertační práce, 54-57. 159 [10] Marks, K . (2009). The handbook of financing growth: strategies, capital structure, and M & A transactions. Hoboken: John Wiley. [11] Ministerstvo príimyslu a obchodu. Stavebnictví 2019. Available from: mpo.cz. [12] Oztekin, O. (2015). Capital Structure Decisions around the World: Which Factors Are Reliably Important? Journal of Financial and Quantitative Analysis, 50, 301-323. [13] Prášilová, P. (2012). Determinanty kapitálové štruktúry českých podniku. Ekonómie a Management, 1, 89- 104. [14] Rňčková, P. (2015). Finanční analýza: metódy, ukazatele, využití v praxi. Praha: Grada Publishing. [15] Singh, D . (2016). A Panel Data Analysis of Capital Structure Determinants: A n Empirical Study of Nonfinancial Firms in Oman. International Journal of Economics and Financial Issues, 6, 1615-1656. [16] Viviani, J. (2008). Capital structure determinants: A n empirical study of French companies in the wine industry. International Journal of Wine Business Research, 20, 171-194. 160 Robust Slater's Condition in an Uncertain Environment Milan Hladík1 Abstract. Slater's condition is, no doubt, an important regularity condition used in nonlinear programming. It states that the feasible set must contain an interior point. We analyse this condition in an uncertain environment. We assume that uncertainty of the input data has the form of intervals covering the true values; we assume no other information about the uncertainty is known. Then Slater's condition holds robustly if it is satisfied for each possible realization of the interval values. In particular, we investigate interval systems of linear equations and inequalities. Therein, Slater's condition has the form of strong solvability with strict inequalities. We present a finite characterization of this property and inspect its computational complexity - in some cases it is polynomial, but in some cases it is NP-hard. As an illustration, we apply our results in interval linear programming in the problem of testing boundedness of the optimal solution set. Keywords: linear programming, interval analysis, interval system, robustness, NP- hardness J E L Classification: C44, C61 A M S Classification: 90C05, 65G40, 15A39 1 Introduction Slater's condition is a constraint qualification appearing in optimality conditions in convex optimization, among many other situations. Roughly speaking, Slater's condition requires an existence of an interior feasible point. 
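For reference, a textbook formulation of the condition for a convex program (a general statement, not one specific to this paper) reads:

```latex
% Slater's condition for the convex program
%   min f(x)  s.t.  g_i(x) <= 0  (i = 1,...,m),  Ax = b,
% with f and g_i convex: there exists a strictly feasible point
\exists\, \hat{x} \in \operatorname{relint}(\operatorname{dom} f):\quad
g_i(\hat{x}) < 0 \;\; (i = 1,\dots,m), \qquad A\hat{x} = b .
```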
In this paper, we are concerned with Slater's condition in the case when the input data are uncertain. This is a common situation in many practical problems, including optimization problems. In particular, we deal with problems where the uncertainty is represented by intervals. That is, the only information we have are upper and lower bounds on the true values. No other information (such as a probability distribution or a fuzzy shape) is known.

Interval data. Interval data are represented by interval vectors and matrices; we denote them by boldface. An interval matrix is by definition the set of matrices
\( \mathbf{A} = \{ A \in \mathbb{R}^{m \times n} : \underline{A} \le A \le \overline{A} \}, \)
where \( \underline{A}, \overline{A} \in \mathbb{R}^{m \times n} \) are given matrices and the inequality is meant entrywise. Interval vectors are defined analogously. The corresponding terms are the midpoint matrix \( A_c \) and the radius matrix \( A_\Delta \) of \( \mathbf{A} \), defined respectively as
\( A_c = \tfrac{1}{2}(\overline{A} + \underline{A}), \qquad A_\Delta = \tfrac{1}{2}(\overline{A} - \underline{A}). \)
For more on interval analysis, see, e.g., the books [3, 7, 11]. Given an interval system, it is called weakly solvable if it is solvable for at least one realization of the interval coefficients, and it is called strongly solvable if it is solvable for every realization of the interval data. For instance, an interval system of linear inequalities \( \mathbf{A}x \le \mathbf{b} \) is weakly (strongly) solvable if \( Ax \le b \) is solvable for some (for every) \( A \in \mathbf{A} \) and \( b \in \mathbf{b} \). Next, a strong solution to an interval system is a point that solves every realization of the system. Clearly, if an interval system possesses a strong solution, then it is strongly solvable. The converse implication does not hold in general.

The goal. We investigate Slater's condition for interval linear systems. Even though this constraint qualification is used more in nonlinear programming, we begin our investigation with the simpler case of linear constraints. We say that Slater's condition holds robustly if it holds for every realization of the interval data. This in turn leads to strong solvability of interval systems with strict inequalities. In particular, we focus on strong solvability of the interval system \( \mathbf{A}x \le \mathbf{b} \) and the interval system \( \mathbf{A}x = \mathbf{b},\ x \ge 0 \). We present necessary and sufficient conditions for strong solvability and for the existence of strong solutions, and we analyze the computational complexity of the problems in question, too. Notice that systems of strict inequalities occur also in other situations than Slater's condition. For more on this issue in the real case see, e.g., [2, 13].

Notation. For vectors \( a, b \in \mathbb{R}^n \) we use \( a \lneq b \) to denote \( a \le b,\ a \ne b \). We use \( e = (1, \dots, 1)^T \) for the vector of ones (of convenient dimension) and \( \operatorname{diag}(s) \) for the diagonal matrix with entries \( s_1, \dots, s_n \). The absolute value and the inequalities are understood entrywise.

1 Charles University, Department of Applied Mathematics, Malostranské nám. 25, 118 00 Prague & University of Economics, Department of Econometrics, nám. W. Churchilla 4, 130 67 Prague, Czech Republic, hladik@kam.mff.cuni.cz

2 Characterization and computational complexity
In this section we show that the interval Slater's condition is easy to verify for interval inequalities, but can be computationally hard in general for equality constrained problems. The following characterization is a modification of the result by Rohn [14] on strong solvability of a system without strict inequalities.

Theorem 1. An interval system \( \mathbf{A}x = \mathbf{b},\ x \ge 0 \) is strongly solvable if and only if the system
\( (A_c + \operatorname{diag}(s) A_\Delta)\, x = b_c - \operatorname{diag}(s)\, b_\Delta, \quad x \ge 0 \)   (1)
is solvable for each \( s \in \{\pm 1\}^m \).

Proof.
The interval system Ax — b, x > 0 is strongly solvable if and only if for each A € A and b e b the system Ax — b, x > 0 has a solution. The system Ax — b, x > 0 is equivalent (w.r.t. solvability) to system Ax - by = 0, x > e, y > 1. B y Farkas' lemma, it is feasible if and only if the system AT u - v > 0, -bT u - w > 0, -eT v - w < 0, v, w > 0 is infeasible for each A € A and b e b. It equivalently reads (A \ -b)T u _ 0. Thus Ax — b, x > 0 is strongly solvable if and only if the system (A | -b)T u _ 0 is not weakly solvable. B y [9], weak solvability of (A | -b)T u _ 0 is equivalent to solvability of ^ + A | d i a g ( , ) \ \ ^ - ^ d i a g ( s ) j for some s e { + l } m . B y Farkas' lemma again, we obtain the statement. • The exponential number in the characterization is not easy to avoid since the problem is intractable. To show it, we first present an auxiliary result, which is worth of stating it explicitly. Proposition 2. Checking weak solvability of Ax S 0 is an NP-hard problem even with interval entries in one row of A only. Proof. B y [3], checking solvability of \Ax\ < e, eT \x\ > 1 (2) is NP-hard. We claim that it is equivalent to checking solvability of the system IA x | < ey, y > 0, eT \x\ > y (3) with at least one inequality satisfied strictly. Obviously, if (2) has solution x*, then x* and y* :- 1 solve (3) as required. Conversely, let x*, y* be a solution to (3). If y* > 0, then -p-x* solves (2). If y* = 0, then eT \x*\ > y* — 0, and so we can put y* :- eT \x*\ and reduce this case to the previous one. Now, by the Gerlach characterization of interval inequalities [3, 6], system (3) describes the solution set of the interval system Ax - ey < 0, -Ax - ey < 0, -y < 0, [-e, e]T x + y < 0, which has the desired form. • 162 Now, we show that it is intractable to check i f there is a Slater point in each realization of an interval system in the form Ax = b, x > 0. That is, even when the intervals are situated in the right-hand side only, the problem is hard. Theorem 3. Checking strong solvability of an interval system Ax = b, x > 0 is co-NP-hard. Proof. Similarly as in the proof of Theorem 1, interval system Ax = b, x > 0 is strongly solvable if and only if the system (A | -b)T u ^ 0 is not weakly solvable. However, checking weak solvability of this interval system is NP-hard by Proposition 2. • In contrast to strong solvability, deciding on existence of a strong solution is a simple problem. The fundamental drawback is that a strong solution for interval equations exists in rare situations. Indeed, as the following observation shows, it exists only if there are no interval coefficients, just real values! Corollary 1. A vector x is a strong solution to an interval system Ax = b, x > 0, i f and only if Acx = bc, x > 0, A A = 0, & A = 0. Proof. B y [3, Thm. 2.16], a vector x is a strong solution to an interval system Ax = b, x > 0, i f and only i f it satisfies Acx = bc, AA\x\ =b& = 0. Since x > 0, the condition AA\x\ = 0 reads A A X = 0, which holds if and only if A A = 0. • For an interval system of linear inequalities, Slater's condition is characterized by an adaptation of the results of Rohn and Křeslová [15]. Theorem 4. The interval system Ax < b is strongly solvable if and only if the system Ax1 - Ax2 < b, x1 > 0, x2 > 0 (4) is solvable in variables xl , x2 . Proof. The interval system Ax < b is not strongly solvable if and only if there are A € A and b e b such that Ax < b is unsolvable. B y Farkas' lemma, equivalently, the system AT u = 0, bT u<0, u ^ 0 is solvable. 
Thus we have that the interval system AT u = 0, bT u<0, u^0 is weakly solvable. B y the generalization of the Oettli-Prager and Gerlach theorems [9], the solution set is described by AT u > 0, -AT u > 0, bT u < 0, u £ 0. B y Farkas' lemma again, the system (4) is unsolvable. • Theorem 5. Suppose that the interval system Ax < b is strongly solvable, and define x* := xl - x2 , where xl , x2 solves (4). Then x* is a solution to Ax < b for every A € A and b e b. Proof. Let A € A and b e b be arbitrary. Then Ax* = A(xl - x2 ) = Ax1 - Ax2 < Ax1 - Ax2 < b< b. • Corollary 2. A n interval system Ax < b is strongly solvable i f and only i f it has a strong solution. Adapting the results from [3]. we can also state several equivalent characterizations of robust Slater's points. Corollary 3. For a vector x e l " , the following conditions are equivalent: 1. x is a strong solution to Ax < b, 2. Acx + AA\x\ < b, 3. x = x 1 - x 2 , A x 1 - Ax2 < b, x 1 , x 2 > 0. Proof. We already have the equivalence 1 o 3 by Theorems 4 and 5. Equivalence 1 o 2 follows from the fact that x is a strong solution to Ax < b i f and only i f max^eA Ax < b. Notice that the value Acx + AA\x\ is the entrywise maximum of Ax subject to A e A. • 163 3 Consequences on boundedness of realizations of interval LP problems Besides K K T optimality conditions, strict feasibility is important in many other issues as well. Herein, we show some consequences in checking boundedness of optimal solution set of an interval linear programming (LP) problem. B y an interval LP problem we mean a family of L P problems / ( A , b, c) = min cT x subject to x e M(A, b), (5) where M(A, b) is the feasible set, A e A , b e b, c e c, and A, b, c are given interval matrix and vectors. B y S(A, b, c) we denote the optimal solution set corresponding to the particular realization (A, b, c) e (A, b, c). The set of all possible optimal solutions is then S = (J S(A,b,c). (A,b,c)e(A,b,c) We usually write (5) shortly as min cT x subject to x e M(A, b), and we distinguish three canonical forms min cT x subject to Ax — b, x > 0, (A) min cT x subject to Ax < b, (B) min cT x subject to Ax < b, x > 0. (C) In the real case, one can consider any canonical form with no harm on generality because they can be equivalently transformed to each other. However, in the interval case, this is no more true [5]. That is why we have to consider the forms separately. Indeed, we will see later on that the computational complexity differs. Interval linear programming was surveyed in [3, 8]. The optimal solution set in particular was addressed in [1,4,10,12]. We say that an interval L P problem is realization bounded if S(A, b, c) is bounded for every realization (A, b, c) e (A, b, c). Notice that we consider an empty set as a bounded set. Obviously, if S is bounded, then S(A, b, c) is bounded for every realization. The converse is not true in general, as the following example shows. It remains an open question, however, whether the converse implication is valid provided both primal and dual problems are strongly feasible. Example 1. Consider the interval L P problem min x subject to [0, l ] x = 1, x > 0. Each realization taking a positive value a e [0,1] has a unique optimal solution x = 1/a. Taking the value a := 0 e [0,1] results in an infeasible L P problem. Thus in total we have S = [1, co), which is unbounded despite the fact that the problem is realization bounded. Recall the characterization of bounded optimal solution set of a real-valued L P problem from [16]. Theorem 6. 
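As a computational illustration of the inequality case: reading the extraction-garbled bounds in Theorem 4 and Corollary 3 as the upper bound of A and the lower bounds of A and b (which matches the Rohn and Křeslová strong-solvability characterization [15]), a robust Slater point of Ax ≤ b can be sought by maximizing a uniform feasibility margin in a single LP. The following sketch is an illustration under that reading, not code accompanying the paper.

```python
import numpy as np
from scipy.optimize import linprog

def robust_slater_point(A_lo, A_up, b_lo):
    """Look for one point x with A x < b for every A in [A_lo, A_up] and b >= b_lo.

    Writes x = x1 - x2 with x1, x2 >= 0 and maximizes a uniform margin t in
    A_up @ x1 - A_lo @ x2 + t <= b_lo (cf. Theorem 4 / Corollary 3).
    Returns x if a strictly positive margin exists, otherwise None.
    """
    m, n = A_up.shape
    c = np.zeros(2 * n + 1)
    c[-1] = -1.0                                    # maximize t
    A_ub = np.hstack([A_up, -A_lo, np.ones((m, 1))])
    bounds = [(0, None)] * (2 * n) + [(None, 1.0)]  # cap t to keep the LP bounded
    res = linprog(c, A_ub=A_ub, b_ub=b_lo, bounds=bounds, method="highs")
    if res.success and res.x[-1] > 1e-9:
        return res.x[:n] - res.x[n:2 * n]
    return None

# Tiny illustration with made-up interval data.
A_lo = np.array([[1.0, 0.0], [0.0, 1.0]])
A_up = np.array([[2.0, 0.5], [0.5, 2.0]])
b_lo = np.array([3.0, 3.0])
print(robust_slater_point(A_lo, A_up, b_lo))
```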
Suppose that both primal and dual problems are feasible. The optimal solution set is bounded if and only if the dual problem contains a feasible solution satisfying the inequalities strictly. Infeasibility of the primal problem produces no optimal solution, so the boundedness is preserved. Hence we obtain a sufficient condition for realization boundedness of interval L P problems. Corollary 4. A n interval L P problem is realization bounded i f for every realization the dual problem contains a feasible solution satisfying the inequalities strictly. The assumption can be checked by the methods presented in Section 2, where strong feasibility of various interval systems was discussed. In the following, we show consequences for the particular forms of an interval L P problem. 164 Type (A). For this class of interval L P problems, the condition from Corollary 4 is easy to check. Corollary 5. A n interval L P problem of type (A) is realization bounded if the system AT yl -AT y2 0 (6) is feasible. Example 2. Consider the interval L P problem from Example 1 min x subject to [0, l]x = 1, x > 0. We already observed that it is realization bounded. Indeed Corollary 5 confirms this because system (6), which reads l y 1 - O y 2 < 1, y \ y 2 > 0, is feasible. On the other hand, consider a variation of the above problem (see [3]) min -x subject to [0, l ] x = 1, x > 0; This problem is also realization bounded. Nevertheless, we cannot verify it by Corollary 5 because system (6), which reads l y 1 - Oy2 < - 1 , y \ y 2 > 0, is infeasible. Type (B). Herein, the condition from Corollary 4 can be hard to check since strong solvability of AT y = b, y < 0 is intractable; see Theorem 3. Type (C). Similarly as for type (A), realization boundedness is polynomially decidable. The proof of the following statement is a simple adaptation of that for Corollary 5. Corollary 6. A n interval L P problem of type (C) is realization bounded if the linear system A r y < c, y < 0 is feasible. 4 Conclusion Robust Slater's condition in the context of interval linear systems basically means strict feasibility of every realization of the interval system. For an interval system of linear inequalities, the robust Slater's condition has a favourable characterization by means of linear inequalities, which makes the condition easy to check. Moreover, if the condition holds true, then there is a point which is the Slater's point for each realization of interval data. In contrast, for an interval system of equations with nonnegative variables, the problem of checking robust Slater's condition is intractable. We presented a finite characterization by a reduction to 2m linear systems, where m is the number of equations. Here, one could be interested in a computationally cheap sufficient condition. Another research direction can be an extension of the presented results to a general interval system of mixed equations and inequalities. Let us also remind the open problem under which conditions the realization boundedness implies boundedness of the optimal solution set S. Acknowledgements The author was supported by the Czech Science Foundation under project 19-02773S. 165 References [1] Allahdadi, M . , and Nehi, H . M . : The optimal solution set of the interval linear programming problems. Optim. Lett. 7 (2013), 1893-1911. [2] Fajardo, M . D., Goberna, M . A . , Rodriguez, M . M . L., and Vicente-Pérez, J.: Even Convexity and Optimization. Handling Strict Inequalities. E U R O A T O R . Springer, Cham, 2020. [3] Fiedler, M . 
, Nedoma, J., Ramík, J., Röhn, J., and Zimmermann, K . : Linear Optimization Problems with Inexact Data. Springer, New York, 2006. [4] Garajová, E., and Hladík, M . : On the optimal solution set in interval linear programming. Comput. Optim. Appl. 72 (2019), 269-292. [5] Garajová, E., Hladík, M . , and Rada, M . : Interval linear programming under transformations: optimal solutions and optimal value range. Cent. Eur. J. Oper. Res. 27 (2019), 601-614. [6] Gerlach, W.: Zur Lösung linearer Ungleichungssysteme bei Störung der rechten Seite und der Koeffizientenmatrix. Math. Operationsforsch. Stat., Ser. Optimization 12 (1981), 41^43. In German. [7] Hansen, E . R., and Walster, G . W.: Global Optimization Using Interval Analysis. 2nd edition. Marcel Dekker, New York, 2004. [8] Hladík, M . : Interval linear programming: A survey. In: Linear Programming-New Frontiers in Theory and Applications (Mann, Z . A . , ed.), chapter 2. Nova Science Publishers, New York, 2012, 85-120. [9] Hladík, M . : Weak and strong solvability of interval linear systems of equations and inequalities. Linear Algebra Appl. 438 (2013), 4 1 5 6 ^ 1 6 5 . [10] Hladík, M . : Two approaches to inner estimations of the optimal solution set in interval linear programming. In: Proceedings of the 2020 4th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (Deb, S., ed.), ISMSI 2020. Association for Computing Machinery, New York, U S A , 99-104. [11] Moore, R. E., Kearfott, R. B . , and Cloud, M . J.: Introduction to Interval Analysis. S I A M , Philadelphia, PA, 2009. [12] Rada, M . , Garajová, E., Horáček, J., and Hladík, M . : A new pruning test for parametric interval linear systems. In: Proceedings of the 15th International Symposium on Operational Research SOR'19, Bled, Slovenia, September 25-27, 2019 (Zadnik Stirn et al., L., ed.). BISTISK d.o.o., Ljubljana, Slovenia, 506-511. [13] Rodriguez, M . M . L., and Jose, V . - P : On finite linear systems containing strict inequalities. J. Optim. Theory Appl. 173(2017), 131-154. [14] Röhn, J.: Strong solvability of interval linear programming problems. Comput. 26 (1981), 79-82. [15] Röhn, J., and Křeslová, J.: Linear interval inequalities. Linear Multilinear Algebra 38 (1994), 79-82. [16] Roos, C , Terlaky, T , and Vial, J.-P: Interior Point Methods for Linear Optimization. 2nd edition. Springer, New York, 2006. 166 Sensitivity of small-scale beef cattle farm's profit under conditions of natural turnover Robert Hlavatý1 , Igor Krejčí2 Abstract. We continue our long-term research focused on beef herd management optimization from the perspective of a small-scale farmer in the Czech Republic. We have built a linear programming model of beef herd development spanning a ten-year period under the constraints of limited farm capacity with the main variables being the heifer acquisition and selection of heifers for either rearing or fattening process. In this paper, we use the aforementioned approach to determine the sensitivity of farmer's profit to the subsidies, input costs and selling prices. This time, we specifically focus on the natural turnover of the herd, meaning that no acquisition of cattle is possible. The sensitivity analysis shows that from the three factors influencing the profit, it is the change in input costs that plays a crucial role in the farmer's profit. 
Keywords: beef cattle farm, linear programming, optimisation, sensitivity J E L Classification: C61, Q12 AMS Classification: 90C05, 90C90 1 Introduction Small farms, the focus of our research, represent the most common form of business in E U Agriculture [22]. The small farms model is the oldest kind of agriculture business model that retains the key role in the Common Agricultural Policy [14], [5]. The small size of the farms is the first aspect of our research. The second aspect shows the crucial difficulties of agriculture in general - the dependency on biological processes, long delays between action and reaction and seasonality [2]. From this perspective the cattle farming is considered one of the most complex due to the natural delays embodied in the relevant biological processes, especially breeding and fattening periods [20]. Moreover, the primary producers of meat commonly face the problems of low profitability and weak market position [19], [18], [7] and the Czech Republic is not an exception [28]. Authors who deal with cattle modelling typically examine the feeding and insemination strategies [21], [17], [3]. Another trend is based on genomic selection strategies [25], [11]. These strategies clearly aim at strong leverage points, however, these leverage points are commonly beyond the reach of the average farmer. Many of these politics are appropriate for the national level [1]. Other strategies for strengthening the position of the farmers and lowering the risk focus on diversification (commonly agritourism) [26], [23] or some kind of direct distribution of the products to final consumers [8], [15]. Despite the above-mentioned strategies prove to be efficient in many cases, they also require new skills and a different mindset, which is not typical for common farming [30]. Consequently, such requirement could result in need of changes in agricultural and rural education on individual and community level [12], [29]. Our current research is focused on different aspects. The goal is to examine the common practice of the Czech farmers, use the typical timing and common prices and identify the leverage points under conditions of the average farmer. We are not showing the benefits of a specific action and rather focus on common decision-making of the small-scale beef cattle farmers and show the benefits of optimal choices. The beginning of the whole research was based on interviews with small farmers and focus groups [24], [16]. We stayed in contact with the farmers through the whole modelling process and implemented their opinions and needs in the optimization model that describe the development of the beef herd throughout ten years period. The model outlines and brief analysis of optimal decision making were first presented by [9] and later fully described in extended form by [10]. One of the core leverage points was the optimal acquisition of heifers. In this paper we focus on the natural turnover, i.e. the model situation does not allow the purchasing of heifers and the growth of the herd is dependent only on the biological processes. We test how the changes of prices and subsidies influence the optimal distribution (fattening vs rearing) of the heifers in the growth phase of the business (where no limit of the herd size is considered). 
1 C Z U Prague, Department of Systems Engineering, Kamýcká 129, 165 00 Prague, hlavaty@pef.czu.cz 2 C Z U Prague, Department of Systems Engineering, Kamýcká 129, 165 00 Prague, krejcii@pef.czu.cz 167 2 Materials and methods This section is divided into two chapters. The first describes the nature of the problem and introduces the individual variables and the second presents the linear programming model. 2.1 Problem description and variables The beef herd consists of different categories that must be taken into consideration. The categories are represented by variables in Table 1. Variable Description Ct Set of all calves in generation i of age £ (0,7] months cf Set of all heifer calves in generation i of age £ (0,7] months c? Set of all bull calves in generation i of age £ (0,7] months FHEU Set of all fattening heifers in generation i of age £ (7,24] months FJSULi Set of all fattening bulls in generation i of age £ (7,24] months BHEli Set of all breeding heifers in generation i of age £ (7,26] months P_HEl_Aij Set of all pregnant heifers in generation i, j-th pregnancy < 5 months, 7 = 1 P_HEI_Btj Set of all pregnant heifers in generation i, j-th pregnancy > 5 months, 7 = 1 P_COWtj Set of all cows in generation i, j-th pregnancy C a+Di Set of all calves born from generation i from y-th pregnancy Table 1 Variables and beef herd categories The generation is understood here as a set of all calvers born in the same month and all older stages of its lifespan. The generations are indexed by i. The pregnancies of heifers (or cows, consequently) are indexed by j. The dimension of variable indices i and j is not specifically set as our modelling approach does not require enumerating it explicitly. Each generation starts at the moment a calf Q is born (or set of calves in fact, however, for the sake of simplicity, each category will be referred to as a single animal in the description hereby). The calf Ct can be a female Cf or male Cf. The male calf Cf later grows into the mature fattening bull F_BULi which is later sold (slaughtered). The female calf can either become a fattening heifer F_HEIt which is slaughtered at the age between 18-24 months or it can become a breeding heifer B_HEIi. The breeding heifer becomes pregnant and turns into P_HEI_Aij, which denotes the first stage of the pregnancy (< 5 months) and then it enters the second stage of the pregnancy P_HEI_Bij (> 5 months) called a heavily pregnant heifer, resulting in calf C,i+1y birth. The two stages of pregnancy must be distinguished in the model due to the different costs involved. After the first birth, a new pregnancy soon occurs after a service period and the heifer becomes a mature cow P_COWi(j+1-) that is pregnant for (j + l ) t h time. The pregnancy results in another birth of the calf C(i+i)(/+i) and this process is repeated until n-th pregnancy. When any calf Cri+1y, V i , j = 1,..., n is born, a new generation is started and the same cycle begins. 2.2 Linear optimization model Concerning the variables and relationships between categories described in the previous subchapter, we construct the linear optimization model. The problem is generally described with the following optimization model: maximise P s.t. HkE0,Vk (1) Hk c M + The objective of the optimization problem is maximizing the profit P. Hk denotes the set of all variables that occur in month A: and involves all variables described in Table 1. is the polyhedral set of all constraint imposed on the problem. 
Note that all variables can attain real non-negative values and we do not assume integrality constraints in our model. This is because our approach is rather average-based and works with the entire categories of cattle instead of modelling single animals. This relaxation still allows observing the development of the beef herd without making the problem hard in terms of computational complexity. Before the detailed linear optimization model is described, it is necessary to introduce several parameters and cost coefficients that enter the model, as we describe them in Table 2. 168 Parameter Description Default" CO Calves' mortality 5% Heifer and cow culling rate 15% SFAT Monthly cost per cattle for fattening3 46 E U R scow Monthly cost per suckler cow, including the calve3 70 E U R SHEI_A Monthly cost per heifer up to 5 months of gestation3 39 E U R SHEI_B Monthly cost per heavily pregnant heifer3 42 E U R rSUB Subsidies on beef calvesb 138 EUR/calf rcow Yearly subsidies on beef cowsb 7 E U R / L i v e unit yLAND Yearly average land subsidies'3 344 EUR/ha pFAT Fattening bull unit sale price0 999 E U R pCOW Fattening heifer/suckler cow unit sale price0 740 E U R hsc Hectares per suckler cow b o 1.5 ha Table 2 Parameters of the beef herd (Sources:3 [13],b [4], [27],0 [24] and d [6]) A l l prices are based on the data from 2016. The costs are already cleared of the labour costs as the small-scale farms depend mostly on their own family workforce. The objective function P maximises the profit over all beef herd categories using the coefficients from Table 2. - I K (2) rSUBCH + QjCOW _ 1 4 , sFAT^p_HEl. i ; = 1 V + ( 1 9 * 12 *k S C *r L A N ° ~ 1 9 * B J i E I i + (5 * h. *hsc *rLAND ~5 * s "ELA )p - H E i - A n + (7 * T2 * H S C * R L A N ° ~ 7 * S H E ' B + ^PCOW ) p - H E I - B n + (l2 * * hsc * (rLAND + r c o w ) - 12 * scow + xj}P C0W ^ P _ C G W y + 1 ) + ^ ( r S U B C f + (pFAT - 14 * sFAT )F_BULi) i The first double sum in the objective expresses the costs and subsidies for female calves, heifers and cows, the second sum is related to male calves and bulls. The parameters in Table 2 are expressed as monthly/yearly ratings and it is necessary to add various multipliers into the objective function to capture the real-time existence of each category in the herd. A detailed explanation is provided by [10]. The objective is maximised with respect to constraint set which is formed of the following equations and ine- qualities: (l-oj)Ci = Cli + Cf,Vi (3) C f = C f , Vi (4) Cf = FJiEk + BJiEk, Vi (5) F_HEIi > 0.5 Cf, Vi (6) B_HEIt < 0.5 C",Vi (7) B_HEIt = P_HEI_AtJ = P_HEI_Bij,Vi; j = 1 (8) P_COWiU+1) = (1 - xP)P_HEI_Bij, Vi;j = l (9) P_COWi(j+1) = (1 - iP)P_COWiU+2),Vi ;j = 1 ...n - 2 (10) C?=F_BULhVi (11) Equations 3 and 4 express the even distribution of new-born calves to heifer calves Cf and bull calves Cf with the given mortality rate a). Each heifer calf Cf later becomes either fattening heifer F_HEIt or breeding heifer B_HEIi as shown by Equation 5. Inequalities 6 and 7 show the distribution to fattening and breeding heifers which may be different for each generation, depending on the current profitability of that decision. Equation 8 only shows the progress in pregnancy of heifers. Equations 9 and 10 express the culling procedure which is done after a birth occurs in the case of heifer or mature cow. Equation 11 merely shows the ageing process of young bulls Cf, into mature ones F_BULi. 
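As an illustration of how constraints (5)-(7) drive the split of heifer calves between fattening and rearing, here is a deliberately simplified single-generation sketch. The per-head margins are made-up placeholders standing in for the full objective (2), and the model is not the authors' OpenSolver implementation.

```python
import numpy as np
from scipy.optimize import linprog

# One generation: H heifer calves survive weaning and must be split between
# fattening (F_HEI) and rearing/breeding (B_HEI), cf. constraints (5)-(7).
H = 20.0                 # heifer calves available (illustrative)
margin_fat = 180.0       # EUR net margin per fattening heifer (made-up placeholder)
margin_breed = 230.0     # EUR net margin per breeding heifer incl. its calf (made-up)

# Variables: [F_HEI, B_HEI]; maximize profit -> minimize negative margins.
c = np.array([-margin_fat, -margin_breed])
A_eq = np.array([[1.0, 1.0]])          # (5): F_HEI + B_HEI = H
b_eq = np.array([H])
A_ub = np.array([[-1.0, 0.0]])         # (6): F_HEI >= 0.5 * H
b_ub = np.array([-0.5 * H])            # (7): B_HEI <= 0.5 * H then follows from (5)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 2, method="highs")
print(dict(zip(["F_HEI", "B_HEI"], res.x.round(2))))
```

With the breeding margin above the fattening margin, the optimum pushes rearing to the 50% cap implied by constraint (6), which mirrors the maximum rearing share observed in the scenario results reported below.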
169 3 Results and discussion We run the optimisation model described in the previous chapter for different scenarios. We used the OpenSolver extension for M S Excel in order to solve the problem and each run took approximately 120 seconds. The baseline scenario under conditions specified in chapter 2 results in the 10-years profit equal to 123,681 E U R . The strict constraint on the heifers' acquisition does not leave the farmer with big decision space. The herd develops similarly without the upward shifts typical for the situation when the farmer purchases heavily pregnant heifers. Despite the model is linear, the change of the profit does not necessarily change in a proportional way to the parameter changes. This happens when the change of the parameter results in the change of heifers' distribution. Scenario 10-year Profit (EUR) Heifers for fattening Heifers for rearing Relative profit to baseline Baseline 123 681 64.92% 35.08% X Selling prices +5% 141 502 64.92% 35.08% 1.14 Selling prices +10% 159 323 64.92% 35.08% 1.29 Selling prices -5% 105 859 64.92% 35.08% 0.86 Selling prices -10% 88 038 64.92% 35.08% 0.71 Subsidies +5% 149 396 50.00% 50.00% 1.21 Subsidies +10% 170 441 50.00% 50.00% 1.38 Subsidies -5% 105 376 67.77% 32.23% 0.85 Subsidies-10% 90 118 72.33% 27.67% 0.73 Prices of inputs +5% 93 574 67.77% 32.23% 0.76 Prices of inputs +10% 66 098 72.33% 27.67% 0.53 Prices of inputs -5% 160 245 50.00% 50.00% 1.30 Prices of inputs -10% 192 139 50.00% 50.00% 1.55 Table 3 Scenarios comparison The maximum share of the heifers for rearing is 50%. This distribution starts at the increase of the subsidies by 1.75% and decrease of the input prices by 1.9%. It is also worth mentioning that the change of the selling prices isn't necessary caused by the prices of the meat but it could be also caused by petter nutrition, i.e. heavier cattle for slaughter. Table 3 compares the scenarios with the stress on a different distribution of heifers. Figure 1 compares the influence of the changes on the profit. Because the growth and decrease of the selected parameters have different interpretation (growth of subsidies and selling prices is good but the growth of inputs' prices is undesirable from the producers' point of view), the change is called better and worse showing whether the change represents the improvement or deterioration of the farmers' situation. It is clear from the Figure 1 that the 10-years profit under conditions of the pure natural turnover sensitive on the prices of inputs at most. The possibility of heifers' acquisition allows the farmer to increase the size of the herd faster. Therefore, if the purchase of the heifers is possible, the sensitivity on subsidies and selling prices grows especially for the increase of the subsidies or prices of the sold product. 10% worse 200000 150000 1% better 5% better 5% worse subsidies • input prices selling prices 1% worse 10% better Figure 1 Impact of changes on profit 170 Two scenarios (subsidies -10% and input prices + 10%) show the behaviour that is typical for situations with low sustainability. It is also the weakness of this modelling approach that must be clearly expressed. 
Because the model has the 10-years perspective, the optimal distribution of heifers under conditions of such terminated process leads to the preference of heifers for fattening for the last two years when the ratio of the input costs and output prices is not favourable (this increased number of sold cattle and decrease the number of cattle, which are connected with the costs). The real farmer that wants to stay on the market would achieve even lower profit because this strategy isn't possible in the real life. It is also important to stress that the definition of the maximal herd size has also a significant impact on the optimal decisions. The farm is the system, where everything is interconnected. In comparison when the farmer could purchase heavily pregnant heifers [10], the dependence only on the natural turnover in the growth phase is extremely risky. One must understand that such phase is typically connected with the investment into farms capacities (stables, storage, land) commonly accompanied by the loan repay. Even the baseline scenario shows that the heifers' distribution leads to slower capacity utilization, which in other words is postponing the maximal annual profit but the market situation (parameter values) leads to a preference of fattening to maximize the 10-years profit. 4 Conclusion We presented the modelling approach based on the monthly cuts, which was beneficial when we compared the model behaviour with the real farms' development. In this paper, we have focused on the sensitivity of decision making regarding the heifers sorting. For this analysis we focused on three categories of parameters - subsidies, selling prices and prices of inputs. A l l categories consist of more model parameters but for paper purposes, we change all parameters from one category in a similar way. The modelling situation describes the farmer that does not want or cannot purchase heifers during the growth phase of the business. Under these conditions, the most crucial is the change of the input prices. The sensitivity testing showed that the subsidies and the prices of the inputs are close to the situation when the farmer prefers the fastest possible growth of the herd. References [I] Amer, P.R. (2012). Turning science on robust cattle into improved genetic selection decisions. Animal, 6(4), 551-556. [2] Behzadi, G . , O'Sullivan, M.J., Olsen, T.L., & Zhang, A . (2018). Agribusiness supply chain risk management: A review of quantitative decision models. Omega, 79, 21—42. [3] Canozzi, M . E . A . , Marques, P.R., De Souza Teixeira, O., Pimentel, C . M . M . M . , Dill, M . D . , & Barcellos, J.O.J. (2019). Typology of beef production systems according to bioeconomic efficiency in the south of Brazil. Ciencia Rural, 49(10), 1-9. [4] Czech Beef Breeders Association (2019). Sazby podpor pro rok 2019 [online]. Available from: http://www.cschms.cz/index.php?page=novinka&id=2745 [Accessed 24 A u g 2020]. [5] Eurostat (2018). Small and large farms in the EU - statistics from the farm structure survey [online]. Available from: https://ec.europa.eu/eurostat/statistics-explained/index.php/Small_and_large_farms_in_the_EU__statistics_from_the_farm_structure_survey [Accessed 27 Feb 2020]. [6] Eurostat (2020). E C U / E U R exchange rates versus national currencies [online]. Available from: https://ec.europa.eu/eurostat/tgm/table.do?tab=tableí&init=lí&language=ení&pcode=tec00033í&plugin=l [Accessed 29 Mar 2020]. [7] Fousekis, P., Katrakilidis, C. & Trachanas, E. (2016). 
Vertical price transmission in the U S beef sector: Evidence from the nonlinear A R D L model. Economic Modelling, 52, 499-506. [8] Govindan, K . (2018). Sustainable consumption and production in the food supply chain: A conceptual framework. International Journal of Production Economics, 195, 419—431. [9] Hlavatý, R. & Krejčí, I. (2020). Optimizing production in small-scale beef cattle farm. In S.Kapounek & H.Vránová (Eds.), 38th International Conference on Mathematical Methods in Economics conference proceedings (pp. 166-172). [10] Hlavatý, R., Krejčí, I., Houška, M . Moulis, P., Rydval, J., Pitrova, J. Pilař, L . Horáková, T. & Tichá, I. (2021). Understanding the decision making in small-scale beef cattle herd management through a mathematical programming model. International Transactions in Operational Research, [in Press] [II] Hong, J., Mei, C , Raza, S.H.A., Khan, R., Cheng, G . , & Zan, L . (2020). SIRT5 inhibits bovine preadipocyte differentiation and lipid deposition by activating A M P K and repressing M A P K signal pathways. Genomics, 112(2), 1065-1076. 171 [12] Husák, J. & Hudečkova, H . (2017). Conditions for Development of Rural Community Education in the Czech Republic. Journal on Efficiency and Responsibility in Education and Science, 10(3), 64-70. [13] Institute of Agricultural Economics and Information (2018). Costs of agricultural products [online]. Available from: http://www.iaei.cz/costs-of-agricultural-products/ [Accessed 5 Oct 2018]. [14] Khalil, C.A., Conforti, P., Ergin, I. & Gennari, P. (2017). Defining Smallholders To Monitor Target 2.3. of the 2030 Agenda for Sustainable Development. Rome, No. ESS / 17-12. [15] Kim, S., Lee, S.K., Lee, D., Jeong, J. & Moon, J. (2019). The effect of agritourism experience on consumers' future food purchase patterns. Tourism Management, 70, 144-152. [16] Koláčková, G , Krejčí, I. & Tichá, I. (2017). Dynamics of the small farmers' behaviour - scenario simulations. Agricultural Economics (Czech Republic), 63(3), 103-120. [17] Liu, J., Tian, K , Sun, Y . , W u , Y . , Chen, J., Zhang, R., He, T. & Dong, G . (2020). Effects of the acid-base treatment of corn on rumen fermentation and microbiota, inflammatory response and growth performance in beef cattle fed high-concentrate diet. Animal, 14(9), 1876-1884. [18] Lopes, R.B., Canozzi, M . E . A . , Canellas, L.C., Gonzalez, F.A.L., Corréa, R.F., Pereira, P.R.R.X. & Barcellos, J.O.J. (2018). Bioeconomic simulation of compensatory growth in beef cattle production systems. Livestock Science, 216, 165-173. [19] Manevska-Tasevska, G , Hansson, H . & Rabinowicz, E. (2014). Input saving possibilities and practices contributing to more efficient beef production in Sweden. Agricultural and Food Science, 23(2), 118-134. [20] Mayberry, D., Ash, A . , Prestwidge, D., & Herrero, M . (2018). Closing yield gaps in smallholder goat production systems in Ethiopia and India. Livestock Science, 214, 238-244. [21] Mourits, M . C . M . , Huirne, R . B . M . , Dijkhuizen, A . A . , Kristensen, A.R. & Galligan, D.T. (1999). Economic optimization of dairy heifer management decisions. Agricultural Systems, 61(1), 17-31. [22] Neuenfeldt, S., Gocht, A . , Heckelei, T. & Ciaian, P. (2019). Explaining farm structural change in the European agriculture: a novel analytical framework. European Review of Agricultural Economics, 46(5), 713— 768. [23] Pitrova, J., Krejčí, I., Pilar, L., Moulis, P., Rydval, J., Hlavatý, R., Horáková, T. & Tichá, I. (2020). The economic impact of diversification into agritourism. 
International Food and Agribusiness Management Review, 23(5), 1-22. [24] Poláková, J., Moulis, P., Koláčková, G . & Tichá, I. (2016). Determinants of the Business Model Change A Case Study of a Farm Applying Diversification Strategy. Procedia - Social and Behavioral Sciences, 220,338-345. [25] Raza, S.H.A., Khan, S., Amjadi, M . , Abdelnour, S.A., Ohran, H . , Alanazi, K . M . , A b d El-Hack, M . E . , Taha, A . E . , Khan, R., Gong, C , Schreurs, N . M . , Zhao, C , Wei, D., & Zan, L . (2020). Genome-wide association studies reveal novel loci associated with carcass and body measures in beef cattle. Archives of Biochemistry and Biophysics, 694, 108543. [26] Schilling, B.J., Attavanich, W . & Jin, Y . (2014). Does Agritourism Enhance Farm Profitability? Journal of Agricultural and Resource Economics, 39(1), 69-87. [27] Syrůček, J., Krpálková, L . , Kvapilík, J., and Vacek, M . , 2017. Kalkulace Ekonomických Ukazatelů v Chovu Skotu. Prague: Institute of Animal Science. [28] Syrůček, J., Kvapilík, J., Bartoň, L . , Vacek, M . & Stádník, L . (2017). Economic efficiency of suckler cow herds in the Czech Republic. Agricultural Economics (Zemědělská ekonomika), 63(1), 34-43. [29] Tomšíková, K , Hudečkova, H . & Tomšík, K . (2019). Enhancing Attractiveness of Secondary Agricultural Education in The Czech Republic. Journal on Efficiency and Responsibility in Education and Science, 12(4), 135-145. [30] Wesselink, R., Blok, V . , van Leur, S., Lans, T. & Dentoni, D . (2015). Individual competencies for managers engaged in corporate sustainable management practices. Journal of Cleaner Production, 106, 497-506. 172 The Use of Genetic Algorithm in Clustering of ARMA Time Series Vladimír H o l ý 1 , Ondřej S o k o l 2 Abstract. Time series clustering is a well-covered topic in the data mining literature. In this paper, we assume that each of a large number of time series follows one of several autoregressive-moving-average ( A R M A ) models. We propose to jointly assign time series to clusters and estimate the A R M A coefficients in each cluster by a genetic algorithm. We also simultaneously determine the number of clusters by minimizing the Akaike information criterion (AIC). We illustrate our approach in an application to weekly product sales of a retail drugstore and focus on the specification of a genetic algorithm. First, we investigate the suitability of a k-means solution based on a distance between the A R M A coefficients as an initial solution. Second, we study the influence of the genetic algorithm parameters such as the number of generations, the size of the population, the probability of mutation, and the ratio of elite individuals. Keywords: Time Series Clustering, A R M A Model, Genetic Algorithm, Retail Ana­ lytics J E L Classification: C32, C38, C63 AMS Classification: 62M10, 62H30 1 Introduction Xiong and Yeung [13,14] proposed a method for clustering of time series using a mixture of autoregressive moving average ( A R M A ) models. This approach belongs to the class of model-based time series clustering. The task is to assign each observed time series to one of several clusters with common A R M A dynamics and estimate A R M A parameters in each cluster. For this purpose, [13, 14] utilized an expectation-maximization ( E M ) algorithm. A drawback of this iterative method is that it finds only a local maximum of the likelihood function. Although [14] addressed this issue by using a stochastic variant of the standard E M algorithm, they still operated within the E M framework. 
We take another direction and adopt a genetic algorithm to tackle the optimization problem. A genetic algorithm is a nature-inspired metaheuristic that is able to escape local extrema. This approach can also simultaneously determine the number of clusters by maximizing the Akaike information criterion (AIC). In this paper, we study the impact of an initial solution and control parameters of a genetic algorithm. For other applications of genetic algorithms in clustering problems, see [3, 5, 7, 9]. For a literature review of nature-inspired metaheuristics in clustering, see [4, 6, 8]. For a literature review of time series clustering, see [L 12]. 2 Model Similarly to [13, 14], we assume that there are N time series of length T denoted as = (yi.i,..., y;,r), i - 1,..., N. Our setting can be simply extended to allow for different lengths of time series; however, we focus on the case with common T to ease the notation. Each time series i - 1,..., N belong to one of K clusters; we denote this assignment as /c; e { 1 , . . . , K}. Each time series i = 1,..., N also has the A R M A ( P , g ) structure P Q y>i,t = ojKi + ^ ipKijyitt-j + eKijeitt-j + eitt, e{J ~ N(0, c r | ) , t = 1 , . . . , T. (1) 7=1 7=1 Each cluster k therefore has its own parameters = 10 15 20 25 30 35 40 45 50 55 60 65 Number of Clusters Figure 1 A I C of the solution obtained by the k-means algorithm with various number of clusters. categories sales have similar dynamics - and should be stored in the same supply center. In other words, we aim to cluster product categories' time series of sales by their dynamics. We start with the k-means solution. Figure 1 shows the A I C of the solution for numbers of clusters K from 1 to 65. Note that the k-means method finds a solution according to the within-cluster sum of squares and not log-likelihood or A I C (which consequently causes a bumpy shape of the curve). The best solution is obtained for A' = 15 clusters with A I C 8427. Next, we investigate the performance of the proposed genetic algorithm. We set the number of generations to 1000, the size of the population to 1000, the probability of mutation to 0.01, and the ratio of elite individuals to 0.10. Figures 2 and 3 then show the achieved A I C for some changes in these parameter values. A s genetic algorithms are random in nature, we repeat each computation 300 times and report mean performance. Note that to be able to compare computational performance for different population sizes in the first plots of figures 2 and 3, we show standardized generations - each containing exactly 1000 candidate solutions. For the population size of 1000, standardized generation is the same as regular generation. For the population size of e.g. 200, standardized generation refers to every fifth regular generation. We assess the usability of the k-means solution as an initial solution for the proposed genetic algorithm. Figures 2 and 3 show that the algorithm with random initialization starts at much worse candidate solutions and requires a lot of generations to get closer to the algorithm starting with the k-means solution. A suitable initial solution can therefore significantly speed up computations. Finally, we assess the influence of the control parameters. The first plots of figures 2 and 3 show development of A I C for several population sizes. We can see that higher population sizes require more standardized generations to achieve better results. A s a rule of thumb, the number of generations should be higher than the population size. 
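As an illustration of the algorithm discussed here, the following is a minimal sketch under simplifying assumptions: each cluster is modelled by an AR(1) process fitted by least squares instead of the full ARMA(P, Q) specification, the number of clusters K is fixed, the initial population is random rather than the k-means solution, and uniform crossover between elite parents is assumed (the paper does not prescribe the crossover operator). The control parameters mirror those studied above; all numerical settings are illustrative.

```python
# Sketch of the genetic algorithm: chromosome = vector of cluster labels,
# fitness = AIC of the fitted mixture (here simplified to AR(1) per cluster).
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 30, 100, 3

# Synthetic data: N series generated from K AR(1) regimes.
true_phi = np.array([0.8, -0.5, 0.2])
y = np.zeros((N, T))
for i in range(N):
    phi = true_phi[i % K]
    for t in range(1, T):
        y[i, t] = phi * y[i, t - 1] + rng.normal()

def aic(labels):
    """AIC of the mixture: one AR(1) coefficient and one variance per cluster."""
    loglik, n_par = 0.0, 0
    for k in range(K):
        idx = np.where(labels == k)[0]
        if idx.size == 0:
            continue
        x = y[idx, :-1].ravel()
        z = y[idx, 1:].ravel()
        phi = (x @ z) / (x @ x)              # least-squares AR(1) estimate
        resid = z - phi * x
        sigma2 = resid.var() + 1e-12
        loglik += -0.5 * resid.size * (np.log(2 * np.pi * sigma2) + 1)
        n_par += 2
    return 2 * n_par - 2 * loglik

pop_size, n_gen, p_mut, elite_ratio = 60, 200, 0.01, 0.10
pop = rng.integers(0, K, size=(pop_size, N))           # random initial population
for gen in range(n_gen):
    fitness = np.array([aic(ind) for ind in pop])
    order = np.argsort(fitness)                        # lower AIC is better
    n_elite = max(1, int(elite_ratio * pop_size))
    elites = pop[order[:n_elite]]
    # Uniform crossover between randomly chosen elite parents.
    parents = elites[rng.integers(0, n_elite, size=(pop_size - n_elite, 2))]
    mask = rng.random((pop_size - n_elite, N)) < 0.5
    children = np.where(mask, parents[:, 0, :], parents[:, 1, :])
    # Mutation: reassign each gene to a random cluster with probability p_mut.
    mut = rng.random(children.shape) < p_mut
    children[mut] = rng.integers(0, K, size=mut.sum())
    pop = np.vstack([elites, children])

best = pop[np.argmin([aic(ind) for ind in pop])]
print("best AIC:", round(aic(best), 1))
```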
If this holds, however, the algorithm is not very sensitive to the ratio between the number of generations and the population size. The second plots of figures 2 and 3 show development of A I C for several mutation probabilities. Clearly, when the mutation probability is zero, the algorithm gets quickly stuck in a local optimum. When the mutation probability is too high, on the other hand, the algorithm converges much slower. In our case, the value of 0.005 offers the best performance. The third plots of figures 2 and 3 show development of A I C for several elites ratios. We omit the case with no elite individuals as it would not guarantee a non-increasing curve of A I C and would require a very different scale in Figure 3. Nevertheless, we can see that lower numbers of elite individuals lead to a slower convergence rate. The value of 10 percent is then sufficient for good performance. The best average A I C over the 300 repetitions is 8387 obtained for the size of the population 1000, the probability of mutation 0.005, and the ratio of elite individuals 0.10. The best solution overall has A I C 8380 and K — 10 clusters. This solution was found in several control parameter settings, including even the random initialization. This result emphasizes the stochastic nature of the algorithm and encourages to repeat computations several times to determine the stability of the solution. 175 Genetic Algorithm / Random Start - Population Size 8750 CD 8650 Ö ^ 8550 =S 8 4 5 0 < 8350 11 3\ 3[ 1 1 C 250 500 750 1000 Standardized Generation Genetic Algorithm / Random Start - Mutation Probability 8750 o JD 8650 Ö c o | j 8550 o t 8 4 5 0 < 8350 250 500 Generation 750 1000 Genetic Algorithm / Random Start - Elites Ratio 8750 lthe limit is taken along the real line from the left to 1. The general power series may have another radius of convergence, say 0 < R < +oo and the center of the circle of its convergence need not be the origin, but simple linear transformation allows us to consider the center at the origin and the radius equal to 1. However, we will see that another easier result on power series, Proposition 1 below, is in fact sufficient in the proofs. Proposition 1 (weak Abel's theorem). If a power series f(z) = _ ] n = o a n z n has non-negative coefficients and converges for any x £ [0;1), then 180 n=0 (4) no matter whether the limit and the sum are finite or infinite. Proof. For arbitrary N £ M and x £ [0;1) we have an = l i m y ccnxn < l i m > ccnxn < > an. n=0 x - > l - Z _ i n = 0 x - > l - Z _ i n = 0 ^—in=0 The first equality is by continuity of the polynomial (partial sum in the power series), all limits and infinite summations are defined (with possible value +00) by monotonicity. The two inequalities follow from the non-negativity of the coefficients an. We send N -> 00 and obtain (4). This is another show of the strength of the non-negativity which implies monotonicity and hence the existence of limits and infinite sums is guaranteed. 2 Wandering in the graphs G2 and G3 In this section we give the precise formulations and proofs of the well-known Polya's theorems on random walk. In the proofs we will see the precise role of power series (generating functions) and the fact that only weak Abel's theorem is employed rather than the general version of this theorem. We restrict ourselves on the cases d = 2 and d = 3 respectively for the step from the dimension 2 to the dimension 3 is the step where the quality changes. 2.1 Wandering in G2 Theorem 2 (wandering in G2). 
Take the origin o = (0; 0) as the initial vertex in G2 = ( Z 2 ; E). Then ln(p, G2) ln ln lim ————— = l i m — = l i m — = 1. (5) n^co dn(0, G2) n^co dn n^co 4 N In other words: random walk in G2 returns to the start point with the probability 1. Proof. The symbols in (5) have the meaning from (1). A s it was shown in the subsection 1.1 the fractions — are the partial sums of the series (3), which is now of the form • It means that we have to prove that We will consider these generating functions B(x) = 2n=o^*n and C(x) = I X o ^ * " (7) If we take the series (7) as formal power series it can be seen by a splitting a walk counted by bn in its A: + 1 returns to o into k segments of lengths n1, n2,..., nk; nt + n2 + — h nk = n, counted by c n i , c„2 ,..., c„f c that there is the relation B ( x ) = ^ = 0 C ( x ) f e = I - ^ . (8) However, both series in (7) have the radius of convergence at least 1 and hence the generating functions in (7) are the real functions defined certainly for x £ [0;1). To show that (6) holds it suffices to prove with respect to (8) that lim B(x) = +00. (9) To prove (9) we use the Proposition 1, i.e. we prove that 2y-=o~j= + 0 0 - We do it by computing bn. It is obvious that bn = 0 for odd n. For even lengths _ „ (2 n)i _ (2n\ y n fn\2 _ (2n\2 »2n - 2.7=0; ! ; ! ( n _ ; ) ! ( n _ ; ) ! - \ n ) 2.y=0 {j) - { „ ) • 181 The first equality follows by considering all positions of j steps of the walk to the right, which force the same number j of steps to the left and the same number n—j steps up and down. The possibilities are counted by the multinomial coefficient . n ^ n _ The last equality follows from the identity Yj=o (j ) = (^) • (2n\ 4 n The Stirling's asymptotic formula yields the asymptotics y J ~K-j= for n -> oo and a constant K. Hence we obtain % r ~K2 - which implies -L = \ (z n ) 4 " 2 n = + oo since the harmonic series is divergent. The claim (6) is now the corollary of the Proposition 1: lim x-»l- ,=„47 = 1 2.2 Wandering in Theorem 3 (wandering in G3). We sfar? wandering at the origin o = (0,0,0) in the graph G3 = (J?; E). Then L ( o , Go) L L n ^ c o a n ( 0 , l73 J n - t c o a n n ^ c o b" /« other words: random walk in G3 returns to the start point with the probability less than 1 and it disappears in infinity without return with a positive probability. Proof. The symbols in (10) have again the meaning from (1). We consider similarly as in the proof of Theorem 1. The form of generating function; holds it is sufficient to show that b c The form of generating functions is now B(x) = 2n=o^T*n a n d COO = 2n=o^T*n - Since the relation (8) 7 7 < + o o . (11) ;'=0 OJ The equality in (11) follows again from Proposition 1. We have clearly bn = 0 for odd n. For even members we find an upper bound: b M = irnyiyiklkl0l_;_"iU-y-«rO4 ~"^(3 ~"0.k.n-j-kj) * (12) ^ ( » ) * - " , a 3 - " G , ; z ) = ( 2 „ " ) ^ - " U . ; 0 , j . x+y+z=n i (m, m, m), n = 3 m (m + 1, m, m), n = 3 m + 1 , m 6 M 0 . In the first expression in (12) we counted as in (m + l , m + l,77i), n = 3 m + 2 the proof of Theorem 1, j is the number of steps in the walk to the right, k is the number of steps up, and n — j — k the number of steps back. The second expression is an algebraic rearrangement. In the third expression we used the fact that if at, a2,..., ocp are non-negative real numbers satisfying £ f = 1 oct = 1, then 2f=i a f ^ max cct. In our case we have at = 3 n (/ n — feja n c ^ 3 n = (1 + 1 + l ) n = S;'+fe (p — 1)! (q + 1)! for p > q + 2. 
B y the Stirling's formula for the factorial we have estimates with constants K, L 182 r n ) < K — , ( ) < i — • (13) Using the inequalities in (13) we obtain the final estimate for the even members of the series in (11) bin 1 1 -1 -§;n*n > Yo = 0, Yn ^ 1- with n o n _ negative coefficients satisfying the relation (8) as the formal power series, convergent for any x 6 [0;1). Further suppose that B2n > c / n f ° r a n Yn e N a n d a positive constant c. It implies that B(l):= £n=o Pn = + 0 0 - B y Proposition 1 l i m B(x) = +00 and thus l i m C ( x ) = 1 by (8). Hence C ( l ) : = £n=oKn = 1 again by Proposition 1. We have also for any x 6 (0; 1) the inequality Z + co ^—i+co 1 B n x n > c ) - x 2 n = c l n r . n = r / ^ n = o n l - * 2 "1 / 1 Finally for any N 6 M and x 6 ( V n ; 1), with respect to the assumptions y n < 1 and C(x) = 1 — we get ^—iW ^—i+co ^—,+oo 1 + 1 K n > > K n ^ n = > K n * " - / K „ X n > 1 - — — - - > n=0 'n=0 ^—'n=0 ^—'n=N+l o ( X j 1 — AC 1 X W + 1 1 x N + 1 (14) + C l n ( l - x 2 ) ~ l ^ > _ 1 c l n 2 _ l ^ ' 1 - x 3.2 The lower estimate of the speed of the convergence for the wandering in Gi We use the estimate (14) to obtain a lower bound for the sequence of fractions in Theorem 2 (wandering in G2). Proposition 2. For any N 6 M, N > 2, 1 > ^ > 1 - * " \ . (15) 4W 0.21nW-0.16 expJV /9 Proof. The symbols Zn , fcn, c n are defined in the subsection 1.1. We set yn = , Bn = ^ in the previous subsection. A l l the considerations in the previous section work and the function B and C are exactly those in (7). We deduce the required lower bound on Bn from the estimates —) exp < n! < V27rn (—I exp e> ^ 1 2 n + l W ^ 1 2 n which is due to H . Robins [9] and hold for any n 6 M Hence we obtain the estimate 183 a _ 4 - 2 „ C 2 n ) ! 2 ^ P 2 M ± T > 1 1 7rn exp ^ 7r exp ^ We round the constant — ^ and get / ? 2 n > If we set x = x(N) = 1 - N~ £ [0;1), N EN, then for JV > 2 re exp- n we obtain* £ (-; 1), I n — = - I n JV and with c = 0.228 we get c l n — = - c l n JV > 0.2 In JV and c l n 2 < 0.16. V2 / 1-x 9 b 1-x 9 XN+1 N B /9 For the last fraction in (14) we obtain the inequality < n-. In this way the required inequality (15) follows 1 ~x expN 19 " from the inequality (14). 3.3 A wanderer on Manhattan How long do we have to wait on Times Square for an aimless wanderer (starting at the square) to achieve the return probability more than 90%? The net of streets and avenues is not exactly square but rectangular grid. However, we replace it by the square grid with side length 200 meters. We assume that the wanderer can walk this distance in 2 minutes. Proposition 3. Under the assumption above, the aimless wanderer returns to Times Square with probability more than 90% after _ rsii < 5.363 * 1 0 1 6 years. 720 y Proof. W e set N = \ e s i \ Then the first subtracted term in (15) is less than (0.2 * 51 - 0.16)"1 < 0.0997 and the second one is quite negligible, less than e~9 0 . Thus ^ > 0.9. References [1] Abel, N . H . (1826): Untersuchungen über die Reihe: 1 + mx + m ^™2 ^ x2 + m ( - m 2 ) x3 + ... j o u r n a i für die reine und angewandte Mathematik 1, 311-339. [2] Bilingsley, P. (1995). Probability and measure. New York: John Wiley & Sons Inc. [3] Dhrymes, P.J. (1980): Distributed Lags. Problems of Estimation and Formulation. North Holland, Amster- dam. [4] Feller, W . (1968): An Introduction to Probability Theory and Its Applications, Volume I, John Wiley & Sons, New York, 3rd edition. [5] Lange, K . (2015): Polya's random walks theorem revisited, Amer. Math. 
Monthly 122, 1005-1007. [6] Levin, D . A . and Peres, Y . (2010): Pölya's theorem on random walks via Pölya's urn, Amer. Math. Monthly 777,220-231. [7] Novak J. (2014): Pölya's random walk theorem, Amer. Math. Monthly 121, 711-716. [8] Pölya, G. (1921): Über eine Aufgabe der Wahrscheinlichkeitsrechnung betre_end die Irrfahrt im Strassennetz, Math. Ann. 84 (1921), 149-160. [9] Robins, H . (1955): A remark on Stirling's formula, Amer. Math. Monthly 6, 26-29. [10] Rogers, K . (2017): Transient and reccurent random walks on integer lattices. 10 pages 184 Numerical Valuation of the Investment Project with Expansion Options Based on the PDE Approach Jiří Hozman1 , Tomáš Tichý2 Abstract. Compared to the standard D C F methodology, the real options approach provides a solution to optimal investment decisions that captures the value of flexibilities embedded in a project. In this paper we focus on one specific kind of investment decisions — an option to expand. Assuming values of both the project and the embedded option are determined in terms of time and underlying output price, driven by a relevant stochastic process, one can unify the P D E approach to describe the development of values of the project and options. More precisely, the link is realized through a payoff function enforced at a fixed time. A s a result, we obtain a system of relevant governing equations of the Black-Scholes type. Since explicit formulae are known for this type of P D E problem only in specific cases, one must turn to some approximation methods. With reference to the results obtained in valuing financial options, we apply the discontinuous Galerkin method to solve the relevant governing equations. The obtained numerical scheme is applied to a simple illustrative expansion decision problem. Keywords: real options valuation, project value, option to expand, Black-Scholes equation; discontinuous Galerkin method, numerical solution J E L Classification: C44, G13 A M S Classification: 65M60, 35Q91, 91G60 1 Introduction Capital budgeting is an essential discipline of corporate finance and basically can be viewed as the planning process that aims to increase the value of the firm/project by proper investment decisions within a long time horizon. Hence, valuing the profitability of investment projects plays a particularly important role here. More than four decades ago, the cornerstone of modern investment theory was built, linking valuation of corporate investment opportunities as pricing of financial options on real assets, see the pioneering paper by Myers [12]. Due to the analogy with an option on financial asset, the methodology has become known as real options approach that interprets the flexibility value as the option premium. Since then, a large number of various solution techniques have been developed [11], from a simulation approach, over dynamic programming to contingent claims analysis, which compares the change in option/project values with the change in the value of a suitably constructed portfolio of trading assets within the relevant partial differential equation (PDE), see [2]. In this short contribution we extend our previous results on the topic of numerical pricing of financial option contracts using discontinuous Galerkin (DG) method, see [7], [8] and [9], to valuation of implicit flexibility in the investment project. We proceed as follows - in Section 2 the relevant P D E model is formulated, while in Section 3 a numerical valuation scheme is presented. 
Finally, in Section 4 a simple numerical experiment related to reference data is provided.

2 PDE Model for Valuation of the Embedded Option
In this paper we concentrate on valuing the flexibility of an investment project, i.e. the real option value embedded in that project. At first it is necessary to describe the value of the project itself; then we are able to find the value of its flexibility by solving the relevant PDEs that link the option and project values, see the inspiring ideas in [10]. Let us consider that fluctuations in project values (and also in real option prices) are traced back to uncertainty via the underlying output price P, which evolves in time t according to the following stochastic differential equation (proposed in [3]):

dP(t) = (r − δ)P(t) dt + σP(t) dW(t),  P(0) > 0,    (1)

where r > 0 is the risk-free interest rate, δ > 0 is the mean convenience yield on holding one unit of the output, W(t) is a standard Brownian motion and σ > 0 is the volatility of the output price. Let V0(P, t) and V1(P, t), for current price P and time t, denote the value of the project having no options and with the embedded option allowing the particular action (e.g., expansion), respectively. We also assume that the possibility of a single decision is related to a prespecified time T only, i.e., the real option is held in isolation by a single firm and it is of European style. Let T* > T be the maximal life-time of both projects and let φ0(P, t) and φ1(P, t) denote the corresponding cash flows. It is reasonable to assume for all P > 0 that V1(P, t) ≥ V0(P, t) if t ∈ [0, T), that V1(P, T*) = V0(P, T*) = 0, and that both project values are governed by PDEs of the Black–Scholes type,

∂V_i/∂t + L_BS(V_i) + φ_i(P, t) = 0,  0 < P < +∞,  0 ≤ t < T*,    (2)

with the terminal states V_i(P, T*) = 0, P > 0, i = 0, 1, where L_BS(V) = (σ^2/2)P^2 ∂^2V/∂P^2 + (r − δ)P ∂V/∂P − rV. Further, for all time instants t ∈ [0, T), the project value V1(P, t) is equal to the project value V0(P, t) increased by the true value of the flexibility of the given investment opportunity, see [14]. In view of this fact, it is possible to track the values of both projects and the embedded option simultaneously within one timeline on [0, T); they are linked through the function Π(V0, V1) enforced at the expiry date T. More precisely, let F(P, t) denote the value of the flexibility at the current price P and actual time t ∈ [0, T); then the value function F satisfies the following PDE (see [4]):

∂F/∂t + L_BS(F) = 0,  0 < P < +∞,  0 ≤ t < T,  with F(P, T) = Π(V0(P, T), V1(P, T)),    (3)

where L_BS is the second-order linear differential operator of the Black–Scholes (BS) type defined in (2) and Π plays the role of a payoff function, the specific form of which depends on the type of flexibility provided by the particular real option; see Section 4 for the case of an option to expand. In summary, determining the value of the flexibility of an investment project at the present time t = 0 consists of two consecutive problems. First, we solve the pair of PDEs (2) with homogeneous terminal conditions to obtain the project values at t = T, which we use in the construction of the terminal value of the embedded flexibility at t = T. Consequently, we solve the problem (3) to obtain the present value of the flexibility.

3 Numerical Valuation
Since analytical formulae for the BS type PDEs are available only in the simplest cases or under very strong limitations, the application of modern numerical methods plays a crucial part in real options valuation.

1 Technical University of Liberec, Studentská 2, 461 17 Liberec, Czech Republic, jiri.hozman@tul.cz
2 Department of Finance, VSB-TU Ostrava, Sokolská 33, 701 21, Ostrava, Czech Republic, tomas.tichy@vsb.cz
In our study, we employ the D G method, successfully used also in the field of financial options pricing (see, e.g., [8] and [9]), to improve the numerical valuation process. We proceed as follows. A t first, we localize the governing equations to a bounded spatial domain and discuss the choice of suitable boundary conditions. Next, we apply the standard discretization steps and present the resulting numerical scheme. 3.1 Localization and boundary conditions The proposed valuation methodology, related to numerical solving of terminal-value problems (2) and (3), requires their localization to a bounded interval Q. - (0, P m a x ) , where PMAX » 0 is the maximal sufficient value of the underlying output price. Formally, we restrict the governing equations and the relevant terminal conditions to the bounded domain Q.. Therefore, we have to impose the project values at both endpoints P — 0 and P = PMAX. A s the prescribed values the estimations based on the net present value approach are performed, see [10]. Since the cash flows of the projects are ?i > • • • > ? R = T > tR+i > • • • > ? M = 0 with the time step T = T*/M. Denote e S f , i = 0 , 1 , the approximation of the solutions u9\t), i = 0,1, from (6) at time level tm e [T,T*], m = 0 , . . . , R. Similarly, we define the D G approximate solution of problem (7) as functions * Wh(tm), tm e [0,T], m = R,...,M. Let u^0 = w^j, = 0 be the initial states, then wff * Wh(0) is computed in the following three steps (note that we use the backward time running, i.e., ? m + i —tm = - T ) : {U h!m+VVh ) ~T ^h (U h!m+VVh ) = (U h!m>Vh ) ~ ^ (V h)(tm+l) + T (lfi(tm+l),Vh) (8) V v f t e ^ , m = 0 , 1 , 1 , 1 = 0,1, K - ^ H ' M i W Vvh^Sl (9) ( w r ' - V f t ) - Tah(w™+ \ vf c ) = K \ vf c ) - i A ( v f c ) ( W i ) Vvfc e S£, m = / ? , . . . , M - 1, (10) where the starting data (at expiration date) are given as SP H -approximation of the payoff function JT depending on states u^R, i = 0,1, see (9). Finally, note that the equations (8) and (10) result into a sequence of systems of linear algebraic equations with sparse matrices that uniquely determine the relevant solutions on the corresponding time levels, see [8]. 187 4 Numerical Experiment: Option to Expand In this section, to briefly illustrate capabilities of the numerical scheme introduced above, we present numerical experiments on valuing an option to expand an investment project in the mining industry. The proposed valuation procedure is implemented in the solver Freefem++, incorporating G M R E S as a solver for non-symmetric sparse systems, for more details, see [6]. As in [10] we consider iron ore mine, value of which is given by the project value V$(P, t). Concurrently, we consider the mining company adopting the embedded option F(P, t) for investment in expansion in the mining project V\(P,t) for / e [0,7"). This expansion option is exercisable at T and requires the amount 7C > 0 for expansion. Thus the value of F(P, T) is positive when V\ (P, T) > Vo(P, T) +'K and otherwise has zero value. In line with this the payoff function corresponds to the European vanilla call option F = V\ — VQ with strike *7C and it is defined as follows n(V0(P,T),V1(P,T))=max(V1(P,T)-Vo(P,T)- » T and assume that production rates satisfy qo(t), i f f e [ 0 , r ) , qi(t) = \ K-qo(t), i f t e [T,T{), (13) 0, i f t e [T*,T*], where the factor K > 1 represents the expansion rate. 
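The following sketch is a simplified stand-in for the scheme above: instead of the discontinuous Galerkin discretization implemented in FreeFem++, it prices the flexibility F from PDE (3) by an implicit finite-difference march backwards from T, with the project values at T replaced by the hypothetical proxies V_i(P, T) = a_i·P and with illustrative parameter and boundary values. It is meant only to show the localization and backward time-stepping idea, not the authors' method.

```python
# Implicit finite-difference stand-in for the valuation of the expansion flexibility.
# Project values at T are hypothetical proxies; all parameters are illustrative.
import numpy as np

r, delta, sigma = 0.03, 0.02, 0.30        # illustrative market parameters
T, IC = 1.0, 50.0                         # decision date and expansion cost
a0, a1 = 1.0, 1.6                         # hypothetical V0(P,T)=a0*P, V1(P,T)=a1*P
P_max, J, M = 400.0, 200, 200             # localization bound and grid sizes

P = np.linspace(0.0, P_max, J + 1)
dP, tau = P[1] - P[0], T / M
F = np.maximum(a1 * P - a0 * P - IC, 0.0) # terminal condition: payoff of the expansion option

# Interior operator L_BS by central differences: 0.5*s^2*P^2*F'' + (r-d)*P*F' - r*F
j = np.arange(1, J)
lower = 0.5 * sigma**2 * P[j]**2 / dP**2 - 0.5 * (r - delta) * P[j] / dP
diag = -sigma**2 * P[j]**2 / dP**2 - r
upper = 0.5 * sigma**2 * P[j]**2 / dP**2 + 0.5 * (r - delta) * P[j] / dP

A = np.zeros((J - 1, J - 1))              # (I - tau*L_BS) for the implicit Euler step
A[np.arange(J - 1), np.arange(J - 1)] = 1.0 - tau * diag
A[np.arange(1, J - 1), np.arange(J - 2)] = -tau * lower[1:]
A[np.arange(J - 2), np.arange(1, J - 1)] = -tau * upper[:-1]

for _ in range(M):                        # march backwards from t = T to t = 0
    rhs = F[1:J].copy()
    rhs[0] += tau * lower[0] * F[0]       # Dirichlet boundaries kept fixed:
    rhs[-1] += tau * upper[-1] * F[J]     # F(0,t)=0, F(P_max,t)=payoff(P_max)
    F[1:J] = np.linalg.solve(A, rhs)

print("flexibility value F(P=100, t=0) ~", round(np.interp(100.0, P, F), 2))
```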
Then the after-tax cash flow (related to the relevant project) is given according to [5] by th regression parameter (it is a set of numbers) and si is the random error term. The hedonic price model, which is described in section 3, is then estimated using the G W R model. This allows analyzing the variability of common price determinants over the space. B y the definition of G W R , each individual spatial unit i obtains its own estimate of the model coefficients, and therefore the G W R models the spatial heterogeneity of the data simultaneously [4]. In order to obtain the proper neighborhood structure for the local regressions, the Gaussian distance-decaying kernel is used. The Gaussian Kernel function can be expressed as [4]: K(z) = exp(-z2 /2), (2) where z is constructed as: z = | (ui - Vi)/h, for m - Vi <= h 0, for ui - vi > h, where h denotes the selected bandwidth size, which was selected by the L O O C V criterion (Leave-one-out-crossvalidation) [4]. Once the proper bandwidth size for each spatial unit (thus we are using adaptive bandwidth size) is obtained and the Gaussian Kernel is used, it yields the final W matrix. This W matrix is a matrix that captures the proper observation points to be used in the kernel function and gives them proper weight by pasting its distance to the reference points into the kernel function. Therefore, the G W R estimation is equivalent to the weighted regression estimation, which can be, in the matrix form, represented as follows [7]: Pi(uu vt) = [X'W(Ui, Vi)X]-l X'W(ui, Vi)y, (3) where X is a design matrix of the regression and the y is a vector of the dependent variable. For more technical details of the estimation techniques and details regarding the estimation of the covariance matrix and associated models diagnostics refer to [4] and [7]. A unique aspect of the G W R model is the fact that the model yields different coefficient parameter values for each observation e.g. the row of the design matrix of the regression X and thus the spatial units of more equivalent coefficients can be clustered together. This essentially summarises the concept of the housing submarkets, which are more deeply discussed in the following section. 2.2 The Housing Submarkets In our modeling frameworks, we define the housing submarkets as the cluster within which the real estates dispose of a considerably higher level of homogeneity. Many recent studies seem to agree that the housing submarkets can be identified by clustering the coefficients of the G W R model [4]. A s far as the clustering of the coefficients goes, the -means algorithm [2] is commonly used. Before that, however, as we are working with quite a large number of housing features, some form of dimensionality reduction is required. This reduction is performed by the famous Principal Component Analysis (PCA) estimated via the Singular Value Decomposition (SVD), see [2]. The complete analytical framework of constructing the housing submarkets is summarized and illustrated in Figure 1. Firstly, the design matrix of the regression X is constructed (Including the selection of proper model form as well as dummy coding of categorical variables). Secondly, the parameters regarding the G W R model are determined (the bandwidth sizes) and the G W R model is estimated. Then the G W R coefficient matrix is scaled using the z-standardization, and the S V D of the matrix obtained. 
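A minimal numerical sketch of the estimator (3) with the Gaussian kernel (2) follows; it uses synthetic coordinates, a single regressor, a fixed bandwidth h and plain Euclidean distances, whereas the paper selects an adaptive bandwidth by leave-one-out cross-validation. The variable names and values are illustrative.

```python
# GWR sketch: for each location, observations are weighted by a Gaussian kernel of
# distance and a weighted least-squares fit is computed, eq. (3).
import numpy as np

rng = np.random.default_rng(1)
n = 300
coords = rng.uniform(0.0, 10.0, size=(n, 2))            # (longitude, latitude) proxies
meters = rng.uniform(30.0, 120.0, size=n)
# Spatially varying "price per meter" so that local coefficients differ over space.
beta_meters = 0.008 + 0.004 * (coords[:, 0] / 10.0)
log_price = 14.0 + beta_meters * meters + rng.normal(0.0, 0.05, size=n)

X = np.column_stack([np.ones(n), meters])                # design matrix with intercept
h = 2.0                                                  # fixed bandwidth (illustrative)

def gwr_coefficients(i):
    """Weighted least squares at location i: (X'WX)^{-1} X'W y."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-(d / h) ** 2 / 2.0)                      # Gaussian kernel (2)
    XtW = X.T * w                                        # equivalent to X' W with diagonal W
    return np.linalg.solve(XtW @ X, XtW @ log_price)

betas = np.array([gwr_coefficients(i) for i in range(n)])
print("local 'Meters' coefficients: min %.4f, median %.4f, max %.4f"
      % (betas[:, 1].min(), np.median(betas[:, 1]), betas[:, 1].max()))
```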
For the interpretation of the main sources of variability between the housing submarkets, the first four columns of the V matrix from the S V D can be inspected, which is not done in this study. Then, the U matrix from the S V D is taken, and the first eight columns of the matrix U, which explain more than 90 percent variability in the coefficient matrix, are clustered, using the -means. The 192 number of k was determined via common sense combined with the Total Sum of Squares loss, for different values of*. The Initial Dataset n long I2t y m J2 13 . . ik 1 14.61776 50.04680 1560560 4 =5 2 . 0 2 14.43770 50.12354 15.62380 4 52 7 . . 0 3 14.42434 50.09010 16.10805 2 56 4 . . 1 4 14.58541 50.10435 16.32259 4 130 6 . . 1 5 14.51452 50.14090 15.19930 2 47 3 . . 0 6 14 40791 50.13379 15 65606 5 75 8 . 0 7 14.44140 50.05603 15 89370 3 76 4 . . 0 5 14 37736 50.09306 15 00433 1 27 1 . . 0 9 14.36774 50.06S76 15 78122 4 70) 3 . . 1 10 14.50586 50.13676 15.48133 1 60 10 . . 0 K- means U matrix GWR models GWR coefficients B1 B2 1 29.94906 71.08633 10.327131 2 30.44605 69.51001 9.351546 3 2919822 68.84062 9.166977 4 28.76888 7G.25991 11 25580B \ f i 28 52766 70.33482 9391740 6 31 11241 68.45289 9 187193 7 31.42737 69.98236 9 732332 8 30.14132 69.92618 8.590835 9 29.41160 69.24610 8 837706 10 31.06790 68.S656& 10.19073B B3 -21.16705 -21 31486 -19.12511 -1971206 -1921444 -19 80626 -21 06122 -18 10292 -21.03054 -18.82793 Perform PCA (SVD) usv Bk -1967436 -20 24257 -1909472 -1989011 -20 96973 -21.19112 -19.17056 -21 63445 -21 87501 -20 11814 Kriging Interpolation V matrix First 4 PCs • Bar plot Figure 1 Housing Submarkets estimation Framework Then, the last step includes, in order to have submarkets distribution continuously, the kriging interpolation of the values. The kriging technique is a form of a spatial prediction, which under certain assumptions is guaranteed to have Best Linear Unbiased Predictions. For more technical details and an overview of kriging technique see [1]. 3 Dataset and Model Structural Form The Data used for the purposes of our analyses were retrieved from the real estate webpage h t t p s : / / w w w . s r e a l i t y . c z / . This server contains multiple estate advertisements and is assumed to be a credible representation of the flats market within the area of study. From the time period from February 2021 to May 2021, more than 2500 real estate advertisements were collected. However, some advertisements were not suitable for the purposes of our analysis as some forms of human errors or missing listed prices were present. For example, observations with the listed price of 0 C Z K and 1 C Z K had to be withdrawn from the dataset. The hedonic price equation is constructed based on the variables discussed in this section. The dependent variable of interest is the price of the estate in C Z K . Following many studies, we used the logarithmic transformation as this allows for much better interpretability and may help with the absence of normality of the distribution. Therefore, the log(price) is our dependent variable. The first independent variable is the Meters. This variable is commonly found to be highly correlated with the price and has very strong explanatory power. For the second independent variable, we use the variable Room, which is indicating the number of rooms the real estate disposes of. Another, yet important variable is the Floor, which indicates the vertical position within the housing unit. 
Taking the inspiration from the study of [4], we also use derived variables indicating the zero and top floors. Additionally, variables regarding the building condition were also utilized, combined with other variables regarding the type of the building i.e. the Concrete and Brick. A very important key price determinant which many studies do not seem to be utilizing is the type of ownership. The two main types of ownerships are common, i.e. the Private and the Cooperative. It is naturally assumed that the Private type of the ownership is perceived exceptionally grandiosely, as there are usually certain legislative limitations associated with the Cooperative type of ownership. Lastly, another additional estate features such as the Garage, which indicates that an estate disposes of a garage, the Balcony indicating the presence of Balcony and/or Terrace and, lastly, the Kitchenette are also modeled. The data contain all described variables for 2 314 real estates and thus n - 2314. Additionally, since we are interested in spatial modeling, coordinates, i.e. longitude and latitude, were also kept. 193 A l l of these presented variables were used for our hedonic model and thus the following hedonic (log) price equation is estimated: log(price) — BQ + B\Meters + BjRoom + B^Floor + B^Floor zero + B^Floor top + B(,A fter reconstruction + B-jVery good + B$Concrete + BgPrivate + BioKitchenette + B\\Balcony + B^Garage + B^New building x Brick + (4) BuNew building x Concrete + e. When reporting the coefficients of the G W R , it is not very suitable to report every single coefficient estimate for every single spatial unit. Thus, we report certain summary statistics of the model, and the estimated G W R model coefficients are presented in Table 1. Table 1 G W R coefficient summary: Capital city Prague key Min 1st Qu. Median 3st Qu. Max Global Frac Positive Meters 0.002 0.009 0.010 0.011 0.014 0.010 1 Room -0.005 -0.001 0.002 0.026 0.216 0.031 0.669 Floor -0.029 -0.005 0.003 0.008 0.033 0.002 0.620 Floor zero -0.286 -0.127 -0.073 -0.003 0.109 -0.066 0.241 Floor Top -0.199 -0.044 -0.006 0.029 0.125 -0.009 0.454 Building type: Concrete -0.610 -0.198 -0.133 -0.088 0.258 -0.153 0.026 Condition: Very good -0.235 0.025 0.050 0.078 0.294 0.049 0.889 Condition: After reconstruction -0.121 0.034 0.069 0.106 0.312 0.071 0.940 Private -0.095 0.096 0.165 0.255 0.624 0.183 0.988 Kitchennete -0.119 -0.003 0.032 0.066 0.227 0.041 0.729 Balcony/Terrace -0.099 0.028 0.053 0.080 0.279 0.056 0.932 Garage -0.081 -0.003 0.025 0.085 0.314 0.043 0.725 Concrete x New Estate -0.266 0.119 0.198 0.268 0.566 0.188 0.935 Brick x New Estate -0.279 0.008 0.065 0.113 0.449 0.061 0.780 Constant 13.908 14.497 14.563 14.689 14.963 14.570 1 When investigating the G W R coefficients, a few fairly interesting observations can be made. Firstly, most of the regressors have a relatively large fraction of positive values the only exceptions being the Floor Zero and Concrete. The negative effect on the price is fairly expected from those characteristics as they have mostly negative perceptions associated with them. On the other hand, the characteristics such as After reconstruction, Private and New Estate have fairly large fraction of positive values and relatively high median and global effects. As the baseline, the simple linear regression model was also used to estimate the hedonic equations (4). In other to compare both models, various metrics such as e.g. A I C and R2 pse, calculated as corr(y, y)2 , can be compared. 
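As a sketch of the baseline comparison, the snippet below fits a reduced form of the hedonic equation (4) by OLS with statsmodels and computes the pseudo R-squared, corr(y, y_hat)^2. The data frame and its column names are synthetic stand-ins, not the authors' dataset, and only a subset of the regressors in (4) is included.

```python
# Baseline OLS fit of a reduced hedonic equation and the pseudo R^2 used for comparison.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "meters": rng.uniform(25, 120, n),
    "rooms": rng.integers(1, 5, n),
    "private": rng.integers(0, 2, n),
    "balcony": rng.integers(0, 2, n),
})
df["log_price"] = (14.5 + 0.010 * df.meters + 0.03 * df.rooms
                   + 0.18 * df.private + 0.06 * df.balcony + rng.normal(0, 0.1, n))

# Reduced form of equation (4); the full specification adds the remaining dummies.
ols = smf.ols("log_price ~ meters + rooms + private + balcony", data=df).fit()
pseudo_r2 = np.corrcoef(df.log_price, ols.fittedvalues)[0, 1] ** 2  # corr(y, y_hat)^2
print(f"AIC: {ols.aic:.1f}, pseudo R^2: {pseudo_r2:.3f}")
```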
The Rpse f ° r m e O L S model is 0.61, where for the G W R model is 0.92. Similarly, the A I C seems to be in favor of the G W R model, with values AICGWR = 128 and AICOLS = 376.37. Clearly, the G W R model provides a much better model fit and thus better statistical inference. Moreover, as an extra feature of G W R , the housing submarkets can be identified. This would not be the case of the O L S model, as the model does not model the spatial heterogeneity nor does it allow the coefficient estimates to vary in the space. 4 Housing Submarkets In order to identify the housing submarkets in Prague, the G W R coefficient matrix is centered, and then, using the methodology summarised in the Figure 1, the S V D matrix factorization is performed. Firstly, the main sources of variability can be investigated, using the V matrix from the S V D (see e.g. [6]). Then, in order to identify the housing submarkets, the fe-means clustering algorithm is applied on the U matrix of the S V D 1 . The selected parameter k was determined via the loss function and common sense approach discussed above and equals to ten. The final interpolated housing submarkets in Prague can be closely observed in the Figure 2. 1 We clustered the first eight columns of the matrix, as they capture more than 90 percent of the variability. 194 Housing Submarkets: Spatial Distribution Capital city Prague Submarket Figure 2 Housing submarkets of Prague (n = 10) Once the housing submarkets are identified, it is assumed that the level of homogeneity within each cluster is considerably higher as opposed to the levels between the clusters. Particularly, the effects of the key price determinants, which are modeled in the hedonic equation (4), are assumed to be extensively similar within every single housing submarket. Moreover, as an extra feature of the described framework of identifying the housing submarket, assuming that the real estate data are collected over multiple time periods (say, once a year for the past five years), the housing submarkets allow for the analysis and evaluation of the key price determinants over time, modeling both phenomena of spatial heterogeneity as well as time effects. First and most importantly, inspecting the Figure 3, we can conclude that the effect of price determinants does vary in space. Additionally, a few interesting conclusions can be made. The effect of Concrete building type has quite a large negative effect on the price, which is naturally expected. Interestingly enough, the Garage seems to be perceived as an extravagant feature, especially in the historical parts of the city. Moreover, the effects of Meters and After a reconstruction tend to vary quite vigorously over the analyzed area. Last but not least interesting conclusion which can be made is the one submarket nA, which is associated with the very historical part of the city. Even though this submarket is quite small, a very large heterogeneity compared to the other clusters is present. 5 Conclusion The spatial statistics techniques (the G W R model and Kriging interpolation) were combined with the techniques of statistical learning i.e. the P C A and the fe-means methods in order to identify the housing submarkets in Prague. 
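The construction applied here can be sketched in a few lines: standardize the GWR coefficient matrix, take its SVD, keep the leading columns of U, and run k-means with k = 10. In the snippet below the coefficient matrix is replaced by a random placeholder, so the number of retained components will differ from the eight reported for the real, correlated coefficients; scikit-learn's KMeans is assumed.

```python
# Submarket construction sketch: standardize, SVD, cluster leading columns of U.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
B = rng.normal(size=(2314, 15))                 # placeholder for the GWR coefficient matrix

Z = (B - B.mean(axis=0)) / B.std(axis=0)        # z-standardization of each coefficient
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)
n_comp = int(np.searchsorted(explained, 0.90) + 1)   # first components covering 90 %

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(U[:, :n_comp])
print("components kept:", n_comp, "| submarket sizes:", np.bincount(labels))
```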
195 Housing Submarkets: Effect of Concrete building Capital city Prague Housing Submarkets: Effect of Garage Capital city Prague 1 10 8 5 9 4 3 2 Submarket Housing Submarkets: Effect of Meters Capital city Prague 4 10 9 8 3 5 2 Submarket Housing Submarkets: Effect of After reconstruction Capital city Prague * i, u 1it*h p u 1 t I1 i i0 9 Submarket m i» •m < 10% o1 / a-10'/ Submarket $ 2 $ 'I $1 J.) $ r, 3 2 9 Submarket Figure 3 Distributions of effect on the price for selected price determinants To model the spatial heterogeneity and to explore the effect of key price determinants over space, the G W R model proved to be a very suitable model to utilize. In future researchers, one can explore the effect of time, which is without any doubt also a key factor for the hedonic price models. The framework introduced in this study is very suitable for both one time period as well as multiple time period comparisons. Acknowledgements This research was supported by the Internal Grant Agency of the Prague University of Economics and Business under project F4/27/2020. References [1] Bivand, Roger S and Pebesma, Edzer J and Gomez-Rubio, Virgilio and Pebesma, Edzer Jan. (2013). Applied spatial data analysis, Springer. [2] Hastie, T , Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media. [3] Hrobař, Petr., Holý, Vladimír. (2020). Spatial Analysis of the Flat Market in Prague. In International Conference on Mathematical Methods in Economics 2020 (MME 2020). Brno: Mendel University, s. 193- 199. I S B N 978-80-7509-734-7. [4] Kopczewska, Katarzyna and Čwiakowski, Piotr. (2021). Spatio-temporal stability of housing submarkets. Tracking spatial location of clusters of geographically weighted regression estimates of price determinants. Land Use Policy, 103 [5] Lipan, M . (2016). Spatial approaches to hedonic modelling of housing market: Prague case. Bachelor's thesis, Charles University, Faculty of Social Sciences, Institute of Economic Studies [6] Robert, Christian. (2014). Machine learning, a probabilistic perspective, Taylor & Francis [7] Zhou, Qianling and Wang, Changxin and Fang, Shijiao. (2019). Application of geographically weighted regression (GWR) in the analysis of the cause of haze pollution in China. Atmospheric Pollution 10-3 196 Forecasting Czech unemployment rate using dimensional reduction approach Filip Hron1 , Lukas Fryd2 Abstract. We compare prediction power within A R M A , vector autoregression (VAR) and factor augmented V A R model. The prediction is done for the Czech unemployment rate. We show, that the few unobserved common factors estimated by P C A are effective to predict different horizon of the series and outperformed tradictional V A R approach. Keywords: dimensional reduction, unemployment rate, factor augmented V A R , high dimensional time series J E L Classification: C50, O40 AMS Classification: 62P20 1 Introduction The curse of dimensionality is a problem we face in trying to understand the behaviour of complex systems, such as the national economy. The monthly and quarterly frequency of data collection significantly limits the estimation of a larger model. In the case of a popular vector autoregression, the number of coefficients increase quadratically based on the number of included variables in the model and their lagged values. 
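For illustration, an unrestricted VAR(p) with K endogenous series has K^2 p slope coefficients plus K intercepts; with the K = 12 quarterly series used later in this paper and p = 4 lags, that is already 576 slope coefficients to be estimated from roughly 84 quarterly observations per equation.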
The limitation to a few variables significantly affects the analysis of the transmission of information through the economy as well as its predictive power. One possible solution to the curse of dimensionality problem is dimension reduction. Geweke [4], assumes, that a complex system is driven by a few unobserved common factors. These factors have a vector autoregressive behaviour and they are a combination of a exogenous variables. The first well-known empirical study by Sargent and Sims [2] establish and show, that the two dynamic factors are relevant predictors for explaining a substantial variance of the main U.S. macroeconomic indicators with the quarterly period. Furthermore, Stock & Watson [3] highlight the usability of the dynamic factor models in forecasting. Their study supports the assumption, that the few factors estimated by the principal components have robust forecasting properties. Finally, Bernanke et al. [5] extend existing methodology of the dynamic factor modelling by the vector autoregression process (FAVAR). They propose vector autoregression estimator based on the principal component analysis capable shrinks over the one hundred macroeconomic variables into the three respectively five factors. Hence, F A V A R process is represented as a combination of a directly observable relatively small number of the variables and the few unobserved common factors. Moreover, the factor augmented V A R approach allows calculate effects of many observable variables represented by factors. In this paper, we evaluate the predictive power of the A R M A , V A R and F A V A R models. Specifically, we focus on the forecasting of the Czech unemployment rate during the period 2000 to 2020. 2 Data and methodology 2.1 Data The data set consists of the time series from 2000 until the end of 2020 with quarterly period. A l l time series are seasonally adjusted and transformed into the stationary process. Moreover, we standardize each series (Z-score) for the proper use of the principal components approach (PCA). In the study is used predominantly Czech predictors. The assumption is the domestic indicators has the most significant impact on the behavior of the unemployment rate. Furthermore, the variable selection mostly depends on the overall availability of the series during the investigated period. On the other hand, we are aware, that German 1 Department of Econometrics, University of Economics in Prague, Winston Churchill Square 4, 13067 Prague, Czech Republic, hrofOl @ vse.cz 2 1 Department of Econometrics, University of Economics in Prague, Winston Churchill Square 4, 13067 Prague, Czech Republic, lukas.fryd @ vse. cz 197 economy had a large impact on our country, too. Therefore, we include the German gross domestic product. The series are listed in the table below: Variable General unemployment rate Actual individual consumption Economically inactive Gross value added Imports of goods and services Gross domestic product Industry production Public 15+ Wages and salaries Gross household saving rate Customer price index Gross domestic product (Germany) Table 1 Overall data set for forecasting 2.2 Factor-augmented vector autoregression approach The curse of dimensionality problem can be solved for example by shrink the number of coefficients based on the proper prior beliefs (Bayesian VAR) or reduce the space of the used predictors - dimension reduction. In general, it is assumed, that the economy is influenced and driven by the relatively small number of the common unobserved factors. 
2.2 Factor-augmented vector autoregression approach
The curse of dimensionality problem can be solved, for example, by shrinking the number of coefficients based on proper prior beliefs (Bayesian VAR) or by reducing the space of the used predictors, i.e. by dimension reduction. In general, it is assumed that the economy is influenced and driven by a relatively small number of common unobserved factors. These factors can be understood as an invisible hand or as movements on the demand side. If we consider the static factor model first introduced by Geweke [4], we assume that these common factors affect all the time series only contemporaneously and the process follows

$X_t = \Lambda F_t + e_t,$   (1)

where the $N \times 1$ vector $X_t$ represents the observable variables, the $K \times 1$ vector $F_t$ represents the unobservable factors and $e_t$ represents uncorrelated idiosyncratic errors. In addition, we assume that the number of common factors is considerably smaller than the number of observable series ($K \ll N$). Next, Geweke introduces the dynamic form of the factor model. Equation (1) is extended by lagged values of $F_t$ with a lag polynomial $\Lambda(L)$ of order $q$:

$X_t = \Lambda(L) F_t + e_t.$   (2)

Stock & Watson [3] point out the usability of dynamic factor models and demonstrate that a few common factors estimated by PCA are great predictors for forecasting various time series. Based on the standard VAR model and dynamic factor models, Bernanke et al. [5] define the factor-augmented vector autoregression approach as follows:

$\begin{bmatrix} F_t \\ Y_t \end{bmatrix} = \Phi(L) \begin{bmatrix} F_{t-1} \\ Y_{t-1} \end{bmatrix} + v_t,$   (3)

where the $M \times 1$ vector $Y_t$ contains the directly observable variables and the $K \times 1$ vector $F_t$ represents the unobserved common factors, similarly to equations (1) and (2). These factors are estimated from the data set $X_t$, which is a relatively large macroeconomic information set of $N$ observable variables. In this study, $Y_t$ represents the Czech unemployment rate and the factors $F_t$ correspond to the shrinkage of the other mentioned predictors into a few common components. Similarly to the dynamic factor models, $\Phi(L)$ indicates a polynomial lag operator and $v_t$ an error term [5]. The vector $X_t$ can be expressed as a combination of $F_t$ and $Y_t$ as follows:

$X_t = \Lambda^F F_t + \Lambda^Y Y_t + e_t,$   (4)

where $\Lambda^F$ and $\Lambda^Y$ represent the factor ($N \times K$) and the directly observable variables' ($N \times M$) loading matrices [5]. Lastly, $e_t$ represents the idiosyncratic error components.

Bernanke et al. [5] introduce two estimators: the first is based on principal component analysis (two-step principal components), the other on a Bayesian approach. In this paper we utilize only the two-step principal components approach. The factors can be estimated based on the expression below:

$\hat{F}_t = \hat{C}_t - \hat{\beta}_Y Y_t,$   (5)

where $\hat{C}_t$ represents the estimate of $C_t = (F_t, Y_t)$ by the $K + M$ principal components of $X_t$ and $Y_t$. The space spanned by $\hat{C}_t$ does not take into account the fact that $Y_t$ is an observable variable. However, Stock & Watson [7] highlight that with a sufficiently large $N$ and number of components $P$, where $P \gg K$, the principal components precisely recover the spanned space. Consequently, in the first step the factors $F_t$ are estimated from the space described by $\hat{C}_t$, reduced by the effect of $Y_t$. In the second step, a standard VAR model is used.

2.3 Forecast evaluation
We use three well-known indicators to properly evaluate the forecasting power of each approach: the Root Mean Square Error, the Mean Absolute Percentage Error and the Median Absolute Percentage Error. These benchmarks follow:

$RMSE = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(y_t - p_t)^2},$   (6)

$MAPE = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{y_t - p_t}{y_t}\right|,$   (7)

$MdAPE = \operatorname{median}\left|\frac{y_t - p_t}{y_t}\right|,$   (8)

where $y_t$ represents the actual value and $p_t$ the predicted value.
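A minimal Python sketch of this pipeline (factor extraction, a VAR on the augmented vector, and the accuracy measures (6)-(8)) might look as follows. The function names are illustrative, scikit-learn's PCA and statsmodels' VAR are assumed as tools, and the sketch omits the purging of $Y_t$ from the component space in equation (5), so it is a simplification of the two-step estimator rather than the authors' implementation.

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR

def rmse(y, p):
    return np.sqrt(np.mean((y - p) ** 2))          # formula (6)

def mape(y, p):
    return np.mean(np.abs((y - p) / y))            # formula (7)

def mdape(y, p):
    return np.median(np.abs((y - p) / y))          # formula (8)

def favar_forecast(X, y, n_factors=3, lags=4, steps=4):
    """X: standardized predictors (T x N DataFrame), y: unemployment rate (length-T series)."""
    factors = PCA(n_components=n_factors).fit_transform(X)            # step 1: estimate common factors
    cols = [f"F{i+1}" for i in range(n_factors)] + ["y"]
    data = pd.DataFrame(np.column_stack([factors, np.asarray(y)]), columns=cols)
    fit = VAR(data).fit(lags)                                          # step 2: VAR on [F_t, Y_t]
    fc = fit.forecast(data.values[-lags:], steps=steps)
    return fc[:, -1]                                                   # forecast path of the unemployment rate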
3 Results
The standard VAR process combines the unemployment rate, inflation and gross domestic product, with inflation represented by the consumer price index. In addition, the best lag order for the FAVAR and VAR models is set to p = 4. According to Bernanke et al. [5], the optimal number of factors can be estimated based on information criteria. However, the authors also point out that the number of factors can be determined ad hoc. We conclude that the methodology works well using only 3 factors; these factors describe almost 60% of the system variance in the current period, and we assume that the described variance increases once lagged values are included. We provide in-sample and out-of-sample predictions. The forecast horizons range from one to three years, starting with the year 2018. The in-sample predictions are displayed in Table 2.

Est          RMSE     MAPE     MdAPE
VAR(4)       0.0894   0.1034   0.1013
FAVAR(4)     0.0518   0.0648   0.0577
ARMA(4,4)    0.1012   0.1162   0.1034
Table 2 Prediction of each approach and their benchmarks on the period 2000-2017

Next, we highlight the fit of each model. Figure 1 shows that the FAVAR approach fits well the period of the Great Recession in 2008, followed by the full recovery of the economy. The benchmarks show that the FAVAR is significantly better than the other approaches in in-sample predictions: the FAVAR produces only around a 5% prediction error according to the measures, whereas the standard VAR and ARMA methodologies have errors of around 10% on average.

Out-of-sample predictions are slightly different. The FAVAR is clearly better than the standard VAR with included inflation and gross domestic product, see Table 3; the information contained in the estimated factors clearly helps to produce better forecasts. However, the ARMA model significantly outperforms the other models in the first two horizons and produces more accurate forecasts. More specifically, ARMA produces a one-year-ahead forecast with an RMSE of only 1.1%, compared to a roughly 3% RMSE for the FAVAR and VAR. In addition, the MAPE for ARMA at h = 8 is equal to 4.6%, compared to a significantly higher MAPE of 12% for the FAVAR and 14.16% for the VAR. On the other hand, based on the RMSE and MAPE at the horizon h = 12, the FAVAR outperforms the ARMA model; these measures differ by around 2 percentage points. Forecasting the Czech unemployment rate is a rather difficult task during the period 2018-2020 because of its relatively small variance. Furthermore, the COVID situation represents a relatively large shock in the Czech labour market. It is therefore appropriate to use several forecast accuracy measures: while ARMA holds fairly similar values across the measures, the measures for VAR and FAVAR differ considerably.

Est          h    RMSE     MAPE     MdAPE
VAR(4)       4    0.0299   0.1297   0.1376
FAVAR(4)     4    0.0254   0.1064   0.1159
ARMA(4,4)    4    0.0110   0.0352   0.0248
VAR(4)       8    0.0313   0.1416   0.1376
FAVAR(4)     8    0.0272   0.1200   0.1159
ARMA(4,4)    8    0.0126   0.0460   0.0320
VAR(4)       12   0.0622   0.1837   0.1607
FAVAR(4)     12   0.0521   0.1539   0.1452
ARMA(4,4)    12   0.0772   0.1627   0.0819
Table 3 Unemployment rate forecast with the different horizons (4, 8, 12)

Figure 1 Actual (black) versus predicted (red) values of the scaled unemployment rate. Different forecast horizons provided.
[Figure 1 panels: VAR(3), FAVAR (3 factors) and ARMA(4,4) forecasts with horizons h = 4, 8 and 12, plotted over time (2005-2020)]

4 Conclusion
In this study, we utilize the factor-augmented vector autoregression approach to produce predictions of the Czech unemployment rate. The methodology supports the assumption that the economy is driven by a few unobserved common factors, which can be estimated from a large information set of macroeconomic variables. We show that three unobserved factors extracted from twelve macroeconomic time series significantly improve the long-term (3-year) unemployment rate forecast. On the other hand, the ARMA model has better forecast accuracy in the short term (1-2 years). Moreover, the FAVAR model captures the unemployment rate during the Great Recession of 2008 better than the VAR and ARMA models.

Acknowledgements
We gratefully acknowledge the support of this project by the research grant VSE IGA F4/34/2020, Faculty of Informatics and Statistics, University of Economics, Prague.

References
[1] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
[2] Sargent, T. J., & Sims, C. A. (1977). Business cycle modeling without pretending to have too much a priori economic theory. New Methods in Business Cycle Research, 1, 145-168.
[3] Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167-1179.
[4] Geweke, J. (1977). The dynamic factor analysis of economic time series. Latent Variables in Socio-Economic Models.
[5] Bernanke, B. S., Boivin, J., & Eliasz, P. (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly Journal of Economics, 120(1), 387-422.
[6] Lombardi, M. J., Osbat, C., & Schnatz, B. (2012). Global commodity cycles and linkages: a FAVAR approach. Empirical Economics, 43(2), 651-670.
[7] Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460), 1167-1179.

Health index for the Czech districts calculated via methods of multicriteria evaluation of alternatives
Dana Hübelová1, Beatrice-Elena Chromková Manea2, Alice Kozumplíková3, Martina Kuncová4, Hana Vojáčková5

Abstract. The aim of this paper is to calculate the health index for each of the 77 Czech districts to identify the best and worst districts from this perspective. The health index is constructed as a composite indicator of 8 different areas covering 60 criteria. The index reduces the size of the original set of indicators, which allows its clear and easy interpretation, including its spatial differentiation. Thanks to this, the results can be a suitable basis for decision-making by state administration and self-government bodies or other organizations dealing with the issue of population health.
The data come from publicly available databases of the Czech Statistical Office, the Institute of Health Information and Statistics of the Czech Republic, the Ministry of Labor and Social Affairs of the Czech Republic, the Czech Hydrometeorological Institute and the Czech Panel Survey of Households. For the health index construction, the methodology using selected methods of multicriteria evaluation of alternatives is described. With regard to the available data, methods using numerical input, weights of criteria and determining the complete order of alternatives were analyzed and used. Keywords: multicriteria comparison, health index, health inequality, Czech districts J E L Classification: C44,114 A M S Classification: 90B50, 91B06 1 Introduction Health is a concept of a complex nature that raises many theoretical (concerning the definition and conceptualization of health) and empirical (as to the way of how we operationalize, measure, and analyse health status) questions. The World Health Organization (WHO) defines health as "... an overall state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity." It is also "... a source of daily life, not a goal i n life" [26]. The health of the population is considered an essential indicator of the development and competitiveness of regions. Health quality indicates the state and links between social, economic, demographic, environmental, but also political processes [12]. Determinants of health then represent indicators that influence the presence and development of risk factors for the disease [3]. Health inequalities are unequal differences resulting from inequalities i n a number of determinants of different natures, which, due to their uneven distribution, are considered one of the main causes of creating and maintaining health inequalities [13]. A n y measurable aspect of health that varies among individuals or according to socially relevant groups i n a population are regarded to be health inequalities. Ideally, everyone should have the same chances to reach his or her full health potential [4]. Although health inequalities are uneven, they can be prevented, as they are the result of inappropriate public policies or unhealthy lifestyles [25]. Health inequalities are apparent in different groups of the population, according to age, gender, ethnicity, socio-economic status or place of residence, etc., resulting i n the exposure of individuals and groups to spatially distributed health risks. Spatial epidemiology focuses its activity on the evaluation of territorial aspects of health inequalities based on the interpretation of geographical contexts. When using secondary data, acceptable explanations of spatial differentiation of determinants of health inequalities can be found, while the disadvantages (time and economic complexity) of case and control studies, resp. cohort studies are eliminated. The direct applicability of the method of descriptive epidemiology has been proved, for example, in the creation of the Atlas of Cancer Incidence i n England and Wales [24] or the U S Mortality Atlas [2]. 
Spatial data i n relation to health characteristics are also presented by the Eurostat atlases [8], [10], which use various 1 Mendel University in Brno, Faculty of Regional Development and International Studies, Department of Social Studies, třída Generála Píky 2005/7, 613 00 Brno, Czech Republic, hubelova@mendelu.cz 2 Mendel University in Brno, Faculty of Regional Development and International Studies, Department of Social Studies, třída Generála Píky 2005/7, 613 00 Brno, Czech Republic, chromkov@mendelu.cz 3 Mendel University in Brno, Faculty of Regional Development and International Studies, Department of Environmental Sciences and Natural Resources, třída Generála Píky 2005/7, 613 00 Brno, Czech Republic, alice.kozumplikova@mendelu.cz 4 College of Polytechnics Jihlava, Department of Economic Studies, Tolstého 16, 586 01 Jihlava, Czech Republic, kuncova@vspj.cz 5 College of Polytechnics Jihlava, Department of Technical Studies, Tolstého 16, 586 01 Jihlava, Czech Republic, hana.vojackova@vspj.cz 202 calculations and cartographic outputs to describe i n detail the situation i n European countries on the basis of selected causes of avoidable mortality and taking into account the cultural and social context of the country. Another important atlas concerning the mortality and health situation of the population is the Atlas of Health in Europe [27]. This atlas contains basic statistics on the health status of the population i n the period 1980-2001. The data were collected, verified, and processed i n a uniform way for a better comparison. The second edition of the 2008 atlas provided a summary of current data from 53 European countries (1980-2006) to effectively address the new public health challenges [28], Outputs using similar methods on data from the Czech Republic have been published since the 1990s when researchers from the Institute of Health Information and Statistics prepared the Atlas of Avoidable Deaths in Central and Eastern Europe, similar to the already published atlas at that time for Western European countries [14]. A n Atlas of Morbidity and Mortality was also created by the same researchers, where the health status and mortality i n the E U 1 5 countries and in other acceding countries were compared. The Atlas of Socio-Spatial Differentiation [23] and the context of social and health services presented i n the Atlas of Long-Term Care of the Czech Republic [29] deal with the social aspects of development. A s these resources do not cover more than one health-related area, the aim of this article is to fill this gap and to calculate the health index for each of the 77 Czech districts to identify the best and worst districts using selected methods of multicriteria evaluation of alternatives. The health index is constructed as a composite indicator of 8 different areas covering 60 criteria. The index reduces the size of the original set of indicators, which allows its clear and easy interpretation, including its spatial differentiation. 2 Data and methodology 2.1 Data The data come from publicly available databases of the Czech Statistical Office [7], the Institute of Health Information and Statistics of the Czech Republic [17], the Ministry of Social Affairs of the Czech Republic [21], the Czech Hydrometeorological Institute [5] and the Czech Panel Survey of Households [6]. They cover 60 criteria in 8 different areas (Table 1) and 77 Czech districts. 
The selection of the areas and criteria was inspired by the Euro-Healthy project [8], which created an index for the health status of the population i n E U countries at their regional level N U T S 2 and metropolitan areas. As in the Euro-Healthy project, we have compiled a comprehensive health index, which is composed of sub-indices for various individual dimensions of health. Unlike the "EuroHealthy" project, our results differ in their regional dimension, which allows us to specify i n more detail the individual determinants of health inequalities in the Czech Republic at the level of districts and selected municipalities. It allows us to more objectively identify the regional differentiations in their geographical, settlement, economic, social, environmental differences. Most of the 60 criteria are of minimization type, only 16 of them are of the maximization type. The choice of the areas and criteria is based on previous research (see for example [15]) and on data availability for the chosen topic. No. Area Number of criteria 1 Economic conditions and social protection 14 2 Education 2 3 Demographical changes 4 4 Environmental conditions 6 5 Individual conditions 3 6 Safety i n road transport and crime 5 7 Sources of health and social care 5 8 Health status 24 Table 1 Basic description of compared areas 2.2 Methodology The original idea of the research was to determine the health index for each district i n the Czech Republic on the basis of the 60 criteria described above, divided into 8 areas. Methods of multicriteria evaluation of alternatives are suitable for this type of analysis as described in [18]. These methods have been developed to help the decisionmaker to find the best alternative or the ranking of alternatives [11]. To find the ranking of alternatives (ai, a2, ..., ap) via numerical criteria (/}, f2, ..., fi) when equal preferences are set for the criteria, the methods using criteria weights which result i n a complete order of alternatives can be applied [1]. The methodology for this paper is based on two phases: i n the first one, the 14 criteria of the l.area (Economic conditions and social protection) were 203 used as a sample to assess the suitability of the application of individual methods; in the second phase, the health index calculation procedure based on 3 steps was suggested. In the first step, the evaluation of districts was obtained according to each area separately (with the equal weights of criteria within the areas). In the second step the same method was used for the complete evaluation i n 8 areas together. This result could be taken as the health index of each district but for the graphical representation in the map of the Czech Republic, the results were divided into 6 clusters in the third step. Due to the nature of the input data, methods using quantitative evaluation of alternatives were tested to select the one that will correspond to the expected evaluation of districts and can be easily programmed to create the graphical output. The health index can therefore be estimated as a utility based on the mentioned 8 areas - then methods for maximizing the utility function are offered for calculations, i.e. W S M (Weighted Sum Method), S A W (Simple Additive Weighting), W S A (Weighted Sum Approach), U F A (Utility Function Approach) [1]. The second possibility is to create the benefit as a relative distance from the ideal and nadir alternative, which is a typical procedure for the TOPSIS method. 
As the aim is to divide the districts into color clusters based on the index, and the utilities of each criterion should be obtained from the method and not set by the decision-maker (as in UFA), we decided first to test WSM, SAW, WSA and TOPSIS, and then to choose one of them for the subsequent calculations.

2.3 WSM, SAW and WSA methods
WSM, SAW and WSA belong to the category of multi-criteria decision making (MCDM) methods in which the principle of utility maximization is used [1]. All of them share formula (1) for the calculation of the final utility $u(a_i)$ of each alternative $i$; the only difference lies in the calculation of the normalized values $r_{ij}$ that transform the data onto a 0-1 scale ($w_j$ are the criteria weights):

$u(a_i) = \sum_{j=1}^{k} w_j r_{ij}, \quad i = 1, \dots, p.$   (1)

WSM is one of the simple methods that does not use any criteria normalization function; for the calculation of the final utility of the alternatives, $r_{ij}$ is equal to the real data $a_{ij}$. SAW normalizes the real data and several normalization formulas can be used; we calculate $r_{ij}$ as in formula (2) for a maximization criterion or (3) for a minimization criterion, where $\min_i(a_{ij}) = 0$ in both formulas. WSA is a special case of the SAW method in which the normalization formulas (2) and (3) are used with $\min_i(a_{ij})$ taken from the real data:

$r_{ij} = \frac{a_{ij} - \min_i a_{ij}}{\max_i a_{ij} - \min_i a_{ij}},$   (2)

$r_{ij} = \frac{\max_i a_{ij} - a_{ij}}{\max_i a_{ij} - \min_i a_{ij}}.$   (3)

2.4 TOPSIS method
The TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method ranks the alternatives using a relative index of the distance of the alternatives from the ideal and nadir alternative. The steps of this method can be described as follows [11]: normalise the decision matrix according to the Euclidean metric (4), calculate the weighted decision matrix $W = (w_{ij}) = v_j \cdot r_{ij}$ (where $v_j$ are the criteria weights), identify the vectors of the hypothetical ideal $H$ and nadir $D$ alternatives over each criterion, where $H_j = \max_i w_{ij}$ and $D_j = \min_i w_{ij}$, measure the Euclidean distance of every alternative to the ideal and to the nadir alternative over each attribute using formulas (5), and finally order the alternatives by maximizing the ratio $c_i$, which is equal to the ratio of $d_i^-$ to the sum of $d_i^+$ and $d_i^-$:

$r_{ij} = \frac{a_{ij}}{\sqrt{\sum_{i=1}^{p} a_{ij}^2}}, \quad i = 1, \dots, p, \; j = 1, \dots, k,$   (4)

$d_i^+ = \sqrt{\sum_{j=1}^{k}(w_{ij} - H_j)^2} \quad \text{and} \quad d_i^- = \sqrt{\sum_{j=1}^{k}(w_{ij} - D_j)^2}, \quad i = 1, \dots, p.$   (5)

All these steps assume that all criteria are of the maximization type. If not, it is necessary to transform the minimization criteria into maximization ones. Usually the difference from the worst case is used for the transformation (TOPSIS-classic), but it has been shown that inverse values (TOPSIS-inverse) can be better for the TOPSIS method when more minimization criteria are present [19]; that is why we use both kinds of transformation.
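A compact Python sketch of the WSA utility (formulas (1)-(3)) and the TOPSIS relative distance (formulas (4)-(5)) might look as follows; the function and variable names are illustrative, and the availability of both transformations of the minimization criteria in one helper is a simplifying assumption rather than the authors' implementation.

import numpy as np

def wsa_utility(A, w, maximize):
    """A: p x k criteria matrix, w: criteria weights summing to 1, maximize: boolean numpy mask of maximization criteria."""
    lo, hi = A.min(axis=0), A.max(axis=0)
    r = np.where(maximize, (A - lo) / (hi - lo), (hi - A) / (hi - lo))   # formulas (2) and (3), non-constant criteria assumed
    return r @ w                                                          # formula (1)

def topsis_closeness(A, w, maximize, inverse=False):
    """Relative closeness to the ideal alternative; inverse=True uses 1/x for the minimization criteria."""
    B = A.astype(float).copy()
    if inverse:
        B[:, ~maximize] = 1.0 / B[:, ~maximize]                           # TOPSIS-inverse transformation
    else:
        B[:, ~maximize] = B[:, ~maximize].max(axis=0) - B[:, ~maximize]   # TOPSIS-classic transformation
    r = B / np.sqrt((B ** 2).sum(axis=0))                                 # Euclidean normalization, formula (4)
    v = r * w                                                             # weighted decision matrix
    H, D = v.max(axis=0), v.min(axis=0)                                   # ideal and nadir alternatives
    d_plus = np.sqrt(((v - H) ** 2).sum(axis=1))
    d_minus = np.sqrt(((v - D) ** 2).sum(axis=1))                         # distances, formula (5)
    return d_minus / (d_plus + d_minus)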
3 Results and discussion
Before the final health index construction, it was necessary to select the appropriate method. The 1.area, with 14 criteria of which 4 are maximizing and 10 minimizing, was taken as a sample to select the method. Since the input values are both in different units and on different scales, it is not appropriate to use the WSM method. The results (utilities for SAW and WSA, and the relative distance for TOPSIS) are shown in Figure 1. It is clear that there are indeed differences in the TOPSIS method for the different transformations, and it is visible that the WSA method better spreads the districts in terms of the resulting utility on a scale of 0-1. Therefore, WSA was used as an input for the second phase of the index calculations. We note that the correlation of the results between WSA and the other methods was in all cases higher than 0.84, so the results of all tested methods are very similar.

[Figure 1: histogram of the number of districts by final utility (SAW, WSA) or relative distance (TOPSIS-classic, TOPSIS-inverse), values 0.1-0.9]
Figure 1 First phase results - number of districts by final utility (WSA) or relative distance (TOPSIS)

The second phase was based on the WSA method. Table 2 illustrates the first step of the health index calculations, i.e. the scores (utilities) of each measured area for the first 10 districts (alphabetically ordered; there is no space to show all 77 districts). The higher the value, the better a district is situated in the index (area).

District            1.area    2.area    3.area    4.area    5.area    6.area    7.area    8.area
Benešov             0.65990   0.37688   0.49608   0.58251   0.40297   0.61525   0.39019   0.59461
Beroun              0.60963   0.43044   0.47313   0.50299   0.44326   0.75968   0.46559   0.60469
Blansko             0.59635   0.43331   0.50368   0.61504   0.16543   0.81784   0.35800   0.71159
Brno-město          0.52210   0.91188   0.57322   0.47588   0.57080   0.71333   0.80041   0.69450
Brno-venkov         0.62155   0.46245   0.49281   0.57769   0.47073   0.76993   0.20837   0.73282
Bruntál             0.32968   0.16065   0.53924   0.70661   0.32183   0.60676   0.43221   0.37165
Břeclav             0.61993   0.19019   0.45514   0.47287   0.54274   0.73828   0.41117   0.65755
Česká Lípa          0.55099   0.20507   0.65403   0.66022   0.30179   0.40803   0.24782   0.43539
České Budějovice    0.62144   0.56948   0.57650   0.73994   0.48904   0.74659   0.58184   0.62593
Český Krumlov       0.58995   0.18394   0.50539   0.93554   0.58595   0.61723   0.39332   0.45265
Table 2 Demonstration of the first step of calculations using the WSA method (example of 10 districts)

Rank   District            H.Index
1      Praha-záp.          0.696
2      Brno-město          0.690
3      České Budějovice    0.637
4      Praha-východ        0.611
5      Plzeň-město         0.594
6      Jihlava             0.582
7      Praha               0.581
73     Most                0.359
74     Bruntál             0.348
75     Karviná             0.344
76     Jeseník             0.340
77     Tachov              0.329
Table 3 Demonstration of the second step of calculations via WSA (example of the best and worst districts)

We can observe differences both within the same district (among the various measured areas) and between districts (within the same area). For example, within the Brno-město district the 2.area (Education) has the highest value of the index (0.91188) compared to the other areas, and also the highest level within this area among the presented districts (0.91188 compared to 0.16065 for the Bruntál district). In the second step, the WSA method was used again for the final health index calculation, where all areas were assumed to have the same weight. Table 3 describes the best and the worst districts. As we assumed, big cities like Prague and Brno belong to the healthier areas (mainly due to good results in the 1.area - Economic conditions and social protection, 2.area - Education, 5.area - Individual conditions, and 8.area - Health status), while northwest Bohemia and north Moravia belong to the less healthy ones (for example, the last district, Tachov, has below-average results in all areas except the 4.area - Environmental conditions; Jeseník and Bruntál have low values also in the 4.area and in the 6.area - Safety in road transport and crime).
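A minimal sketch of the second-step aggregation described above might look as follows; whether the area utilities are re-normalized before weighting is an interpretation of the statement that "the WSA method was used again", and the variable names are illustrative.

import numpy as np

def health_index(area_scores):
    """area_scores: 77 x 8 array of first-step utilities (rows = districts, columns = areas 1-8)."""
    lo, hi = area_scores.min(axis=0), area_scores.max(axis=0)
    r = (area_scores - lo) / (hi - lo)                                    # assumed WSA normalization of the area scores (all maximizing)
    w = np.full(area_scores.shape[1], 1.0 / area_scores.shape[1])         # equal area weights, as stated in the text
    return r @ w

# districts ranked by the resulting index, best first (cf. Table 3):
# order = np.argsort(-health_index(area_scores))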
In the last (third) step, the obtained health index values were divided into 6 clusters according to the size of the resulting index (1.cluster cover districts with the values higher than 0.65, the last, 6.cluster, cover districts with the value lower than 0.45), and then the clusters were incorporated into the map of the Czech Republic for better visualization of the results (see Figure 2). Figure 2 Districts according to the clusters of the health index 4 Conclusion For the optimal development of the regions, it is important to create and implement strategies that will lead to ensuring competitiveness and sustainable regional development, for which the optimal quality of population health is one of the conditions. A satisfactory health status of the population directly affects the economic level of the region, and it has a positive effect on the social development and demographic stability at the regional level [16]. Not only due to the current pandemic situation (COVID-19), it is obvious that an increased morbidity and lower quality of health i n the population puts more strain on the public budget and the use of public goods and services (e.g. social and health expenditures). A healthy region and sustainable regional development therefore require a healthy population, a high-quality health system, individual responsibility for own health, and proper prevention[22]. The aim of this article was to propose a three-step procedure using the W S A method for calculating the health index for each of the 77 Czech districts and finally put it in the map as clusters. The results can be a suitable basis for decision-making by state administration and self-government bodies or other organizations dealing with the issue of population health. The proposed procedure also allows the setting of weights of criteria and areas according to requirements of the decision-maker. Acknowledgements The paper was supported by the contribution of long-term institutional support of research activities by the College of Polytechnics Jihlava and by project TL03000202 Health Inequalities in the Czech Republic: Importance and Relationship of Health Determinants of Population i n Territorial Disparities is co-financed with the state support of the Technology Agency of the Czech Republic i n the Program E T A . References [1] Alinezhad, A . & Khalili, J. (2019). New Methods and Applications in Multiple Attribute Decision Making (MADM). Springer Nature Switzerland A G . [2] Bell, B . S., Hoskins, R. E., Pickle, L . W . et al. (2006). Current practices i n spatial analysis of cancer data: mapping health statistics to inform policymakers and the public. Int J Health Geogr 5, 49. https://doi.org/10.1186/1476-072X-5-49 [3] Berman, J. D . , Fann, N . , Hollingsworth, J. W . et al. (2012) Health Benefits from Large-Scale Ozone Reduction i n the United States. Environmental Health Perspectives, 120(10): 1404-1410. 206 [4] C S D H . (2008). Closing the gap in a generation: Health equity through action on the social determinants of health. Final Report of the Commission on Social Determinants of Health. W H O , Geneva. [5] Czech Hydrometeorological Institut (2020). National Geoportal INSPIRE, http://geoportal.gov.cz [6] Czech Statistical Office (2020). Population Censuses, https://www.czso.cz/csu/czso/population- censuses [7] Czech Statistical Office. (2020). Demografická ročenka okresů. https://www.czso.cz/csu/czso/demograficka-rocenka-okresu-2010-az-2019 [8] E C . 2020. Shaping EUROpean policies to promote H E A L T H equitY. 
https://cordis.europa.eu/project/id/643398/reporting [9] Eurostat. (2002). Health statistics: atlas on mortality in the European Union: data 1994-96. ed. Luxembourg: Office for Official Publications of the European Communities. [10] Eurostat. (2009). Health statistics: atlas on mortality in the European Union, ed. Luxembourg: Office for Official Publications of the European Communities. [11] Figueira, J., Greco, S. & Ehrgott, M . (2005). Multiple Criteria Decision Analysis - State of the Art Surveys. [12] Fraser, S. D. S. & George, S. (2015). Perspectives on Differing Health Outcomes by City: Accounting for Glasgow's Excess Mortality. Risk Management and Healthcare Policy 8: 99-110. [13] Graham, H . (2004). Social determinants and their unequal distribution: clarifying policy understandings. The Milbank quarterly, 82(1): 101-124. [14] Holland, W . W . (1993). European Community Atlas of Avoidable Death. 2nd edition, vol. 2. Commission of the European Communities Health Services Research Series no. 9. Oxford: Oxford University Press. [15] Htibelová, D . , Chromková-Manea, B . & Kozumplíková, A . (2021). Zdraví a jeho sociálni, ekonomické a environmentálni determinanty: teoretické a empirické vymezení. Sociológia. 53 (2), 119-146. [16] Htibelová, D . , Ptáček, P. & Šlechtová, T. (2021). Demographic and socio-economic factors influencing health inequalities in the Czech Republic. GeoSpace. 15 (1) in press. [17] Institute of Health Information and Statistics of the Czech Republic. (2020). Mortalita. https://reporting.uzis.cz/cr/index.php?pg=statisticke-vystupy~mortalita [18] Ishizaka, A . & Nemery, P. (2013). Multi-Criteria Decision Analysis. U K : John Wiley & Sons, Ltd. [19] Kuncová, M . & Sekničková, J. (2020). Influence of the Different Transformation of the Minimization Criteria on the Result - the Case of W S A , TOPSIS and A R A S Methods. In: International Conference on Mathematical Methods in Economics 2020 ( M M E 2020) [online]. Brno, 09.09.2020 - 11.09.2020. Brno: Mendel University, 2020, pp. 332-338. [20] Mardani, A., Jusoh, A., Nor, K . M D . , Khalifah, Z., Zakwan, N . & Valipour, A . (2015). Multiple criteria decision-making techniques and their applications - a review of the literature from 2000 to 2014. Economic Research-Ekonomska Istraživanja, 28(1), 516-571. [21] Ministry of Labor and Social Affairs of the Czech Republic. (2020). Statistiky o trhu práce. https://data.mpsv.cz/web/data/statistiky New York: Springer Science + Business Media Inc. [22] Novotná, H . , Ponikelský, P., Slepička, A . & Šafařík, F. (2011). Společnost a životní prostředí v regionálním rozvoji. Praha: Vysoká škola regionálního rozvoje. [23] Ouředníček, M . , Temelová, J. & Pospíšilová, L . (eds.). (2011). Atlas sociálně prostorové diferenciace. Praha: Karolinum. [24] Silva dos S. & Swerdlow, A., J. (1993). Thyroid cancer epidemiology i n England and Wales: time trends and geographical distribution. British Journal of Cancer, 67: 330-340. [25] Whitehead, M . & Dahlgren, G . (2007). Concepts and principles for tackling social inequities in health: Levelling up Part 1. Copenhagen: WHO Regional Office for Europe. http://www.euro.who.inť_data/assets/pdf_file/0010/74737/E89383.pdf [26] W H O , (2019): Annual Report 2018. Promoting Access to Safe, Effective, Quality and affordable essential medical products for all. W H O , Geneva. https://apps.who.inťiris^itstrearn/handle/10665/324765/WHO-MVP-EMP-2019.03-eng.pdf [27] W H O . (2003). Atlas of health in Europe. 
WHO, Copenhagen, Denmark: Regional Office for Europe, 1 atlas (vii, 112 p.). [28] W H O . (2009,). Atlas of health in Europe. 2nd ed., 2008. WHO, Copenhagen, Denmark: Regional Office for Europe, 1 atlas (vii, 126 p.). [29] Wija, P., Bareš, P. & Zofka, J. (2019). Atlas dlouhodobé péče. Praha: Institut pro sociální politiku a výzkum, z. s. 207 Modelling of PX Stock Returns during Calm and Crisis Periods: A Markov Switching Approach Michaela Chocholata1 Abstract. This paper deals with the modelling of the Czech stock market characterized by the weekly P X stock returns based on the Markov switching (MS W) approach. The analysed period spanning from April 8, 2007 to February 7, 2021 comprises both the "normal" calm and "turbulent" crisis periods. The two-regime M S W model thus enables to capture and specify the periods of bull and bear markets characterized by positive mean return-low volatility and negative mean return-high volatility periods, respectively. The analysis was enriched by consideration of the price/return - trading volume relationship, but it led neither to recognizable changes in values of estimated parameters nor to substantial differences in transition matrices. The presented results clearly confirmed the existence of several turbulent periods reflecting the worldwide financial and economic situation indicating the occurrence of crisis period with the probability of almost 0.14. However, the "normal" calm behaviour of returns occurred much more often and was proved to be more persistent in comparison to "turbulent" crisis period. Keywords: P X index, stock returns, Markov switching model, bull and bear market J E L Classification: G10, C22, C58 AMS Classification: 62M05, 62P20, 91B84 1 Introduction The P X index is the official index of the Prague Stock Exchange and consists of the most actively traded blue chips of the Prague Stock Exchange [17]. The P X index is a free-floating price-weighted index calculated in C Z K , the dividend yields are not included in its calculation. It was calculated for the first time on March 20, 2006 replacing the P X 50 and P X - D indices. The P X index continues in the development of the P X 50 index adopting its historical values. The P X 50 index, composed of 50 issues, was launched at April 5, 1994 with the opening value fixed at 1000 points. The number of basic issues of P X index has become variable since December 2001. The P X index is reviewed on a quarterly basis in order to maintain index quality [19]. Although nowadays the P X index contains issues of only 12 companies, it can serve as an indicator of the Czech economics [16]. In general, stock indices exhibit upward and downward trends reflecting impacts of various types of shocks and crises. Considering the ongoing Covid-19 pandemic, plenty of studies have been published to capture its effects on stock market returns across different countries and areas (see e.g., [2] and [13]). Very popular seems to be, to use the Markov switching ( M S W ) model enabling to distinguish different states of the world - calm periods with the normal behaviour of stock returns (bull market) and crisis periods with dramatically changing behaviour of stock returns characterised by high volatility and falling stock prices (bear market), see e.g., [7]. The M S W model for the Czech P X stock returns data was estimated e.g., by [12] and [14]. 
It is clear that the time-varying behaviour of stock returns is a natural part of trading and is linked to the arrival of new information and the subsequent reaction of market participants. Based on the new information, expectations regarding future market prices are being revised and this will also be reflected in the trading volumes. The analysis of stock returns can be thus further extended by consideration of the trading volume variable. Various approaches are known to analyse the relationship between stock prices/returns and trading volume - see e.g. [9] and [15]. Lamoureux and Lastrapes [11] supposed that volatility and trading volume are simultaneously and positively correlated, as they are a function of a stochastic variable defined as an information flow. This means that the arrival of new information on the market will simultaneously cause a change in volatility and trading volumes. However, as pointed out by Kalotychou and Staikouras [8], such a model does not explicitly exclude the possibility of different lags in price-volume relationship. The study of W u [15] implements the Markov switching model for weekly returns of the Taiwan stock exchange index, including period January 2000 - June 2015, to study the relationship 1 University of Economics in Bratislava, Faculty of Economic Informatics, Department of Operations Research and Econometrics, Dolnozemská cesta 1, 852 35 Bratislava, michaela.chocholata@euba.sk. 208 between stock returns and trading volume. Presented results documented different performance across regimes and analysed industries. This paper attempts to capture the dynamic behaviour of the Czech stock market characterized by the weekly values of the P X stock index during the period April 8, 2007 - February 7, 2021. To identify the bull and bear regimes, the two-regime M S W model is used. The analysis is enriched with the consideration of the trading volume variable to study its impact on stock returns across analysed period. The rest of the paper is organized as follows. Section 2 is devoted to methodology issues including the M S W model and the stock return - trading volume relationship, section 3 presents the data and empirical estimation results and section 4 concludes. 2 Methodology Since the behaviour of financial time series can change quite dramatically during some periods of time, in recent years several time series models have been developed to capture the occurrence of different regimes (states) generated by a stochastic process2 . Literature, see e.g. [5], generally distinguishes two categories of regime-switching models - models with regimes determined by observable variables and models with regimes determined by unobservable variables. With regard to the empirical part of the paper, we will concentrate on the second category of models which supposes that the occurrence of the particular regime is determined by an unobservable stochastic process usually denoted as st. The most famous model in this category, the Markov switching model (MSW) in which the transitions between regimes are governed by a Markov process, was popularized by Hamilton [6], The basic M S W model distinguishes only two states of the world corresponding to calm "positive mean-low volatility" periods and turbulent "negative mean-high volatility" periods, also known as bull and bear markets ([1], [3], [10], [14]). In case of two regimes, the variable st takes on two values, 1 and 2, i.e. 
if $s_t = 1$, the process is in regime 1 at time $t$, and if $s_t = 2$, the process is in regime 2 at time $t$ [4]. The dynamic behaviour of stock returns $r_t$ can in general be described by a linear AR(p) model in both regimes [5]:

$r_t = \phi_{0,s_t} + \phi_{1,s_t} r_{t-1} + \dots + \phi_{p,s_t} r_{t-p} + \varepsilon_t, \quad s_t = 1, 2, \quad \varepsilon_t \sim N(0, \sigma_{s_t}^2).$

[0.011111, -0.065934]. Then a Monte Carlo simulation was used to run thousands of randomly generated weight allocations for the individual stocks. Finally, the expected return, expected volatility and Sharpe ratio were calculated for each of the randomly generated portfolios. The limit for the calculation was always 1000 random allocations for a given portfolio, with the minimum share per fund set at 5%. This value was determined on the basis of a realistic estimate of a potential investor's behavior. The goal is not to find ideal machine-generated portfolio combinations that are far from human reasoning; the aim is to simulate possible investor decisions that could realistically be made in the given securities markets. The script was adopted from [13], see below.4

4 Note: The Markowitz Bullet refers to the shape (drawing) of the Markowitz efficient frontier. This is described in detail, for example, in the text by Ian Rayner, 2019 (https://www.raynergobran.com/2019/02/the-markowitz-bullet-a-guided-tour/); Ian Rayner is the originator of Rayner Gobran LLC (https://www.raynergobran.com/).
This gives only a summary picture of the performance of each type of investment fund by country, not an examination of the sub-development of selected funds (in the order of units), as it is the case with most research conducted in the Markowitz model. It is the combination of return and volatility (the C value, the color scale shown to the right of the graph) shows what values a potential investor may be in. Belgium, 2009-2014, 49 funds Switzerland 2009-2014, 65 funds volatility Figure 1 5 - years Performance Bond Funds Markowitz Portfolios Results Mixed Assets funds are available mainly in in the developed economies of Western Europe, where they are traditional investment funds. 217 France 2013-2018, 28 funds Spain, 2013-2018, 31 funds u 1.5 1.0 0.5 S o.o -0.5 -1.0 -1.5 fl u1 2 3 4 5 volatility Figure 2 5- years Performance Mixed Assets Funds Markowitz Portfolios Results The limitation of this model is, that not all the funds are covered. Only the funds with daily 5 years performance was considered. Furthermore, it was calculated with average values. On the other hand, the research team want to get there research as close to praxis as possible. The assumption was that the investor follows the historical development of individual markets and funds. Our potential investor only monitors stable funds, i.e. those that publish their results daily. If we wanted to create a basis for the decision-making of a risk-seeking investor, we would have to include those funds that do not have a 5-year history. However, this has not been addressed in the research. After studying all the outputs, the research team made the following key conclusions: Annual performances often reach extreme values; 5-year performances show more stable indicators Mixed Assets funds are able to generate higher value but are riskier; Developed markets are less loss-making than young stock markets (non-G20); The effective set and typical Markowitz Bullet can be seen for portfolios containing more than 20 and 30 different funds, respectively; For under-developed countries under 20 funds, mainly the convex downward part of the Markowitz Bullet shows; In developed countries with more than 20 funds, there is a dominant upward concave part of the Markowitz Bullet. 4 Conclusions A differentiation of the portfolio is a common strategy of every investor. How to break down portfolio diversification into what types of assets, in what proportion and in which country is a key question of success in the financial markets. The theory states that the financial markets, in terms of their awareness, are closest to perfectly competitive markets. In general, the historical values of the various types of investment funds are well known and readily available, and they can be used to estimate their future development. O f course, there is no internal information on the actual condition of the funds, which is an unavoidable investment risk. Nevertheless, the statistics help to estimate very closely the future development of investment funds based on historical data. This fact, together with Markowitz's model theory, was the main starting point of our investigation. In the initial phase of our research, we focused primarily on funds such as alternatives, equity, bonds, mixed assets, commodities, and money markets. 1620 possible Markowitz portfolios were created using advanced computational tools, each containing 1,000 possible investment portfolio layouts. 
The limitation of our investigation was mostly the availability of complete time series of examined funds. W e used daily percentage changes in all available funds by country and fund type. From the point of view of the availability of the data we required 5 years of daily values (different periods were examined, always for 5 years since 2009), we were able to examine only bonds, equity and mixed assets funds. The whole research was aimed at maximum practicality and simplicity of the model. Therefore, in addition to the 5year daily values of individual funds (averaged percentage changes, rounded to 6 decimal places), we also based on other limitations. Those were minimum 5% investment in one fund, a portfolio containing at least 5 funds, and exactly 1,000 portfolio combinations. This limits mean that we expect a more conservative type of investor based on well-established and historically known investment funds. Steady yields, with less frequent negative values, have been shown by bond funds, while equity funds are generally more dispersed. Mixed Assets have known values in Western European countries. 218 Despite the initial ambitions of the research team, a large number of Markowitz Bullets have been generated, namely 550, from which 8 key conclusions have been drawn. In the future, we would like to examine the general performance of funds by type. W e want to focus on examining the economic efficiency of SRI funds compared to other types. (That was also our original research intention.) Research has shown that to this end it will be necessary to better define the period of time for the relevant data, only 3 years (instead of 5 years), and we also recommend creating a rather triple number instead of 1000 Markowitz combinations. Acknowledgements This paper was supported by grant FP-S-20-6376 of the Internal Grant Agency at Brno University of Technology. References [I] Anagnostopoulos, K . P., & Mamanis, G . (2011). The mean-variance cardinality constrained portfolio optimization problem: an experimental evaluation of five multiobjective evolutionary algorithms. Expert Systems with Applications, 38(11), 14208-14217. [2] Baule, R. Korn, O. Kuntz, L . (2019) Markowitz with regret. Journal of Economic Dynamics and Control, 103, 1-24. ISSN 0165-1889, https://doi.Org/10.1016/j.jedc.2018.09.012. [cit. 2020-12-21] Available at: http://www.sciencedirect.com/science/article/pii/S0165188919300600. [3] Bienstock, D . (1996). Computational study of a family of mixed-integer quadratic programming problems. Mathematical Programming, 74(2), 121-140. [4] Cesarone, F., Scozzari, A . , & Tardella, F. (2013). A new method for mean-variance portfolio optimization with cardinality constraints. Annals of Operations Research, 205(1), 213-234. [5] Chang, T.-J., Meade, N . , Beasley, J. E., & Sharaiha, Y . M . (2000). Heuristics for cardinality constrained portfolio optimisation. Computers & Operations Research, 27(13), 1271-1302. [6] Elton, E . J. (2007) Modern portfolio theory and investment analysis. 7th ed. New York: John Wiley & Sons, 728 p. I S B N 978-0-470-05082-8. [7] Haugen, H . (2000) Modern Investment Theory. Pearson, 5th edition, 680 p. I S B N 978-0130191700. [8] Hicks, J. R. (1934) Application of Mathematical Methods to the Theory of Risk, Econometrica. [9] Jupyter - Python. (2020) [cit. 2020-12-21] Available at: https://github.com/pandas-dev/pandas/blob/v0.25.3/pandas/core/generic.py#Ll 0421 - L I 0435. [10] Markowitz model, The Engineering Economist, DOI: 10.1080/0013791X.2019.1636439. 
[II] Markowitz, H . (1959) Portfolio Selection: Efficient Diversification of Investment. John Wiley & Sons, New York. [12] Morning Star. (2020). [cit. 2020-12-21] Available at: https://www.morningstar.co.uk/. [13] Python Data Analysis Library. (2020) [cit. 2020-12-21] Available at: https://pandas.pydata.org/. [14] Python for finance. (2020) Investment portfolio optimization with Python, [cit. 2020-12-21] Available at: https:// https://www.pythonforfinance.net/2017/01/21/investment-portfolio-optimisation-with-python. [15] Roebers, L . M . , Selvi, A . , Vera, J. C . (2019) Using column generation to solve extensions to the Markowitz model, The Engineering Economist, 64:3, 275-288, DOI: 10.1080/0013791X.2019.1636439. [16] Streichert, F., Ulmer, H , & Zell, A . (2004). Evolutionary algorithms and the cardinality constrained portfolio optimization problem. Operations Research Proceedings, 2003, 253-260. [17] Way, R., Lafond, F-, Lillo, F., Panchenko, V . , Farmer, J.D. (2019) Wright meets Markowitz: How standard portfolio theory changes when assets are technologies following experience curves, Journal of Economic Dynamics and Control, 101, 211-238, ISSN 0165-1889, https://doi.Org/10.1016/j.jedc.2018.10.006. [cit. 2020-12-21] Available at: http://www.sciencedirect.com/science/article/pii/S0165188919300181. 219 SBM models in data envelopment analysis: A comparative study Josef Jablonský1 Abstract. Data envelopment analysis (DEA) is a traditional modelling tool for relative efficiency and performance evaluation of a set of decision-making units. One of the classes of D E A models measures the level of efficiency using slack variables slacks-based measure ( S B M ) models. There have been proposed various models of this class for measuring efficiency and super-efficiency by various authors in the past. The paper compares the typical representatives of S B M efficiency and super-efficiency models and discusses their properties. Typical S B M models do not consider integer inputs and/or outputs. The paper extends typical S B M models to include integer conditions and analyses the differences in their results compared to the non-integer models. Keywords: Data envelopment analysis, slacks-based measure, S B M model, efficiency, integer programming J E L Classification: C44 A M S Classification: 90C15 1 Introduction D E A models were introduced by Charnes et al. (1978) as a tool for relative efficiency and performance evaluation of the set of decision-making units (DMUs). The model evaluates inputs (to be minimized in the typical case) and outputs (to be maximized) of the D M U under evaluation within the production possibility set defined by the linear combination of the D M U s in the set. The envelopment form of their input-oriented model is formulated as follows: Minimize 6q n subject to Yux i& +s ~j =6 qx qp J = m ' C1 ) 1=1 Z ? t t 4 * = l , . . . , r , i=i 2,>0, s~j >0, s% >0, i = 1,..., n,j= 1,..., m,k = 1,..., r, where n is the number of D M U s , m is the number of inputs with input values Xy, i = 1, n, j = 1, m, r is the number of outputs with output values y*, i = 1, n, k = 1, r, It, i = 1, ..., n, are the weights of the D M U s , s~j, j = 1,..., m, s£, k = 1,..., r, are slack variables, and D M U ? is the unit under evaluation. The optimal value 6q = 1 indicates that this unit is on the efficient frontier, and it is at least weakly efficient. The value 6q < 1 indicates inefficiency. 
It is a radial measure of efficiency because all inputs of the model must be reduced by 0q to reach the efficient frontier. Since 1978, when the first D E A model was formulated, many modifications of the traditional model (1) have been proposed by various authors. Very popular is the group of models that measure the level of efficiency using slack variables only. This paper aims to review the main representatives of slacks-based D E A models and discuss their properties. Section 2 contains the formulation of typical slacks-based models. Section 3 deals with slacks-based superefficiency models that allow ranking of D M U s identified as efficient by traditional models. Both sections 2 and 3 show the extension of the models by integer constraints and discuss how these constraints affect the results. The next section of the paper contains numerical illustration. Finally, the study concludes by discussion and possible research directions. 1 Prague University of Economics and Business, Faculty of Informatics and Statistics, Department of Econometrics, W . Churchill Sq. 4, Praha 3, Czech Republic, e-mail: jablon@vse.cz 220 2 Slacks-based DEA models The radial input-oriented model (1) considers proportional input reductions while keeping the current level of outputs. On the contrary, the radial output-oriented models increase outputs by considering the current level of inputs. Slacks-based D E A models consider in efficiency evaluation both input and output variables simultaneously. Charnes et al. (1985) formulated the following model that is often denoted as the additive D E A model: m r Maximize X 5 7 + X 5 * (2 ) j=i k=i n subjectto Y,x ijA i+s ] =x qj' 7=1...., m, i=i Y,yikX i-s+ k=yqk' k=l,...,r, (3) 1=1 2,>0, s] > 0 , s%>0, i= 1,..., n,j= 1,..., m,k = 1,..., r, The model (2) - (3) returns the optimal objective function value equal to 0 for the efficient D M U s , and the value greater than 0 for the inefficient units. The main drawback of model (2) - (3) is the incomparability of slacks' units included in the objective function. The value of the objective function cannot be explained in any way. Moreover, the efficiency score (optimal value of the objective function) is not invariant on the scale used. A possible solution is to normalize the input/output values by dividing by their maximum. Another possibility is to use a weighted slacks-based model proposed by (Ali et al., 1995). This model has the same set of constraints (3), but its objective function is the following: in r Maximize X ^ j 5 /+ ^w ks t (4) j=i k=i where w j , w\ are the positive weights of the slacks. The problem of this model may be the appropriate setting of the weights, and again, the efficiency score can just hardly be explained to decision-makers. Both unweighted and weighted additive models can be extended by the constraints that ensure the assumptions about the returns to scale - the sum of lambda variables is unrestricted/equal to 1/greater or equal than 1/lower or equal than 1 for the assumption of constant/variable/non-decreasing/non-increasing returns to scale. The main drawbacks of the previous two models are solved by the model introduced by Tone (2001). This model is usually known as S B M (slacks-based measure) model. It belongs to one of the very often used D E A models at all. The S B M model is based on minimization of all slacks, but its objective function is formulated as follows: i m ' V ( , . . / „ ) Minimize p = — . (5) The set of constraints of the S B M model is the same as in additive models. 
The model is not linear but can be moved to a linear model easily. S B M model returns objective function p = 1 for efficient units and p < 1 for the inefficient ones. The advantage of this model in comparison with the additive models is in the following: Efficiency scores are invariant on the measurement scales of inputs and or outputs, Efficiency scores are decreasing functions of all slacks variables, i.e. increasing/decreasing of any input/output leads to a decreasing of the efficiency score of the unit under evaluation. The model (2), (5) is not linear in its objective function but can easily be linearized by Charnes-Cooper transformation. The unit under evaluation is efficient if p = 1; lower values indicate inefficiency. A s in the previous case, the S B M model can be extended by returns to scale constraints. In some cases, the inputs and outputs in D E A models may be defined as integers. So, it is necessary considering integer constraints in D E A models, i.e. the following expressions in the mathematical formulation of the models must be an integer: 221 Yux ii^ j=l,...,m, (6) 1=1 Jt = 1,..., r. In the input-oriented version of the C C R model (1), the set of constraints is modified as follows: 7 = 1 , . . . , « , n YjX ijX i=X 'j' 7=1,..., OT, 4 = yflt. *=l,...,r, (7) 2,>0, ^ > 0 , i = 1, ...,n,j= 1,..., m, x ^ > 0, integer, j=\,...,m, s% > 0, integer = 1,..., r, A n interesting formulation of an SBM-based model was published in (Tone, 2016). This model leads to finding the closest virtual unit on the efficient frontier by maximizing the objective function (5) instead of minimizing the traditional S B M model. The new model is denoted as the SBM-max model, and its objective function is optimized over the feasible set that is the same as the efficient frontier derived using the classical S B M model (sometimes called the S B M - m i n model). However, the feasible set is non-convex in typical cases. That is why the solution of the SBM-max model cannot be derived easily. Tone (2016) proposes an iterative algorithm for the approximation of this optimal solution. The efficiency score derived by the SBM-max model is always greater or equal to the S B M efficiency score. Additive and weighted additive models (2) - (3) and (3) - (4) can be extended by integer constraints easily by considering all slacks as integer values. However, in the S B M model (3), (5) is the situation with integer constraints more complex, and their adding leads to solving an integer non-linear program. The same holds for the SBM-max model that is even more computationally complex. 3 Slacks-based super-efficiency DEA models Traditional D E A models, including all models presented in the previous section, cannot distinguish among efficient units as they have identical efficiency scores. Therefore, many approaches how to rank efficient units have been proposed in the past. In this section, we will summarize the most important models that are based on measuring super-efficiency using slacks. The first model of this category was introduced by Tone (2002). This model is widely applied in a variety of studies. This model removes the unit under the evaluation from the dataset and looks for a unit D M U * with inputs x/*, j = 1,..., m, and outputs yt*, k = 1,..., r that is efficient in the S B M model (3), (5) after this removal. The super-efficiency measure is the distance of the unit under evaluation, and the D M U * measure using slack variables. 
The model is as follows: i m „ X qj TYl •— Minimize — — (8) 1 r * r k=\ n subject to ^ xijXi +s~j = x*, j = 1,..., m, i=l, feq n Z y i k ^ - 4 = yl * = i,...,r, (9) i=l,i±q xqj O, s] >0, s£> O, k = 1,..., r, i = 1,..., «,7 = !,.••) m,k= 1,..., r, The objective function (8) is always greater or equal thanl - it is equal to 1 for S B M inefficient D M U s and greater than 1 for S B M efficient ones. That is why this model cannot be used to obtain a complete ranking of all D M U s but just for the units identified as efficient by the S B M model. The objective function is not linear but can be transformed into a linear program, similarly as in the S B M model. Except for the linearized version of his model, its input- and output-oriented versions are often used. Two additive super-efficiency models were formulated in (Du et al., 2010). Both models have the same set of constraints and differ slightly in the objective function only. We will work with the second of the two models. The procedure works in two steps. The first step consists of solving an optimization model. In the second step, the results of the optimization model are used for deriving the super-efficiency measure. The optimization model is the following: Minimize subject to 1 m + r z ^ + z ^ j=\X qj k=\y qk (10) Z v - - - - v / yqk> h > 0, sj > 0, s£ > 0, j= 1,..., m, k= 1,..., r, (11) i = 1,..., n,j = 1,..., m, k = 1,..., r. Let s: *,j 1,..., m, and s£ *, k = 1,..., r, be the optimal values slacks in solving model (10) - (11). Then, the super-efficiency measure Sq for the unit under evaluation is calculated using the following formula: 1 + —> — m j=ix w 1 r c + * r k=i yqk (12) This measure is always greater or equal to 1. The disadvantage of this approach is that the optimization model itself does not directly return the super-efficiency score, but it must be calculated using (12) in the second step. Jablonský (2012) introduced a super-efficiency model based on the goal programming methodology. Its formulation follows: Minimize subject to l + /D + ( l - i ) I [ * ; . ' * ] + Z I > « ' y J k=l 7 = 1 , . . . (13) Z yA+h •yqk> s+ n^Dxgi,s-k2 0 , l i > 0 , t e <0,1>. k= 1,..., r, j = 1,..., m,k = 1,..., r, j = 1,..., m,k = 1,..., r, / = 1,. (14) ., n, where sfl ,s+ jl,sk2, s^2 are negative and positive deviational variables from the inputs and outputs of the D M U ? . The model considers the undesirable deviations only (negative deviations for inputs and positive for outputs) in the objective function. D is the maximum relative deviation, and t is the parameter. Its value 0 leads to minimizing the sum of relative deviations, the value 1 to minimize the maximum deviation. The optimal objective function of the model is always greater than 1 for the units being S B M efficient. 223 If some of the inputs or outputs must be integers, the super-efficiency models (10) - (12) and (13) - (14) may be applied after a simple modification as presented in the previous section of the paper. However, Tone's superefficiency S B M model (8) - (9) requires solving a quite complex integer non-linear optimization problem, and it is hardly applicable in practice. 4 A numerical example The results of all presented models will be illustrated on a small example (12 D M U s , two inputs and two outputs) without any economic background. The dataset is presented in Table 1. This table also contains the results (efficiency scores) of the traditional C C R input-oriented model and the following SBM-based models: - (Unweighted) additive model (2) - (3) - A D D . 
The weighted additive model with the weights equal to the reciprocal values of maximums of each input and output - A D D W . - Tone's S B M model (3), (5). SBM-max model. Table 1 A dataset and efficiency scores for the illustrative example D M U s 11 12 O l 02 C C R A D D A D D W S B M S B M m a x D M U 1 5 42 126 10 1.000 0.00 0.000 1.000 1.000 D M U 2 14 84 184 16 0.742 79.71 0.843 0.632 0.737 D M U 3 7 49 150 12 1.000 0.00 0.000 1.000 1.000 D M U 4 3 27 82 6 1.000 0.00 0.000 1.000 1.000 D M U 5 18 98 201 9 0.664 118.00 1.694 0.351 0.381 D M U 6 9 72 96 11 0.527 129.14 1.030 0.506 0.612 D M U 7 6 52 101 18 1.000 0.00 0.000 1.000 1.000 D M U 8 3 36 73 3 0.890 21.00 0.309 0.530 0.693 D M U 9 10 72 144 7 0.648 86.62 1.065 0.407 0.556 D M U 1 0 6 46 142 8 1.000 0.00 0.000 1.000 1.000 D M U 1 1 5 28 85 7 1.000 0.00 0.000 1.000 1.000 D M U 1 2 6 38 63 4 0.538 59.20 0.652 0.388 0.551 In our example, all models identify 6 D M U s as efficient, and the same number is inefficient. The efficiency scores of both weighted and unweighted additive models can be just hardly explained. It is a simple sum of slacks, and the higher values correspond to the less efficient units. The efficiency scores obtained by the S B M model are always less than or equal to the scores produced by the C C R model - this property was proved in (Tone, 2001). SBM-max model finds a unit on the efficient frontier closer to the unit under evaluation, i.e. the efficiency score calculated by this concept is always greater than or equal to the S B M efficiency score and can be (but need not be) greater than the C C R efficiency score. Note that the ranking of inefficient units produced by all models is not the same even though all models (except C C R ) are based on the same principle. We have tested how the results are influenced by adding integer constraints. The efficiency scores of integer models always lead to only slightly better results than the data in Table 1, but the improvement was insignificant. Table 2 presents the results of several super-efficiency D E A models described in the previous section of the paper. Except for all SBM-based super-efficiency models, for comparison purposes, we added results of the radial super-efficiency model derived from C C R model (Andersen and Petersen model - AP). The included SBM-based models are the following: - Tone's super S B M model (8) - (9) - S S B M . - The model (10) - (12) proposed in (Du et al., 2010) - S D U . Goal programming model (13) - (14) for the values of parameter t = 0 and t = 1 - S B M G 0 and S B M G 1 . 224 Table 2 Super-efficiency scores DMUs A P SSBM SDU SBMGO S B M G 1 D M U l 1.0097 1.0058 1.0117 1.0117 1.0213 D M U 3 1.0142 1.0074 1.0149 1.0149 1.0308 D M U 4 1.0847 1.0406 1.0847 1.0780 1.0826 D M U 7 1.5000 1.2000 1.5000 1.3333 1.7739 D M U 10 1.0115 1.0057 1.0115 1.0114 1.0210 D M U 11 1.0054 1.0027 1.0054 1.0054 1.0108 The aim of this paper is not a detailed analysis of the results of super-efficiency models but rather a review of available SBM-based models. Interestingly, the models (Du et al., 2010) and Jablonsky (2012) return identical super-efficiency scores for some D M U s . The second of these two models solves just one optimization problem, whereas the first model works in two stages. Therefore, it is less convenient and computationally demanding. A l l presented models rank as the best D M U 7 followed by D M U 4 . The ranking of the remaining four S B M efficient units is not the same as shown in Table 2. 
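The AP column in Table 2 refers to the radial Andersen-Petersen super-efficiency model, which is simply the CCR model with the evaluated unit removed from the reference set. A minimal sketch follows; the input orientation and the hypothetical data are assumptions made only for illustration (the model can also be infeasible for some units, which the sketch reports as nan).

```python
import numpy as np
from scipy.optimize import linprog

def ap_super_efficiency(X, Y, q):
    """Input-oriented Andersen-Petersen score: the CCR envelopment model with DMU q
    excluded from the reference set, so theta may exceed 1 for efficient units."""
    keep = [i for i in range(len(X)) if i != q]
    Xr, Yr = X[keep], Y[keep]
    n, m = Xr.shape
    r = Yr.shape[1]
    c = np.zeros(1 + n)
    c[0] = 1.0
    A_ub = np.vstack([np.hstack([-X[q].reshape(m, 1), Xr.T]),
                      np.hstack([np.zeros((r, 1)), -Yr.T])])
    b_ub = np.concatenate([np.zeros(m), -Y[q]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (1 + n), method="highs")
    return res.x[0] if res.success else float("nan")

# The same hypothetical data as in the earlier CCR sketch.
X = np.array([[2.0, 4], [3, 2], [4, 5], [5, 3]])
Y = np.array([[1.0], [1], [1], [1]])
print([round(ap_super_efficiency(X, Y, q), 3) for q in range(len(X))])
```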
5 Conclusions This paper aimed to overview D E A models that measure the level of efficiency and/or super-efficiency using slack variables. This group of models is growing in popularity among researchers and is often used for comparison purposes as a complement to traditional radial D E A models. The illustrative example shows that the results of the presented models are not consistent with each other. Future research can include analyzing the mutual relations of the SBM-based models and their relation to other D E A models. Also, an analysis of integer and continuous SBM-based models is an open task. Even though various SBM-based models for network systems and multi-period analyses were proposed, the research in this field is open to new ideas. Acknowledgements The research is supported by the Czech Science Foundation, project no. 19-08985S - Models for efficiency and performance evaluation in non-homogeneous economic environment. References [1] A l i , A.I., Lerme, C.S. & Seiford, L . M . (1995). Components of efficiency evaluation in data envelopment analysis. European Journal of Operational Research, 80(2): 462^-73. [2] Charnes, A . , Cooper, W.W., Golany, B . , Seiford, L . M . & Stutz, J. (1985). Foundations of data envelopment analysis for Pareto-Koopman's efficient empirical production functions. Journal of Econometrics, 30(1-2): 1-17. [3] Charnes, A . , Cooper, W.W. & Rhodes, E . (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2(6), 429—444. [4] Du, J., Liang, L . & Zhu, J. (2010). A slacks-based measure of super-efficiency in data envelopment analysis: A comment. European Journal of Operational Research, 204(3): 694-697. [5] Jablonský, J. (2012). Multicriteria Approaches for ranking of efficient units in D E A models. Central European Journal of Operations Research. 20(3), 435—4-49. [6] Tone, K . (2001). A slacks-based measure of efficiency in data envelopment analysis. European Journal of Operational Research, 130(3), 498-509. [7] Tone, K . (2002). A slacks-based measure of super-efficiency in data envelopment analysis. European Journal of Operational Research, 143(1), 32—41. [8] Tone, K . (2016). Data envelopment analysis as a Kaizen tool: S B M variations revisited. Bulletin of Mathematical Sciences and Applications, 16, 49-61. 225 Swap Heuristics for Emergency System Design with Multiple Facility Location Jaroslav Janáček1 , Marek Kvet2 Abstract. Appropriate deployment of facilities in a node set of a road network is a core of efficient emergency system design. The previously suggested models based on the original weighted /^-median problem did not reflect limited capacity of the deployed service centers. In addition, the original approaches assumed that only one facility can be located at a given road network node. This paper introduces a new formulation of the problem, where the temporary inaccessibility of the nearest service facility is taken into account and furthermore, a service center can be equipped with more than one facility. To be able to solve large instances of the generalized problem, a swap heuristics was suggested and a strategy combining the best admissible and first admissible strategies was studied. The hypothesis that even a simple swap heuristics is able to reach a near-to-optimal solution of the complex location problem was verified by the documented computational study. 
Keywords: location problems, emergency medical service system, multiple facility location, swap heuristics J E L Classification: C44 A M S Classification: 90C06, 90C10, 90C27 1 Introduction Advanced knowledge of mathematical modelling and optimization is an essential property of professional top managers responsible for efficient usage of money and other shared public resources - people, vehicles, technical equipment, etc. Another competitive advantage of the best leaders consists in the ability to find a suitable solving tool for any problem. Different quantitative methods developed by the operations researchers and other IT experts can significantly help us in making the right decisions or in choosing the best admissible solution from the set of all alternatives. Due to a wide range of possible applications of mentioned mathematical methods not only in Economics, but also in many other fields, we concentrate our effort on developing an effective and fast solving tool for a special class of problems used to optimize the urgent pre-hospital healthcare system. The studied challenge originates from the weighted /^-median problem, the idea of which consists in choosing given exact number p of service center locations from a specific set of candidates in order to optimize the quality criterion of system design [1, 2, 3, 7, 13, 14]. Even i f this kind of problem is solvable quite well either by various exact [5, 8, 15] or heuristic methods [4, 6, 17], the mathematical model itself does not take into account several important aspects of the real system especially when the Emergency Medical Service ( E M S ) system is considered. Therefore, the original formulation needs some extension. The first disadvantage of common weighted /^-median problem can be expressed as disregard for stochastic behavior of real E M S system and randomly occurring demands for service. When a new emergency occurs, the nearest located center from the demand point may not have sufficient capacity to cover the demand. In such a case, the request is assigned to the closest available crew, which does not have to be the nearest one. This model extension can be achieved by the concept of so-called generalized disutility, which allows us to model providing the service from more centers [10, 11, 12]. Another weakness of the original model lies in the fact that it does not allow to locate more facilities in the same network node. It means that no candidate for service center may get equipped with more than one ambulance vehicle or other resource. On the other hand, if we look at some bigger cities, it is common that there are more service centers spread over the territory. For example, there are four E M S stations located in Žilina. Thus, the associated facility location problem should comply with multiple facility location. 1 University of Žilina, Faculty of Management Science and Informatics, Univerzitná 8215/1, 010 26 Žilina, Slovakia, jaroslav.janacek@fri.uniza.sk 2 University of Žilina, Faculty of Management Science and Informatics, Univerzitná 8215/1, 010 26 Žilina, Slovakia, marek.kvet@fri.uniza.sk 226 Obviously, adjusting the original model to these new requirements can make the problem harder for effective and fast solving. Therefore, the main research idea of this paper consists in suggesting a swap heuristics, which would be able to comply with multiple facility location in one network node. The background of this heuristic approach was introduced for the original weighted /^-median problem in [16]. 
It is based on processing the set of feasible solutions of the problem formed by a special set called uniformly deployed set. The biggest advantage of the uniformly deployed set of solutions lies in its constructing, which is completely independent from the solved problem and used objective function [9]. To study the basic characteristics of suggested heuristic approach (computational time demands and the resulting solution accuracy), the numerical experiments with real-world middle-sized problem instances have been performed and the obtained results are discussed in a separate section. 2 P-Facility Location Problem with Multiple Facility Location A /^-location problem formulation is often used in connection with public service system designing, where the designed system services randomly occurring demands from a group of service centers, which have limited ability to yield service. The designed system provides the service to people, who are concentrated at n dwelling places of a serviced region. It is assumed that a dwelling place j = 1, ..., n will generate demands for service with frequency bj. The system services a demand emerged at the dwelling place j by a facility, which is located at a service center location i. The destined facility, e.g. servicing vehicle, has to travel from location i to the location j, then it satisfies the demand and returns back to the service center. When employed by the service, the facility is unable to service other demands. That is why, a currently emerged demand cannot be satisfied ever from the closest service center, but it is covered by the first available facility in general. To describe this mode of facility operating in an associated model, the probability values pi, pr are introduced to express by the value pt a number of cases, when a demand is satisfied from the k-th nearest facility due to the fact that this facility is the nearest available one [10, 11, 12], A n original public service system design problem is usually formulated as a choice of p service center locations from a set of m possible service center locations so that the expected time distance from a demand location to the nearest available facility location is minimal. This original problem has been broadly studied and successfully solved by many authors [1, 2, 3, 4, 5, 6, 7, 13, 14], but their approaches assumed that exactly p service centers were to be deployed. As each possible service center location usually corresponds to one road network node, the assumption acceptation excludes situations, when more than one facility is established at one service center location. If we introduce symbol dy to denote a network time-distance between locations i and j, then a combinatorial model of (1) can describe the original problem. ^ X i , J Z « A ( p j , * ) J ' P c f t ...,m},\P\ = p\ (1) I j=l k=l J For given demand location j, the ranking function v(P, j, k) returns the k-th nearest service center location from the set P. A much more complex problem is faced, when the case of service centers equipped with more than one facility is admitted. To model the case, a mapping y: P —» {1, ...,/?} is introduced to express the number y(i) of facilities assigned to a service center ieP. Then the model (1) can be adapted to the studied case in the form of (2). ™n \ t . b ^ k d w ^ k l j , P ^ { \ , . . . 
, m } , M 1 ' - ' PY",!>('') = /4 (2) [ j=l k=\ isP J The models (1) and (2) are very similar, but the ranking function w(P, y,j, k) is much more complex than v(P,j, k), what can be demonstrated by the following algorithm, which maps the quadruple [P, y, j, k] to an element of P. w(P, y,j,k) 0. Order the elements of P into a sequence z'(l), i(\P\) so that the following inequalities hold: d,(i),j < di(2),j< ... | - 1) = \P\.(m-\). Based on the above explanation, we introduce the swap operation Swap(P, y, i, j) for ieP and je D-{i] defined by the following commands: Swap(P, y, i, j) If7 e D - P then P = P u {;'}, y(j) = 1, else y(j) = y(j) + 1. Set =) p.(m-l) and Threshold = 0. In the mixed strategy, the best-found admissible solution is used to update the current solution. The proposed swap heuristic with mixed strategy performs according to the following steps: 228 SwapMS(P, y, maxNos, Threshold) 0. {Initialization of the best-found solution} Set/* =/(P, y), P* = P, y* = y. 1. {Initialization of the neighborhood search} Initialize list L of all pairs [i,j]sPx (D-{i}) and set Nos = 0. Perform step 2 and having performed step 2, continue with step 3. 2. {The neighborhood search} While Nos < maxNos and L ? 0do Withdraw a pair [i,j] from L. IffiSwap (P*, y\ [0,1] with ip(0) = 1 and ip(l) = 0. The L W S - R estimator corresponds to a reciprocal weight function as in [11], i.e. the magnitudes of the weights 1 , 1 / 2 , 1 / 3 , . . . ,1/n are assigned to individual observations after a permutation, which is determined only implicitly (in the course of computing the estimator). For the sake of comparisons, we consider three other weight functions. Linear weights are defined by = 1 - t , (6(0,1), (5) 240 Table 1 The 31 datasets together with their basic characteristics (n and p). Ranks corresponding to the mean prediction errors evaluated in (1) in a 10-fold cross validation for various versions of the L W S estimator are presented here. The novel L W S - R method (3) is presented in the last column. 
I Version of L W S Index Dataset Response variable n P (5) (6) (7) (3) 1 Aircraft Cost 23 5 1 3 4 2 2 Ammonia Unprocessed percentage 21 4 2 4 1 3 3 Auto M P G Miles per gallon 392 5 2 3 4 1 4 Boston housing Crime rate 506 6 4 3 2 1 5 Building Electricity consumption 4208 7 3 4 1 2 6 California housing Median house price 20640 9 2 1 3 4 7 Cirrhosis Death rate 46 5 3 1 4 2 8 Coleman Test score 20 4 2 1 3 9 Concrete compression Concrete compression strength strength 1030 7 2 4 3 1 10 Delivery Delivery time 25 3 1 2 4 3 11 Education Education expenditures 50 4 1 4 3 2 12 Electricity Output 16 4 2 1 4 3 13 Employment # of employed people 16 7 3 1 2 4 14 Engel Food expenditures 235 2 4 3 1 2 15 Furniture Log relative wage 11 2 4 1 2 3 16 Houseprices Selling price 28 6 3 4 1 2 17 Imports Level of imports 18 4 2 4 1 3 18 Investment Investment 22 2 2 1 3 4 19 Istanbul stock exchange Istanbul index 536 8 3 2 1 4 20 Kootenay Newgate 13 2 1 3 4 2 21 Livestock Expenses 19 5 4 2 1 3 22 Machine PRP 209 7 4 2 3 1 23 Murders # of murders 20 4 3 4 2 1 24 N O x emissions L N O x 8088 4 2 4 1 3 25 Octane Octane rating 82 5 2 3 1 4 26 Pasture Pasture rental price 67 4 2 3 1 4 27 Pension Reserves 18 2 1 2 3 4 28 Petrol Consumption 48 5 1 2 4 3 29 Stars C Y G Log temperature 47 2 3 1 4 2 30 Travel and tourism TSI 141 13 3 2 4 1 31 Wood Wood gravity 20 6 4 1 2 3 241 trimmed linear weights generated by the weight function V-TL(Í) = ( l - ^ • l [ t < r], t G (0,1), (6) where denotes an indicator function, and weights generated by the error function ýE(t) = 1 - f exp{-x2 }dx, t G (0,1). (7) The (normalized) ranks Ri(b)/n for i = 1 , . . . , n play the role of í G (0,1) within (4). 3.1 Numerical experiments We performed a numerical study over 31 datasets with economic motivation with the aim to compare reciprocal weights within the L W S estimator in linear regression with other choices of weights. The datasets were carefully selected so that the linear model is meaningful for them. To describe the experiments, each of 4 versions of the L W S estimator is computed for each of the 31 datasets. In a 10-fold cross validation, the mean square error as the most basic measure of prediction ability is evaluated for each situation. Based on the results, we computed the ranks corresponding to the mean prediction errors (MSE) of various linear regression estimators. Let us first present aggregated results over the 31 datasets. The L W S estimator with trimmed linear weights (7) turns out to be the best among the 4 versions of the L W S . Particularly, (5) yields the minimal prediction error for 6 datasets (19 % of the datasets), (6) for 8 datasets (26 % of the datasets), (7) for 11 datasets (35 % of the datasets), and L W S - R (3) for 6 datasets (19 % of the estimators). In addition to the aggregated results, let us discuss also the results for individual datasets. These are presented in Table 1. Let us now explain the presented results on a particular example considering the first dataset denoted as Aicraft; the L W S estimator with the weight function (5) turns out to be the best for this dataset, (3) is the next, (6) comes as the third, and the weight function (7) yields the worst (largest) value of M S E among the four versions of the L W S for this dataset. 
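To make the weighting schemes concrete, the sketch below evaluates the four weight functions on the normalized ranks t_i = R_i(b)/n. Two details are assumptions rather than facts taken from the paper: the trimming constant tau in (6) is set to 0.75 purely for illustration, and the error-function weights (7) are written with the usual erf normalization, since the constant in the integral is not fully legible above.

```python
import numpy as np
from scipy.special import erf

def linear_weights(t):                        # psi_L(t) = 1 - t, eq. (5)
    return 1.0 - t

def trimmed_linear_weights(t, tau=0.75):      # psi_TL(t) = (1 - t/tau) * 1[t <= tau], eq. (6); tau is illustrative
    return np.where(t <= tau, 1.0 - t / tau, 0.0)

def erf_weights(t):                           # psi_E(t) = 1 - erf(t), eq. (7); normalization assumed
    return 1.0 - erf(t)

def reciprocal_weights(n):                    # LWS-R magnitudes 1, 1/2, ..., 1/n (assigned after the implicit permutation)
    return 1.0 / np.arange(1, n + 1)

n = 10
t = np.arange(1, n + 1) / n                   # normalized ranks R_i(b)/n
print(np.round(linear_weights(t), 3))
print(np.round(trimmed_linear_weights(t), 3))
print(np.round(erf_weights(t), 3))
print(np.round(reciprocal_weights(n), 3))
```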
4 Conclusions As we can currently experience disruptive or unexpected patterns in the economies around the world (not only) as a consequence of the COVID-19 pandemic, it is natural to think about the perspective of methods of chaos theory and/or fractals within econometric data analysis. This paper gives an overview of some recent fractal-based tools for the analysis of economic data with a (multi)fractal structure; recent applications turn out to appear especially in the context of financial time series and only rarely in other tasks for other than temporal data. Further, we realized that a data analysis approach based on reciprocal weights assigned to individual observations can be interpreted as a method inspired by fractals [11]. We arrive at introducing implicit reciprocal weights to individual observations in the context of the L W S - R estimator in the linear regression model. Our numerical study over 31 datasets in the linear regression model is especially focused on the performance of the novel L W S - R method. The results however reveal that the L W S - R estimator is able to overcome other versions of the L W S estimator (with other choices of weights) only for a small percentage of the datasets. Nevertheless, we recommend to repeat such study for data with a larger number of variables. Without surprise, no weights turn out to be uniformly the best across all datasets; trimmed linear weights are the best in more datasets than any other weights. Optimality of weights can be derived only under specific assumptions on the distribution of the distances; such optimality results were derived for nonparametric hypothesis tests for multivariate (possibly highdimensional) data in [12]. It is natural to plan future research of the performance of estimators based on reciprocal weights in other tasks, such as in nonlinear regression or in the task to estimate expectation and scatter in the multivariate model, i.e. by means of the minimum weighted covariance determinant ( M W C D ) estimator, which is based on robust Mahalanobis distances. Acknowledgements The work was supported by the Czech Science Foundation projects GA21-1931 IS (Information flow and equilibrium in financial markets). The author is grateful to Lubomír Soukup and Marcel Jiřina for discussion. References [1] Biais, B., Foucault, T , & Moinas, S. (2015). Equilibrium fast trading. Journal of Financial Economics, 116, 292-313. 242 [2] Briggs, J. (2015). Fractals: The patterns of chaos. Discovering a new aesthetic of art, science, and nature. Echo Point Book & Media, Brattleboro. [3] Calvet, L . E . & Fisher, A.J. (2013). Extreme risk and fractal regularity in finance. Contemporary Mathematics, 601, 65-94. [4] Caporale, G . M . , Gil-Alana, L . , Plastun, A . , & Makarenko, I. (2016). Intraday anomalies and market efficiency: A trading robot analysis. Computational Economics, 47, 275-295. [5] Chen, F , Tian, K . , Ding, X . , et al. (2017). Multifractal characteristics in maritime economics volatility. International Journal of Transport Economics, 44(3), 365-380. [6] Dhifaoui, Z . (2016). Robust to noise and outliers estimator of correlation dimension. Chaos, Solitons & Fractals, 93, 169-174. [7] Faggini, M . & Parziale, A . (2016). More than 20 years of chaos in economics. Mind & Society, 15, 53-69. [8] Faggini, M . , Bruno, B . , & Parziale, A . (2019). Does chaos matter in financial time series analysis? International Journal of Economics and Financial Issues, 9(4), 18-24. [9] Higgins, D . M . (2017). 
Residential property market performance and extreme risk measures. Pacific Rim Property Research Journal, 23, 1-13. [10] Jiang, Z.Q., Xie, W.J., Zhou, W.X., & Sornette, D . (2019). Multifractal analysis of financial markets. Reports on Progress in Physics, 82(12), Article 125901. [11] Jiřina, M . & Jiřina, M . (2015). Classification using Zipfian kernel. Journal of Classification, 32, 305-326. [12] Jurečková, J. & Kalina, J. (2012). Nonparametric multivariate rank tests and their unbiasedness. Bernoulli, 18(1), 229-251. [13] Jurečková, J., Picek, J., & Schindler, M . (2019). Robust statistical methods with R. 2nd edn. Boca Raton: C R C Press. [14] Kalina, J. (2012). Highly robust statistical methods in medical image analysis. Biocybernetics and Biomedical Engineering, 32(2), 3-16. [15] Kalina, J. (2014). On robust information extraction from high-dimensional data. Serbian Journal of Management, 9(1), 131-144. [16] Kalina, J. (2021). Managerial decision support in the post-COVID-19 era: Towards information-based management. In L . C . Carvalho, L . Reis, & C . Silveira (Eds.), Handbook of Research on Entrepreneurship, Innovation, Sustainability, andlCTs in the Post-COVID-19 Era (pp. 225-241). Hershey: IGI Global. [17] Klioutchnikov, I., Sigova, M . , & Beizerov, N . (2017). Chaos theory in finance. Procedia Computer Science, 119,368-375. [18] Lahmiri, S. & Bekiros, S. (2020). B i g data analytics using multi-fractal wavelet leaders in high-frequency Bitcoin markets. Chaos, Solitons & Fractals, 131, Article 109472. [19] Latif, N . , Pečarič, D . & Pečarič, J. (2017). Majorization, Csiszár divergence and Zipf-Mandelbrot law. Journal of Inequalities and Applications, 2017, Article 197. [20] Liu, Z., Ye, Y , M a , F , & Liu, J. (2017). Can economic policy uncertainty help to forecast the volatility: A multifractal perspective. Physica A, 482, 181-188. [21] Redko, V.G. & Sokhova, Z . B . (2017). Processes of self-organization in the community of investors and producers. Studies in Computational Intelligence, 736, 163-169. [22] Segovia, J.E.T., Fernandez-Martinez, M . , & Sánchez-Granero, M . A . (2019). A novel approach to detect volatility clusters in financial time series. Physica A, 535, Article 122452. [23] Siokis, F M . (2014). European economies in crisis: A multifractal analysis of disruptive economic events and the effects of financial assistance. Physica A, 395, 283-292. [24] Stanley, H.E., Gabaix, X . , Gopikrishnan, P., & Plerou, V. (2007). Economic fluctuations and statistical physics: Quantifying extremely rare and less rare events in finance. Physica A, 382(1), 286-301. [25] Traina, C , Traina, A . , Wu, L . , & Faloutsos, C. (2010). Fast feature selection using fractal dimension-Ten years later. Journal of Information and Data Management, 1(1), 17-20. [26] Víšek, J.A. (2011). Consistency of the least weighted squares under heteroscedasticity. Kybernetika, 47, 179-206. [27] Yan, Q., Su, M . , Wu, Y , & Wang, X . (2019). Economic efficiency evaluation of coastal tourism cities based on fractal theory. Journal of Coastal Research, 93, 836-842. 243 LTPD variables inspection plans and effect of wrong process average estimates Nikola Kaspříková1 Abstract. The lot tolerance proportion defective acceptance sampling plans were designed by Dodge and Romig to minimize the mean number of items inspected per lot of the process average quality when the remainder of rejected lots is inspected (rectifying plans). 
It has been shown that for the attributes sampling plans the mean number of items inspected per lot of the process average quality increases both when the true process average is greater and when the true process average is smaller than the estimated value of this parameter. The paper addresses the sampling plans for the inspection by variables and considers the effects of wrong guess of the process average quality value on the economic performance of the plans measured by the mean number of items inspected per lot of the process average quality. The rectifying lot tolerance proportion defective sampling plans are calculated and evaluated using an R software extension package. Keywords: acceptance sampling, inspection cost, LTPD, A O Q L J E L Classification: C44 A M S Classification: 90C15 1 Introduction The lot tolerance proportion defective (LTPD) acceptance sampling plans were designed by Dodge and Romig to minimize the mean number of items inspected per lot of the process average quality when the remainder of rejected lots is inspected (rectifying plans). The plans were originally designed by Dodge and Romig for the inspection by attributes. Plans for the inspection by variables and for the inspection by variables and attributes (all items from the sample are inspected by variables, the remainder of rejected lots is inspected by attributes) were then proposed and it was shown that these plans are in many situations more economical than the corresponding Dodge-Romig attribute sampling plans. The L T P D plans for inspection by variables and attributes have been introduced in [7], using approximate calculation of the plans. Exact operating characteristic, using non-central t distribution, has been later implemented for the calculation of the plans in the LTPDvar package [6]. The operating characteristics used for these plans are discussed by Jennett and Welch in [3] and by Johnson and Welch in [4]. It has been shown that these plans are in many situations better than the original attribute sampling plans, see the analysis provided in [8]. The calculation of the L T P D variables sampling plans is implemented in the R extension package [6], providing both operating characteristics shown in [3] and [4]. Furthermore, the package covers the L T P D variables plans which are using the exponentially weighted moving average ( E W M A ) statistic in the inspection procedure to reflect the recent development in acceptance sampling plans design, for more details and references see [6]. The economic performance of the Dodge and Romig plans is based on good estimate of the process average quality, as discussed in [2] for the case of attribute sampling plans. The recent paper [5] showed the effects of wrong guess of the process average quality on the Average Outgoing Quality Limit (AOQL) plans for the inspection by variables. This paper considers the L T P D plans proposed in [8] and shows the economic characteristics of these plans measured by the mean cost of inspection per lot of the process average quality. It considers the effects of wrong guess of the process average quality value on the economic performance of the plans. The structure of this paper is as follows: first, the design of the original Dodge-Romig L T P D sampling plans for the inspection by attributes (see [1]) is recalled. Then we recall the design of the the L T P D variables sampling plans as shown in [8]. 
The optimal acceptance sampling plan for the unknown standard deviation case is calculated in a short case study, and the effect of a wrong supposed value of the process average proportion defective on the mean inspection cost per lot of the process average quality is then shown. The calculation and economic evaluation of the plans is done using the free software [6], which has been published on the Comprehensive R Archive Network.

1 Prague University of Business and Economics, Department of Mathematics, Nám. W. Churchilla 4, Praha, Czech Republic

2 LTPD attributes inspection plans

For the inspection procedures in which each inspected item is classified as either good or defective (the acceptance sampling by attributes), Dodge and Romig (see [1]) consider sampling plans (n, c) which minimize the mean number of items inspected per lot of the process average quality, assuming that the remainder of the rejected lots is inspected,

I_s = N - (N - n) · L(p; n, c),   (1)

under the condition

L(p_t; n, c) ≤ β,   (2)

where N is the lot size, L is the operating characteristic, p is the process average proportion defective, p_t is the lot tolerance proportion defective and β is the prescribed consumer's risk (β = 0.10 in the Dodge-Romig tables).

3 LTPD variables inspection plans

For the inspection by variables with an unknown standard deviation of the quality characteristic, all n items of the sample are measured and the lot is accepted if

x̄ + k·s ≤ U,   (3)

where x̄ is the sample mean, s is the sample standard deviation, U is the upper specification limit and k is the critical value of the plan; otherwise the lot is rejected. The exact operating characteristic for this case is (see the approximative and the exact operating characteristic in [3] and [4])

L(p; n, k) = ∫_{k√n}^{∞} g(t; n-1, u_{1-p}√n) dt,   (4)

where g(t; n-1, u_{1-p}√n) is the probability density of the noncentral t distribution with (n-1) degrees of freedom and noncentrality parameter u_{1-p}√n, and u_{1-p} is the (1-p)·100 % quantile of the standard normal distribution. The plan parameters (n, k) are determined so that the plan has optimal economic characteristics and satisfies the requirement (2), when (4) is used as the operating characteristic. The optimal economic characteristics mean that the mean inspection cost per lot of the process average quality is minimized,

I_ms = N - (N - n) · L(p; n, k).   (5)

4 Economic evaluation of the plans

Let us calculate the LTPD acceptance sampling plan for sampling inspection by variables when the standard deviation of the quality characteristic is unknown in the short case study below. The economic performance of the plan will be evaluated by the mean inspection cost per lot of the process average quality.

Example 1. A lot with N = 1000 items is considered in the acceptance procedure. The lot tolerance proportion defective is given to be p_t = 0.01, and the process average quality is p = 0.001. Find the LTPD acceptance sampling plan for sampling inspection by variables when the remainder of rejected lots is inspected.

The plan can be calculated using the functions available in the LTPDvar package for the R software [9]; see the documentation of the package for a more detailed description. The solution is n = 85, k = 2.627151. The mean inspection cost per lot of the process average quality for this plan is 104.67.

The values of the input parameters influence the resulting sampling plan and its economic characteristics. The LTPD acceptance sampling plans are optimized with respect to the mean inspection cost per lot of the process average quality p. If the supposed value of the process average quality (let us denote such a value p_a) is different from the true value of p, the resulting acceptance sampling plan will still satisfy the condition (2), but the value of I_ms may not be optimal. Table 1 shows the mean inspection cost per lot of the process average quality I_ms for plans calculated for various values of the supposed process average quality p_a, keeping the other parameters from our example unchanged.
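As an illustration of formulas (4) and (5), the following minimal sketch evaluates the operating characteristic with scipy's noncentral t distribution and the mean inspection cost for the plan from Example 1. It is meant only as a check of the formulas and is not a substitute for the LTPDvar package used in the paper.

```python
from scipy.stats import nct, norm

def oc_variables(p, n, k):
    """Operating characteristic (4): L(p; n, k) = P(T >= k*sqrt(n)), where T follows a
    noncentral t distribution with n-1 df and noncentrality u_{1-p}*sqrt(n)."""
    ncp = norm.ppf(1.0 - p) * n ** 0.5
    return nct.sf(k * n ** 0.5, df=n - 1, nc=ncp)

def mean_inspection_cost(N, n, k, p):
    """Mean number of items inspected per lot of process average quality p, eq. (5)."""
    return N - (N - n) * oc_variables(p, n, k)

# Plan from Example 1: n = 85, k = 2.627151, N = 1000, p_t = 0.01, p = 0.001.
print(round(oc_variables(0.01, 85, 2.627151), 3))                 # consumer's risk at p_t, about 0.10
print(round(mean_inspection_cost(1000, 85, 2.627151, 0.001), 2))  # should be close to the 104.67 reported above
```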
The values of the mean inspection cost (see also Figure 1) are increasing if the guess value pa becomes farther from the true process average quality value. Underestimating the true values shows more significant increase in the mean inspection cost than overestimating p. 125 120 115 110 105 0.0006 0.0008 0.0010 0.0012 0.0014 Pa Figure 1 Inspection cost for plans constructed for various p guess pa, real p = 0.001 Example 2. Let us change some of the input parameter values from Example 1 and consider now the lot size N = 5000, p = 0.005. The Table 2 and Figure 2 show the situation after parameter update in Example 2. The values of the mean inspection cost are again increasing i f the guess value pa becomes farther from the true process average quality value, underestimating the true values shows more significant cost increase. It may be observed that the outcomes are very similar to those obtained in [5] for A O Q L variables sampling plans and both are in line with results shown in [2] for the corresponding Dodge-Romig attribute sampling plans. 246 Pa n k 0.0005 62 2.686609 124 65 0.0006 67 2.67084 115 95 0.0007 72 2.656909 110 13 0.0008 76 2.646865 107 16 0.0009 81 2.635469 105 16 0.001 85 2.627151 104 67 0.0011 90 2.617612 105 18 0.0012 94 2.610582 106 32 0.0013 98 2.604022 107 98 0.0014 103 2.596409 110 66 0.0015 107 2.590737 113 19 Table 1 L T P D plans for process average quality guess between 0.0005 and 0.0015, real p = 0.001 1400 1300 1200 - 1100 1000 900 800 "I 0.003 1 0.004 1 0.005 0.006 0.007 Pa Figure 2 Inspection cost for plans constructed for various pa, real p = 0.005 247 Pa n k hns 0.003 286 2.481788 1278.48 0.0035 345 2.467027 1060.68 0.004 414 2.454095 897.95 0.0045 497 2.442386 795.91 0.005 596 2.431858 761.91 0.0055 715 2.422303 795.37 0.006 857 2.413686 890.14 0.0065 1027 2.405877 1038.19 0.007 1223 2.399022 1226.11 Table 2 L T P D plans for process average quality guess between 0.003 and 0.007, real p = 0.005 5 Conclusion This paper addressed rectifying L T P D sampling plans for the inspection by variables under the assumption that the standard deviation of the quality characteristic is unknown. The mean inspection cost per lot of the process average quality has been used as the economic characteristic of the plans. The effects of the supposed value of the process average proportion defective on the mean inspection cost per lot of the process average quality was shown and it has been observed that the mean inspection cost is increasing when the guess differs from true process average quality value in both directions, underestimating the true values showed more significant increase in the mean inspection cost than overestimating. Based on the results of the observations in the case study in this paper it seems that it is better to overestimate the process average quality than to underestimate it. Nevertheless the results of this paper are just observations in numerical experiments and it would be interesting to find some results in analytic form in future research. Acknowledgements This paper has been produced with contribution of long term institutional support of research activities by Faculty of Informatics and Statistics, Prague University of Business and Economics. References [1] Dodge, H . , F , and Romig, H . G . (1998). Sampling Inspection Tables: Single and Double Sampling. New York: John Wiley. [2] Hald, A . (1981). Statistical theory of sampling inspection by attributes. New York: Academic Press. [3] Jennett, W. J., and Welch, B . L . (1939). 
The Control of Proportion Defective as Judged by a Single Quality Characteristic Varying on a Continuous Scale, Supplement to the Journal of the Royal Statistical Society, 6, 80-88. [4] Johnson, N . L . , and Welch, B . L . (1940). Applications of the Non-central t distribution, Biometrika, 38, 362-389. [5] Kasprikova, N . (2020). Remarks on economic characteristics of rectifying A O Q L plans by variables. In International Conference on Mathematical Methods in Economics 2020 (pp. 253-258). Brno : Mendel University. [6] Kasprikova, N . (2020). LTPDvar: LTPD and AOQL plans for acceptance sampling inspection by variables. R package version 1.2. http://CRAN.R-project.org/package=LTPDvar. [7] Klufa, J. (1994). Acceptance sampling by variables when the remainder of rejected lots is inspected, Statistical Papers, 35, 337 - 349. [8] Klufa, J. (2010). Exact calculation of the Dodge-Romig L T P D single sampling plans for inspection by variables, Statistical Papers, 51(2), 297-305. [9] R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. U R L http://www.R-project.org 248 Flexible Job Shop Schedule generation in Evolution Algorithm with Differential Evolution hybridisation František K o b l a s a 1 , M i r o s l a v V a v r o u š e k 2 Abstract. Flexible Job Shop Scheduling becomes an emerging scheduling problem due to its nature to model constraints in holistic manufacturing systems. Its flexibility during sequencing and assigning tasks is typical for most smart factories, cyber physical systems, new systems in distribution and procurement. There are many ways to deal with these scheduling problems, and population-based heuristics are the most common and thriving. Evolution Algorithms are the most popular as most general and practically used in many optimisation areas, while Differential Evolution principles are considered the most successful. This paper addresses the problem representation by chromosome and schedule generation to be suitable for hybridising Evolution Algorithm optimisation with Differential Evolution principles. Semi-active, Active and Non-Delay schedules are experimentally compared on benchmark models to find their suitability to be represented by one or two chromatids chromosome. Subsequently, several Differential Evolution strategies are tested and discussed to find their suitability to be implemented as a mutation operator in the Random key-based Evolution Algorithm. Keywords: Flexible Job Shop, Scheduling, Chromosome Representation, Evolution Algorithm, Differential Evolution. J E L Classification: C63, L23 A M S Classification: 90C59 1 Introduction Job shop problems reflect decision-making during production planning, and their complexity rises over time as emerging technologies and production principles are developed at a high pace. Where classical Job Shop (JSP) reflects manufacturing systems suited for customer-oriented products, Flexible Job shops (FJSP) follow constraints given by flexibility of manufacturing devices [20], ability to simulate manufacturing[13] and maintenance[5] processes with the aid of Digital Twins [18]. There is a wide variety of heuristic and metaheuristic methods to solve FJSP, beginning with fast dispatching rules over searching strategies to population-based biomathematics inspired heuristics. 
Population-based metaheuristics [11] are long-time emerging techniques and the most commonly used in academic optimisation for their ease of use for various decision problems. The majority of them follow the basic pattern of Evolution, where individuals (solutions) are selected to be further modified by reproduction-recombination operations (crossover, mutation), after which old and new solutions compete or collaborate to create a new generation of better solutions. Despite a significant effort to specify them and present them as a different class of algorithms, they differ only in focus on a particular evolution operator. Genetic Algorithms [25] following a strict pattern of surviving of the fittest and recombination by crossover and mutation are often bound in literature with Boolean representation. Evolution Strategies [2] are using Real number representation of problem and mutation as recombination while focusing on the strategy of building and substituting old and new population. Differential evolution [3], with better parent-child selection and elimination strategy, uses distance in representation as a tool to recombine (evolve) individuals into the new population of successors. It is possible to get a swarm optimisation algorithm by using differential evolution while considering the same individual moving in time and space as two individual succeeding each other and keeping in mind the history of its motion (place, speed and direction). This article is not focusing on developing a new Evolution Algorithm (EA) inspired by an animal (Lion [28], Dolphin [26], Squirrel [10], Moth[17] or Bat [27] etc.) or universe (Blackhole[7], Lightning [21], Gravity [8]) behaviour, but Technical university of Liberec, Department of manufacturing systems and automation, Studentská 2, Liberec 1, Czech Republic 1 frantisek.koblasa@tul.cz. 2 miroslav.vavrousek@tul.cz. 249 rather on mechanical nature of itself optimisation. We propose hybridisation of the Evolution Algorithm by making new individuals with differential evolution recombination operators, which is not common in combinatorial opti- misation. Firstly, differences between classical Simple Genetic (Evolution) Algorithm ( S G A ) and Differential Evolution (DE) are explained and defined while pointing out its suitability and difficulties while implementing a typical level of process optimisation of D E on combinatorial problems as scheduling. Most common D E operators are explained to be further tested on benchmark problems. Secondly, the test problem class of FJSP problems is explained together with solution construction influenced by problem representation and schedule type generation. Different results (makespan and generation of convergence) given by various schedule generation and D E mutations are analysed to find the best suitable for S G A hybridisation. dGA, which follows S G A steps while using D E mutation operator, is proposed. 2 Evolution Algorithm and Differential Evolution D E follows the general approach of E A while keeping a specific approach to select parents, reproduction cycle and creating new generation (see Figure 1). D E makes new individuals base on crossover with every individual in the population. So while D E uses every individual, E A can use a specific strategy to select new parents [15]; thus, not every individual in the population has an opportunity to reproduce. E A reproduction is made by two (rarely by more) parents recombination during crossover and mutation based on individual gene exchanges. 
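As a minimal illustration of the DE reproduction cycle outlined here and detailed in the next paragraph (mutant generation, uniform crossover with the parent x_i, and the parent-vs-child replacement), the sketch below applies the rand/1 mutation, i.e. strategy (1) among the mutations tested later, to a simple continuous test function. The parameter values and the test function are illustrative only and have nothing to do with the scheduling benchmarks.

```python
import numpy as np

def de_generation(pop, fitness, f_obj, F=0.5, CR=0.85, rng=None):
    """One DE generation: for every individual x_i build the mutant u = r1 + F*(r2 - r3)
    (rand/1), recombine it with x_i by uniform crossover, and keep the child only if its
    objective value is better (parent-vs-child elimination)."""
    rng = rng or np.random.default_rng()
    n, dim = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True        # at least one gene comes from the mutant
        child = np.where(cross, mutant, pop[i])
        child_fit = f_obj(child)
        if child_fit < fitness[i]:             # minimisation: the child replaces its parent only if better
            new_pop[i], new_fit[i] = child, child_fit
    return new_pop, new_fit

# Tiny demonstration on the sphere function (an illustrative continuous test problem).
sphere = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(20, 4))
fit = np.array([sphere(x) for x in pop])
for _ in range(50):
    pop, fit = de_generation(pop, fit, sphere, rng=rng)
print(round(float(fit.min()), 4))
```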
DE first generates mutants (1)-(10) based on several individuals from the population and then recombines them with the parent individuals (the current population) during a crossover. Finally, the elimination in DE differs by its rigid "parent vs. child" strategy, where the offspring substitutes the parent if its objective function value is better [14]. In EA, a parent and a child can coexist in the same generation if they both succeed in the elimination step (elitist in this case).

Figure 1 Simple Evolution and Differential Evolution comparison (two flow charts: the SGA cycle of parent selection, crossover and mutation, chromosome repair and evaluation of f(x_i) by Eq. (11), and selection into the new population; and the DE cycle of mutant generation by Eqs. (1)-(10), uniform crossover with x_i, chromosome repair and evaluation, and replacement of x_i by x_i* if f(x_i*) > f(x_i))

There are various approaches to generating the mutant. The "simplest" way is to generate a gene based on randomly selected individuals (1), (2), (9). The class aiming at "faster convergence to the local extreme" includes a gene of the fittest individual (3), (4), (7), (8). Some strategies take, besides the best individual, also genes of the individual x_i with which the crossover will be performed (5), (6). The last group manipulates the randomly selected individuals (9) by sorting them by fitness f(x), or performs a crossover as a part of the mutant creation (10). The mutations tested in this paper are:

R/1:    u = r_1 + F(r_2 - r_3)                                   [23]   (1)
R/2:    u = r_1 + F(r_2 - r_3) + F(r_4 - r_5)                    [23]   (2)
B/1:    u = x_best + F(r_2 - r_3)                                [23]   (3)
B/2:    u = x_best + F(r_2 - r_3) + F(r_4 - r_5)                 [23]   (4)
C2b/1:  u = x_i + F(x_best - x_i) + F(r_1 - r_2)                 [4]    (5)
Cmax = max{cin.} (11) s-t- cik - c K k _ 1 } > tikjxikj,k = 2,...,ni,Vi,j (12) [{ctig ~ cik - thgj)xhgj] > 0 V [(ci f c - chg - tikj)xijk > 0] Vi.j, g,h (13) ^ xlkj = lVi,k,j (14) X ikj^^ik cik>0,Vi,k (15) xikj 6 {0,l},Vi,/c,y (16) Job indexes are noted as i and h i, h= 1,2.,...,« ; j is machine index, j = 1,2,...,m ; k, g are operation indexes, k, g= 1,2,..., «, (where «, is the total number of operations of job i). Processing time of k"1 operation of job i is % which leads to completion time of operation 0,k. Logic variable x-,kj stand for machine j is selected for 0 » . A± is a machine set covering operation variants. The constructive algorithm for FJSP is scheduling operation (a - for active (AS) and b - for non-delay schedules (ND) in step t = {l,n), where n is the total number of operations: 1. Creating list Vt of schedulable On operations, including all machine variants. 2. Conflict set creation a) Find possible earliest ending time ft*=min Ok in V t { c s ) and machine M* on which c* occurs b) Find possible earliest starting time st*=min Ok in Vt { s*} and machine M* on which occurs. 3. Choose optimal operation which requires M* and: a) its starting time c^.^ < cik b) its starting time st*=Sik 4. Continue until there is On unscheduled It is expected to get different quality of results of makespan (11) by different schedule generations as A S guarantees search in the neighbourhood where the optimal solution lies. It is impossible to construct another schedule by changing the processing order on the machines and having at least one job/operation finishing earlier and no job/operation finishing later. The subset of A S are N D schedules that can find the optimal solution. However, it is not guaranteed as it is constructed to no machine is kept idle while a job/an operation is waiting for processing. N D is included in the original comparison as most used in practice. Semi-active schedules are often used without recognition [29] (as well as active [22]) in scientific articles while solving FJSP by pair chromosome where the first represents job sequence, the second machine assignment. Semiactive schedules (SA) are still widely used as they allow optimisation of job assignment while having larger search space than A S or N D . In S A no job/operation can be finishing earlier without changing the order of processing on any one of the machines, 251 4 Random key-based SGA - dGA - DE Testing efficiency of Differential approaches to manipulate with genes (1) in E A is done by comparison of classical S G A and D E with hybrid d G A (Table 1) with a sequence of operators given by Figure 1 with the following setup: • Generations - up to the moment of population convergence or 200 generations without improvement of fix); the goal is to find how fast the algorithm converges. • Crossover - uniform crossover with probability pc= 0.85 that new individual will share gene of 2 n d parent. S G A uses classical crossover, D E and d G A use mutation-crossover mechanism (1-10). • Population size N= 2JM + 100, where J is the number of jobs and M is the number of machines. • Representation - Random key (RK) representation [1] is used for sequencing and new machine representation based on R K is introduced to substitute the usual Integer based representation of machine assignment so it can be optimised by D E mutant generation. 
[Figure 2 Sequencing and machine selection (assigning) by RK chromosome: the operations are ordered by sorting the real-valued sequence chromatid, and the value of the machine chromatid selects a machine from each operation's machine options via value intervals such as ⟨0-0.49⟩ and ⟨0.5-1⟩.]

• Parent selection - SGA and dGA: roulette wheel. In DE, all individuals in the population are selected.
• Elimination - SGA and dGA use the elitist strategy (survival of the fittest N individuals), while DE uses parent-offspring tournament selection.
• Schedule generation - all algorithms are tested with SA, AS and ND schedule generation. AS and ND do not handle the job assignment by the chromosome, as it is done by the earliest possibly finished operation (step 2), while SA strictly uses the chromosome information to reduce the FJSP to the JSP.
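As a complement to Figure 2, the following is a minimal, illustrative sketch of how a random-key chromosome could be decoded into an operation sequence and a machine assignment; the data layout (a list of eligible machines per operation) and the interval-based machine selection are assumptions for illustration, not the authors' exact procedure, and the example values only mimic those shown in the figure.

```python
import numpy as np

def decode_random_key(seq_keys, mach_keys, machine_options):
    """Decode RK chromatids into an operation sequence and a machine assignment.

    seq_keys        -- array (n_ops,): random keys used for sequencing
    mach_keys       -- array (n_ops,): random keys used for machine selection
    machine_options -- list of lists: eligible machines for each operation
    """
    # operations are processed in the order of their sorted sequence keys
    sequence = list(np.argsort(seq_keys))
    assignment = []
    for op, key in enumerate(mach_keys):
        options = machine_options[op]
        # split <0;1> into equal intervals, one interval per eligible machine
        idx = min(int(key * len(options)), len(options) - 1)
        assignment.append(options[idx])
    return sequence, assignment

# usage with hypothetical chromatid values and machine options
seq = np.array([0.73, 0.25, 0.38, 0.85, 0.15])
mach = np.array([0.35, 0.54, 0.81, 0.29, 0.18])
options = [["M1", "M2"], ["M2", "M4"], ["M1", "M2"], ["M3", "M5"], ["M2"]]
print(decode_random_key(seq, mach, options))
```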
Model | J×M | LB | Best | Schedule | SGA f(x)b | SGA f(x)a | DE f(x)b | DE f(x)a | dGA f(x)b | dGA f(x)a | SGA C(x)b | SGA C(x)a | DE C(x)b | DE C(x)a | dGA C(x)b | dGA C(x)a
Mk01 | 10x6 | 36 | 40 | SA | 40 | 41.8 | 40 | 41.5 | 40 | 41.6 | 80 | 56 | 907.0 | 522.3 | 1052.0 | 530.3
Mk01 | 10x6 | 36 | 40 | A | 41 | 41.9 | 41 | 41.8 | 40 | 41.5 | 176 | 111 | 622.0 | 342.7 | 521.0 | 356.3
Mk01 | 10x6 | 36 | 40 | ND | 41 | 42 | 41 | 41.8 | 41 | 41.9 | 120 | 71.8 | 409.0 | 251.8 | 422.0 | 295.8
Mk02 | 10x6 | 24 | 26 | SA | 29 | 29.7 | 36 | 37.7 | 29 | 30.5 | 163 | 131 | 919.0 | 387.2 | 1046.0 | 794.0
Mk02 | 10x6 | 24 | 26 | A | 31 | 33.3 | 33 | 35.5 | 31 | 33.7 | 402 | 275 | 787.0 | 312.8 | 985.0 | 447.7
Mk02 | 10x6 | 24 | 26 | ND | 32 | 33.3 | 32 | 33.3 | 31 | 32.6 | 310 | 201 | 566.0 | 281.3 | 597.0 | 394.4
Mk03 | 15x8 | 204 | 204 | SA | 204 | 204 | 211 | 218.5 | 204 | 204 | 89 | 80.6 | 1195.0 | 620.0 | 1577.0 | 840.3
Mk03 | 15x8 | 204 | 204 | A | 204 | 204 | 204 | 204.4 | 204 | 204 | 449 | 305 | 664.0 | 448.6 | 579.0 | 376.9
Mk03 | 15x8 | 204 | 204 | ND | 204 | 204 | 204 | 204 | 204 | 204 | 243 | 223 | 330.0 | 250.0 | 447.0 | 290.8
Mk04 | 15x8 | 48 | 60 | SA | 62 | 65.2 | 66 | 68.8 | 62 | 66.3 | 246 | 151 | 1119.0 | 545.5 | 997.0 | 697.5
Mk04 | 15x8 | 48 | 60 | A | 67 | 69.3 | 68 | 73 | 67 | 70.6 | 524 | 323 | 703.0 | 411.1 | 780.0 | 489.5
Mk04 | 15x8 | 48 | 60 | ND | 66 | 66.9 | 67 | 67.3 | 66 | 66.8 | 353 | 242 | 757.0 | 335.3 | 696.0 | 432.0
Mk05 | 15x4 | 168 | 172 | SA | 173 | 176.1 | 180 | 182.3 | 173 | 176.6 | 282 | 164 | 985.0 | 624.6 | 1236.0 | 693.4
Mk05 | 15x4 | 168 | 172 | A | 182 | 185.1 | 187 | 190.1 | 183 | 187.3 | 727 | 497 | 747.0 | 464.3 | 894.0 | 514.2
Mk05 | 15x4 | 168 | 172 | ND | 177 | 179.3 | 179 | 181.3 | 177 | 179 | 626 | 452 | 677.0 | 415.1 | 955.0 | 542.3
Mk06 | 10x15 | 33 | 57 | SA | 65 | 68.9 | 87 | 99.7 | 65 | 67.3 | 271 | 224 | 1497.0 | 819.8 | 1833.0 | 1333.6
Mk06 | 10x15 | 33 | 57 | A | 78 | 81.4 | 81 | 84.6 | 79 | 83.2 | 538 | 434 | 691.0 | 342.5 | 860.0 | 460.0
Mk06 | 10x15 | 33 | 57 | ND | 74 | 79 | 79 | 81.9 | 77 | 81.4 | 572 | 365 | 657.0 | 346.6 | 891.0 | 418.3
Mk07 | 20x5 | 133 | 139 | SA | 144 | 147 | 157 | 168.1 | 144 | 147.2 | 241 | 164 | 1423.0 | 527.0 | 1826.0 | 1132.8
Mk07 | 20x5 | 133 | 139 | A | 164 | 173.4 | 175 | 180.9 | 173 | 177.5 | 650 | 476 | 843.0 | 344.4 | 792.0 | 471.9
Mk07 | 20x5 | 133 | 139 | ND | 164 | 172.6 | 172 | 177.1 | 169 | 175.5 | 755 | 474 | 770.0 | 333.2 | 865.0 | 483.6
Mk08 | 20x10 | 523 | 523 | SA | 523 | 523 | 523 | 523 | 523 | 523 | 89 | 78.3 | 951.0 | 514.0 | 1069.0 | 643.3
Mk08 | 20x10 | 523 | 523 | A | 523 | 523 | 523 | 525.1 | 523 | 523.5 | 412 | 324 | 892.0 | 540.9 | 827.0 | 517.4
Mk08 | 20x10 | 523 | 523 | ND | 523 | 523 | 523 | 523 | 523 | 523 | 50 | 46.4 | 231.0 | 211.9 | 228.0 | 210.6
Mk09 | 20x10 | 299 | 307 | SA | 307 | 316.2 | 369 | 388 | 311 | 330.7 | 320 | 245 | 1475.0 | 692.1 | 1916.0 | 1322.2
Mk09 | 20x10 | 299 | 307 | A | 373 | 385.8 | 385 | 395 | 375 | 385.1 | 725 | 550 | 818.0 | 433.5 | 819.0 | 511.1
Mk09 | 20x10 | 299 | 307 | ND | 321 | 328.7 | 327 | 336.6 | 322 | 327.7 | 810 | 586 | 898.0 | 557.2 | 1112.0 | 629.8
Mk10 | 20x15 | 165 | 197 | SA | 219 | 226.3 | 323 | 342.8 | 278 | 293.4 | 475 | 401 | 936.0 | 697.3 | 1923.0 | 1226.9
Mk10 | 20x15 | 165 | 197 | A | 298 | 304.4 | 306 | 314.3 | 298 | 304.3 | 614 | 446 | 869.0 | 313.1 | 825.0 | 502.6
Mk10 | 20x15 | 165 | 197 | ND | 259 | 263.1 | 260 | 264.9 | 255 | 260.8 | 549 | 301 | 754.0 | 150.5 | 841.0 | 447.3

Table 1 Test results of the Mk01-Mk10 problems by the SGA, DE and dGA evolution algorithms

The average best f(x)a (marked yellow in Table 1) shows the best average of the particular best mutation (for DE and dGA), not the average over all mutations. It is not possible to show detailed results as the size of this paper is limited. The results considering the objective function of makespan (total completion time, i.e. production lead time) (11) show that the best average results f(x)a, based on 10 replications, are obtained with Semi-active schedules. The only exceptions are problems Mk09-10 (Table 1 - an expected better but in fact worse result is marked orange), where the DE and dGA non-delay schedules give better results. That means we can regularly obtain better results during optimisation by manipulating not only the sequence of operations, as in Active and Non-delay schedules, but also the job assignment. Semi-active schedules offer bigger flexibility in multicriteria optimisation, taking into account also other possible objective functions based on analysing machines (utilisation, ROI, OEE). We can see that Semi-active schedules using the dGA and DE algorithms give the best results of C(x), which represents the generation at which the population converges to one solution (so that further optimisation is not possible). On the other hand, active schedules are the ones with the slowest convergence in the case of SGA (highest C(x)). The most desired algorithm property of late convergence (high C(x)) shows the superiority of dGA, as it converges on average six times later than SGA and 1.38 times later than DE. It has to be pointed out that dGA did not converge in most cases and the experiments were ended after 200 generations without improvement of f(x). The main influence on C(x) is probably the differential approach to generating new individuals itself. Another possible influence, a synergy between the differential-evolution recombination and the elitist elimination, is unlikely; however, it would be necessary to run an experiment in which SGA uses parent-versus-child elimination to prove this hypothesis.
Comparing SGA, DE and dGA in terms of f(x)b with the theoretical lower bound (LB) and the best known solutions, it can be concluded that SGA achieves slightly, though not significantly, better results. All of the tested algorithms found the optimum in the case of Mk03 and Mk08 (f(x)a = f(x)b). One of the important goals was to find out whether there is some dominant DE mutation operator (1-10). The results have shown that none of the mutations gives significantly better f(x)a or f(x)b. However, R/1 (1) and C2b/1 (5) were the most frequent mutations finding the best and the average best results of 10 replications in both DE and dGA.

5 Conclusion
Our research found that Semi-active schedules generate the best results except for Mk09-10. This was not expected, as Active schedule generation searches in the neighbourhood which is expected to contain an optimal schedule. The proposed hybridisation of SGA by the DE mutation as a recombination operator shows that the results of f(x) are not much different. However, dGA and DE converge much more slowly than SGA, which opens possibilities in long-run optimisations. None of the DE mutations has shown dominance in both convergence and f(x); however, R/1 and C2b/1 with SA schedules were the most frequent in getting the best results. Further research will focus on improving the proposed dGA in the job-assignment part of the problem.
It has strong potential in general optimisation thanks to not significantly worst results in combinatorial optimisation and known good results of D E in processing optimisation. Acknowledgements This work was supported by the Student Grant Competition of the Technical University of Liberec under the project Optimisation of manufacturing systems, 3D technologies and automation No. SGS-2019-5011 References [1] Bean, J. C. (1994). Genetic algorithms and random keys for sequencing and optimisation. ORSA journal on computing, 6(2), 154-160. [2] Beyer, H . - G , & Schwefel, H.-P. (2002). Evolution strategies - A comprehensive introduction. Natural Computing, 7(1), 3-52. https://doi.Org/10.1023/A:1015059928466 [3] Bilal, Pant, M . , Zaheer, H , Garcia-Hernandez, L . , & Abraham, A . (2020). Differential Evolution: A review of more than two decades of research. Engineering Applications of Artificial Intelligence, 90, 103479. https://doi.Org/l 0.1016/j .engappai.2020.103479 [4] Das, S., & Suganthan, P. N . (2010). Differential evolution: A survey of the state-of-the-art. IEEE transactions on evolutionary computation, 75(1), 4-31. 253 [5] Fuško, M . , Rakyta, M . , Krajčovič, M . , Dulina, L . , Gaso, M . , & Grznar, P. (2018). Basics of Designing Maintenance Processes in Industry 4.0. MM Science Journal, 2018, 2252-2259. https://doi.Org/l 0.17973/MMSJ .2018_03_2017104 [6] Garey, M . R., Johnson, D . S., & Sethi, R. (1976). Complexity of flow shop and job shop scheduling. Mathematics of Operations Research, 1(2), 117-129. Scopus. https://doi.org/10.1287/moor.L2.117 [7] Hatamlou, A . (2013). Black hole: A new heuristic optimisation approach for data clustering. Information sciences, 222, 175-184. [8] He, S., Zhu, L . , Wang, L., Y u , L . , & Yao, C . (2019). A modified gravitational search algorithm for function optimisation. IEEE Access, 7, 5984-5993. [9] Chaudhry, I. A., & Khan, A . A . (2016). A research survey: Review of flexible job shop scheduling techniques. International Transactions in Operational Research, 23(3), 551-591. https://doi.Org/10.l 11 l/itor.12199 [10] Jain, M . , Singh, V . , & Rani, A . (2019). A novel nature-inspired algorithm for optimisation: Squirrel search algorithm. Swarm and evolutionary computation, 44, 148-175. [11] Jedrzejowicz, P. (2019). Current Trends in the Population-Based Optimisation. In N . T. Nguyen, R. Chbeir, E. Exposito, P. Aniorté, & B . Trawiňski (Ed.), Computational Collective Intelligence (s. 523-534). Springer International Publishing, https://doi.org/10.1007/978-3-030-28377-3_43 [12] Kaelo, P., & A l i , M . M . (2006). Some variants of the controlled random search algorithm for global optimisation. Journal of optimisation theory and applications, 130(2), 253-264. [13] Kliment, M . , Trebuna, P., Pekarcikova, M . , Straka, M . , Trojan, J., & Duda, R. (2020). Production Efficiency Evaluation and Products' Quality Improvement Using Simulation. International Journal of Simulation Modelling, 19(3), 470-481. https://doi.org/10.2507/IJSIMM19-3-528 [14] Koblasa, F., Králíková, R., & Votrubec, R. (2020). Influence of E A control parameters to optimisation process of FJSSP problem. International Journal of Simulation Modelling, 19(3), 387-398. https://d0i.0rg/l 0.2507/IJSIMM 19-3-519 [15] Koblasa, F., Vavroušek, M . , & Manlig, F. (2020). Selection Strategies i n Evolution Algorithms and Biased Selection with Incest Control. 38th International Conference on Mathematical Methods in Economics 2020. [16] Mallipeddi, R., Suganthan, P. N . , Pan, Q . 
- K , & Tasgetiren, M . F. (2011). Differential evolution algorithm with ensemble of parameters and mutation strategies. Applied soft computing, 11(2), 1679-1696. [17] Mirjalili, S. (2015). Moth-flame optimisation algorithm: A novel nature-inspired heuristic paradigm. Knowledge-Based Systems, 89, 228-249. https://doi.Org/10.1016/j.knosys.2015.07.006 [18] Pekarčíková, M . , Trebuňa, P., Kliment, M . , Edl, M . , & Rosocha, L . (2020). Transformation the Logistics to Digital Logistics: Theoretical approach. Acta Logistica, 7(4), 217-223. https://doi.org/10.22306/al.v7i4.174 [19] Qin, A . K , Huang, V . L . , & Suganthan, P. N . (2008). Differential evolution algorithm with strategy adaptation for global numerical optimisation. IEEE transactions on Evolutionary Computation, 13(2), 398-417. [20] Sevic, M . , & Keller, P. (2019). Design of Cnc Milling Machine as a Base of Industry 4.0 Enterprise. Mm Science Journal, 2019, 3555-3560. https://doi.org/10.17973/MMSJ.2019_12_2019042 [21] Shareef, H , Ibrahim, A . A . , & Mutlag, A . H . (2015). Lightning search algorithm. Applied Soft Computing, 36,315-333. [22] Sriboonchandr, P., Kriengkorakot, N . , & Kriengkorakot, P. (2019). Improved Differential Evolution Algorithm for Flexible Job Shop Scheduling Problems. Mathematical and Computational Applications, 24(3), 80. https://doi.org/10.3390/mca24030080 [23] Storn, R., & Price, K . V . (1997). Differential Evolution-a simple and efficient heuristic for global optimisation over continuous spaces-J. of Global Optimisation. Journal of, 11. [24] Sun, L . , Lin, L . , Wang, Y., Gen, M . , & Kawakami, H . (2015). A Bayesian Optimisation-based Evolutionary Algorithm for Flexible Job Shop Scheduling. Procedia Computer Science, 61, 521-526. https://doi.Org/10.1016/j.procs.2015.09.207 [25] Vose, M . D . (1999). The simple genetic algorithm: Foundations and theory. M I T press. [26] W u , T., Yao, M . , & Yang, J. (2016). Dolphin swarm algorithm. Frontiers of Information Technology & Electronic Engineering, i7(8), 717-729. [27] Yang, X.-S. (2010). A new metaheuristic Bat-inspired Algorithm. Studies in Computational Intelligence, 284, 65-74. Scopus, https://doi.org/10.1007/978-3-642-12538-6_6 [28] Yazdani, M . , & Jolai, F. (2016). Lion Optimization Algorithm ( L O A ) : A nature-inspired metaheuristic algorithm. Journal of Computational Design and Engineering, 3(1), 24-36. https://doi.org/10.1016/jjcde.2015.06.003 [29] Zhang, G., Hu, Y . , Sun, J., & Zhang, W . (2020). A n improved genetic algorithm for the flexible job shop scheduling problem with multiple time constraints. Swarm and Evolutionary Computation, 54, 100664. https://doi.Org/10.1016/j.swevo.2020.100664 254 Distortion risk measures in portfolio optimization Miloš Kopa1 , Juraj Zelman2 Abstract. The paper deals with mean-risk problems where the risk is modeled by a distortion measure. This measure could be seen as a generalization of Conditional Value-at-Risk or Expected shortfall. If the associated distortion function is concave the measure is coherent. We analyze several distortion measures for different choices of a concave distortion function. First, assuming a discrete distribution of returns, we identify the efficient frontier. Then we compute the portfolio maximizing reward-risk ratio. Finally, we compare the results for various distortion measures among each other. 
¹ Charles University, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovská 83, 186 75 Prague 8, Czech Republic, kopa@karlin.mff.cuni.cz
² Charles University, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovská 83, 186 75 Prague 8, Czech Republic, zelman.juraj@gmail.com

Keywords: portfolio optimization, distortion risk measure, efficient frontier, performance ratio
JEL Classification: D81, G11
AMS Classification: 91B16, 91B30

1 Introduction
Historically, distortion risk measures have their roots in the dual theory of choice under uncertainty proposed by [13] and were later developed by the axiomatic approach in [11]. The idea behind the distortion risk measure is the transformation of the given probability measure in order to quantify the tail risk more accurately and therefore give more weight to higher-risk events. The motivation for distorting a probability measure arose from numerous studies on risk perception, such as the work [4], who observed that people evaluate risk as a non-linear distorted function rather than a linear function of the probabilities.
Originally, distortion risk measures found their application in insurance problems. For example, [10] presented an approach to insurance pricing using the proportional hazards transform. However, due to the relation between insurance and investment risks, distortion risk measures started to be used also in the investment context and in portfolio selection problems (see for example [9]). Perhaps interesting could be a relation to stochastic dominance, which is an attractive tool for comparisons of random returns in various applications; see e.g. [7], [3], or [5] for recent applications of stochastic dominance in pension fund management.
The remainder of this paper is structured as follows. Section 2 presents the notation and basic properties of distortion risk measures. It is followed by a formulation of a reward-risk ratio model based on distortion measures of risk in Section 3. The empirical study is presented in Section 4 and the paper is concluded in Section 5.

2 Distortion risk measures
In the whole text, we assume that 𝒳 is a set of random variables on a probability space (Ω, ℱ, P). A random variable X ∈ 𝒳 represents a loss random variable (typically, positive values are associated with losses and negative values represent gains) of some financial asset over a time interval of length T ∈ ℝ_+.
Definition 1. ([2]) Suppose that g : [0,1] → [0,1] is a non-decreasing function such that g(0) = 0 and g(1) = 1 (also known as the distortion function) and X ∈ 𝒳 has a distribution function F_X(x). Then the distortion risk measure associated with the distortion function g is defined as
ρ_g(X) = -∫_{-∞}^{0} [1 - g(1 - F_X(x))] dx + ∫_{0}^{∞} g(1 - F_X(x)) dx,
provided that at least one of the integrals is finite. When we define the decumulative distribution function (also known as the survival function) S_X(x) = 1 - F_X(x) = P(X > x) and use it instead of the distribution function, we obtain
ρ_g(X) = -∫_{-∞}^{0} [1 - g(S_X(x))] dx + ∫_{0}^{∞} g(S_X(x)) dx.
The interpretation of this definition is that the distortion measure represents the expectation of a new random variable with re-weighted probabilities. In some cases, such as problems related to insurance or capital requirements, it is appropriate to assume that the random variable X ∈ 𝒳 is non-negative. In this case, when X ∈ 𝒳 is a non-negative random variable, ρ_g reduces to
ρ_g(X) = ∫_{0}^{∞} g(S_X(x)) dx.
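To illustrate Definition 1 for an empirical (discrete) loss distribution, the following is a minimal sketch, assuming equally likely loss scenarios and a user-supplied distortion function g; it evaluates ρ_g as a weighted sum of the ordered losses, with weights given by the increments of g applied to the survival probabilities. The function name and the sample data are illustrative assumptions.

```python
import numpy as np

def distortion_risk(losses, g):
    """Distortion risk measure of an empirical loss sample with equal probabilities.

    losses -- 1-D array of observed losses (positive = loss, negative = gain)
    g      -- distortion function g: [0,1] -> [0,1], non-decreasing, g(0)=0, g(1)=1
    """
    y = np.sort(np.asarray(losses, dtype=float))      # ordered losses y_1 <= ... <= y_m
    m = len(y)
    surv = np.arange(m, -1, -1) / m                   # survival levels m/m, (m-1)/m, ..., 0
    weights = g(surv[:-1]) - g(surv[1:])              # weight g((m-i+1)/m) - g((m-i)/m) on y_i
    return float(np.dot(weights, y))

# with g(x) = x the measure reduces to the plain expected loss;
# a concave g (e.g. sqrt) puts more weight on the largest losses
sample = np.array([-0.02, 0.01, 0.03, -0.01, 0.05])
print(distortion_risk(sample, lambda x: x))
print(distortion_risk(sample, np.sqrt))
```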
The class of distortion risk measures is appealing because distortion measures, in the general case, fulfil the conditions of monotonicity, positive homogeneity and translation invariance.
Theorem 1. ([8]) (Monotonicity) Suppose that X, Y ∈ 𝒳 and X ≤ Y. Then ρ_g(X) ≤ ρ_g(Y). (Positive homogeneity) For a distortion risk measure ρ_g, X ∈ 𝒳 and λ > 0: ρ_g(λX) = λρ_g(X). (Translation invariance) For a distortion risk measure ρ_g and X ∈ 𝒳 it holds that ∀c ∈ ℝ: ρ_g(X + c) = ρ_g(X) + c.
Theorem 2. ([12]) The distortion risk measure ρ_g(X) is sub-additive, i.e. ρ_g(X + Y) ≤ ρ_g(X) + ρ_g(Y), if the distortion function g is concave.
A well-known concave distortion function is the proportional hazard (PH) transform proposed by [10]:
g(x) = x^{1/γ},  x ∈ [0, 1],  γ ≥ 1.                                    (1)
Consequently, we define the PH-transform measure as
ρ_PH(X) = ∫_{0}^{∞} S_X(x)^{1/γ} dx,  γ ≥ 1,
where S_X(x) = 1 - F_X(x) is defined as previously. As we can see from the definition of the distortion function g of the PH transform, this function is concave and therefore the PH-transform measure satisfies the sub-additivity property. As [10] mentions, this is an important property, as it does not provide any advantage to policy-holders when splitting the risk of their positions into pieces. Other well-known examples of distortion functions which generate coherent risk measures include:
• The Wang transform ([11]): g_λ(x) = Φ(Φ^{-1}(x) + λ) for x ∈ [0, 1], λ > 0, where Φ is the standard normal distribution function.
• The MINVAR distortion function ([1]): g(x) = 1 - (1 - x)^{1+λ} for x ∈ [0, 1], λ ≥ 0.                (2)
• The MINMAXVAR distortion function ([1]): g(x) = 1 - (1 - x^{1/(1+λ)})^{1+λ} for x ∈ [0, 1], λ ≥ 0.

3 Reward-risk ratio
Suppose that we have a discrete real random variable Y, representing losses (in percent), with possible values y_1, ..., y_m ∈ ℝ, where y_1 ≤ y_2 ≤ ... ≤ y_m. As we need to separate these values into negative and non-negative ones, assume that the index k ∈ {0, ..., m} is such that the values y_1, y_2, ..., y_k are negative and y_{k+1}, ..., y_m are non-negative (where for k = 0 we understand that all values are non-negative and for k = m that all are negative). For simplicity, we assume that ∀i ∈ {1, ..., m}: P(Y = y_i) = 1/m. Then we know that its cumulative distribution function is F_Y(y) = (1/m) Σ_{i=1}^m 1{y_i ≤ y}, and applying the definition of the distortion measure to these equally likely ordered losses gives
ρ_g(Y) = Σ_{i=1}^m [g((m - i + 1)/m) - g((m - i)/m)] y_i.
Now consider N assets whose observed losses over m periods form the loss matrix l, and a portfolio with weights w = (w_1, ..., w_N)^T satisfying Σ_{i=1}^N w_i = 1 and w_i ≥ 0 (we do not allow short sales). For a given vector of weights w, we can calculate the vector of gross losses of this portfolio as l_p = (w^T l̄)^T ∈ ℝ^m, where the loss matrix l is substituted by the gross-loss matrix l̄ obtained by adding one (e.g. the value 1.1 represents a 10% loss and the value 0.9 represents a 10% return). Equivalently, the j-th position of the vector l_p is equal to the weighted sum of the assets' gross losses at time j, i.e. Σ_{i=1}^N w_i l̄_{ji}. However, as we see from the previous part, where we derived the formula for the distortion measure, to calculate the values of the risk measure for different portfolios we need to first re-order the values of l_p. Therefore, in our optimization problem we need to define a permutation matrix P = (P_ij)_{i,j=1}^m consisting of 0s and 1s such that the sum in every row and column is equal to 1. Then we can define a new vector y = (y_1, ..., y_m) ∈ ℝ^m that has the same values as l_p, but with the values ordered from the lowest to the highest. Finally, Ȳ denotes the gross return corresponding to Y (which represents the gross loss). If we define a variable R representing the reciprocal value of a distortion reward-risk ratio (minimization over the reciprocal value of a reward-risk ratio is equivalent to maximization of the reward-risk ratio), we can formulate the distortion reward-risk optimization problem as
minimize_w  R
subject to  ρ_g(Y) = μ(Ȳ) R,
            l_p = (w^T l̄)^T,
            P l_p = y,  where P = (P_ij)_{i,j=1}^m,
            Σ_{j=1}^m P_ij = 1,  ∀i ∈ {1, ..., m},
            Σ_{i=1}^m P_ij = 1,  ∀j ∈ {1, ..., m},                       (3)
            P_ij ∈ {0, 1},  ∀i, j ∈ {1, ..., m},
            y_1 ≤ y_2 ≤ ... ≤ y_m,
            w^T e = 1,
            w ≥ 0.
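For a fixed weight vector w the integer part of problem (3) disappears, because sorting can replace the permutation matrix P. The following minimal sketch evaluates a distortion reward-risk ratio for a given portfolio in that spirit; the PH-transform distortion, the gross-loss convention (gross return = 2 - gross loss) and the use of the mean gross return as the reward are assumptions used only for illustration, not the authors' exact model.

```python
import numpy as np

def reward_risk_ratio(weights, gross_losses, gamma=2.0):
    """Reward-risk ratio of a fixed portfolio under the PH-transform distortion.

    weights      -- array (N,): portfolio weights, summing to 1, no short sales
    gross_losses -- array (m, N): gross losses l_bar (1.1 = 10% loss, 0.9 = 10% return)
    gamma        -- PH-transform parameter, g(x) = x**(1/gamma)
    """
    lp = gross_losses @ weights                     # portfolio gross losses, one per scenario
    y = np.sort(lp)                                 # ordering replaces the permutation matrix P
    m = len(y)
    g = lambda x: x ** (1.0 / gamma)                # concave distortion function (1)
    surv = np.arange(m, -1, -1) / m
    risk = float(np.dot(g(surv[:-1]) - g(surv[1:]), y))   # distortion measure of ordered losses
    reward = float(np.mean(2.0 - lp))               # mean gross return under the stated convention
    return reward / risk

# illustrative data: 4 scenarios, 2 assets (hypothetical numbers)
l_bar = np.array([[1.01, 0.98],
                  [0.97, 1.02],
                  [1.03, 1.00],
                  [0.99, 0.99]])
print(reward_risk_ratio(np.array([0.5, 0.5]), l_bar))
```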
4 Empirical study
To demonstrate our model, we selected ten stocks (A1-A10) which are traded on the NYSE and Nasdaq stock exchanges, see Table 1. We restrict ourselves to a smaller sample of weekly adjusted closing prices ranging from 2020-12-21 to 2021-02-22. The smaller sample was selected due to the computational complexity of our model, which leads to a non-linear mixed-integer optimization problem.

Asset | Company | Ticker | GICS Sector
A1 | Microsoft Corp. | MSFT | Information Technology
A2 | Intel Corp. | INTC | Information Technology
A3 | Goldman Sachs Group | GS | Financials
A4 | BlackRock | BLK | Financials
A5 | Alphabet Inc. | GOOGL | Communication Services
A6 | AT&T Inc. | T | Communication Services
A7 | Amazon.com, Inc. | AMZN | Consumer Discretionary
A8 | Johnson & Johnson | JNJ | Health Care
A9 | General Electric | GE | Industrials
A10 | Exxon Mobil Corp. | XOM | Energy

Table 1: Selected assets and their corresponding GICS sectors

In our implementation, we focused on two distortion risk measures: the Proportional Hazard transform (defined in (1)) with two different parameters, γ = 2 and γ = 5, and the MINVAR distortion risk measure (defined in (2)) with two parameters, λ = 1 and λ = 4. For a better illustration of the position of the portfolio with the highest reward-risk ratio, we present it together with the resulting efficient frontiers in Figures 2a and 2b and with the allocations of the optimal portfolios in Tables 2 and 3.
As can be seen in Figure 2a, different choices of the parameter γ affect not only the position of the efficient frontiers but their shape as well. This is the result of the shapes of the Proportional Hazard functions depicted in Figure 1a. As we can see, these functions assign higher values especially to lower values of x; thus, the corresponding risk measure assigns higher probabilities to the realizations with the highest losses. This effect is noticeable especially for the portfolios beyond the highest reward-risk ratio portfolio, where risk grows significantly faster than in the previous part of the efficient frontier. Therefore, different choices of parameters allow us to model various levels of risk perception and to construct optimal portfolios with respect to these levels. Moreover, as can be seen from Table 2, the optimal portfolios with the lowest risk and with the highest reward-risk ratio differ significantly, not only with respect to their values of risk but regarding their allocations as well. Similar results are obtained for the MINVAR distortion function. In this case, different choices of the parameter λ lead not only to different values of risk but also to different allocations of the optimal portfolios. These differences can be noticed in Table 3. The effect on the shapes of the efficient frontiers and their positions is depicted in Figure 2b.
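For reference, the two families of distortion functions used in the empirical study can be written compactly as below; this is a small illustrative sketch, not the authors' code, with the parameter values γ ∈ {2, 5} and λ ∈ {1, 4} taken from the text and the function names chosen only for this example.

```python
import numpy as np

def ph_transform(x, gamma):
    """Proportional Hazard transform (1): g(x) = x**(1/gamma), gamma >= 1 (concave)."""
    return np.power(x, 1.0 / gamma)

def minvar(x, lam):
    """MINVAR distortion function (2): g(x) = 1 - (1 - x)**(1 + lam), lam >= 0 (concave)."""
    return 1.0 - np.power(1.0 - x, 1.0 + lam)

x = np.linspace(0.0, 1.0, 6)
for gamma in (2, 5):
    print("PH gamma=%d:" % gamma, np.round(ph_transform(x, gamma), 3))
for lam in (1, 4):
    print("MINVAR lam=%d:" % lam, np.round(minvar(x, lam), 3))
```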
[Figure 1: Selected distortion measures for different parameters; (a) Proportional Hazard transform, (b) MINVAR distortion function. Both panels plot g(x) over values of x in [0, 1].]

Proportional Hazard transform, γ = 2
Optimum | Return | A1 | A2 | A3 | A10 | Risk | RRR
Min Risk | 1,93% | 0,386 | 0,310 | 0,024 | 0,280 | 0,992774 | 1,026696
Max RRR | 2,68% | 0 | 0,860 | 0 | 0,140 | 0,993617 | 1,033354

Proportional Hazard transform, γ = 5
Optimum | Return | A1 | A2 | A9 | A10 | Risk | RRR
Min Risk | 1,28% | 0,537 | 0,071 | 0,294 | 0,098 | 0,99964 | 1,013188
Max RRR | 2,54% | 0,071 | 0,759 | 0 | 0,170 | 1,009303 | 1,015921

Table 2: Optimal portfolios with respect to the Proportional Hazard transform with the corresponding mean returns, risks and reward-risk ratios (RRR).

As we can see, the shapes of the MINVAR distortion functions from Figure 1b are translated into the shapes of the efficient frontiers. Therefore, in comparison to the PH measure, we also obtain different allocations of the optimal reward-risk portfolios.

MINVAR distortion function, λ = 1
Optimum | Return | A1 | A2 | A3 | A10 | Risk | RRR
Min Risk | 1,93% | 0,399 | 0,264 | 0,187 | 0,150 | 0,993088 | 1,026426
Max RRR | 2,82% | 0 | 0,401 | 0,599 | 0 | 0,994221 | 1,034163

MINVAR distortion function, λ = 4
Optimum | Return | A1 | A2 | A3 | A9 | A10 | Risk | RRR
Min Risk | 1,32% | 0,471 | 0,155 | 0 | 0,374 | 0 | 1,0021 | 1,0111
Max RRR | 1,90% | 0,421 | 0,169 | 0,211 | 0 | 0,200 | 1,0047 | 1,0142

Table 3: Optimal portfolios with respect to the MINVAR distortion function with the corresponding mean returns, risks and reward-risk ratios (RRR).

5 Conclusions
The paper presents a tractable approach to portfolio optimization using distortion risk measures, which can be seen as generalizations of Value-at-Risk and Expected Shortfall. In the empirical study, two different formulations (mean-risk and reward-risk ratio) and four different distortion measures are considered, and the corresponding efficient frontiers and reward-risk maximizing portfolios are compared. Although the paper presents only a static model, the distortion measures could be similarly applied to multistage models with exogenous [14], [15] or endogenous randomness [6].

Acknowledgements
The paper was supported by the grant No. 19-2823IX of the Czech Science Foundation.

[Figure 2: The efficient frontiers; (a) Proportional Hazard transform, (b) MINVAR distortion function. Both panels plot return against Risk. Portfolios with the highest return, the highest reward-risk ratio and the lowest risk are highlighted.]

References
[1] Cherny, A. and Madan, D. (2009). New measures for performance evaluation. The Review of Financial Studies, 22, 2571-2606.
[2] Dhaene, J., Kukush, A., Linders, D. and Tang, Q. (2012). Remarks on quantiles and distortion risk measures. European Actuarial Journal, 2, 319-328.
[3] Kabasinskas, A., Sutiene, K., Kopa, M., Luksys, K. and Bagdonas, K. (2020). Dominance-Based Decision Rules for Pension Fund Selection under Different Distributional Assumptions. Mathematics, 8, n.719.
[4] Kahneman, D. and Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47, 263-292.
[5] Kopa, M., Kabasinskas, A. and Sutiene, K. (2021). A stochastic dominance approach to pension-fund selection. IMA Journal of Management Mathematics, https://doi.org/10.1093/imaman/dpab002.
[6] Kopa, M. and Rusý, T. (2021). A decision-dependent randomness stochastic program for asset-liability management model with a pricing decision. Annals of Operations Research, 299, 241-271.
[7] Moriggia, V., Kopa, M. and Vitali, S. (2019).
Pension fund management with hedging derivatives, stochastic dominance and nodal contamination. Omega, 87, 127-141. [8] Sereda, E . N . , Bronshtein, E . M . , Rachev, S. T , Fabozzi, F. J., Sun, W. and Stoyanov, S. V. (2010). Distortion risk measures in portfolio optimization Handbook of portfolio construction, pp. 649-673. [9] Van der Hoek, J. and Sherris, M . (2001). A class of non-expected utility risk measures and implications for asset allocations. Insurance: Mathematics and Economics, 28, 69-82. [10] Wang, S. (1995). Insurance pricing and increased limits ratemaking by proportional hazards transforms. Insurance. Mathematics and Economics, 17, 43-54. [11] Wang, S. (2000). A class of distortion operators for pricing financial and insurance risks. Journal of risk and insurance, 67, 15-36. [12] Wirch, J. and Hardy, M . (1999). A synthesis of risk measures for capital adequacy Insurance: mathematics and economics, 25, 337-347. [13] Yaari, M . E . (1987). The dual theory of choice under risk. Econometrica: Journal of the Econometric Society, 55,95-115. [14] Vitali, S., Moriggia, V. and Kopa, M . (2017). Optimal pension fund composition for an Italian private pension plan sponsor. Computational Management Science, 14, 135-160. [15] Zapletal, F , Šmíd, M . and Kopa, M . (2020). Multi-stage emissions management of a steel company. Annals of Operations Research, 292, 735-751. 260 The goal programming approach to investment portfolio selection during the COVID-19 pandemic Donata Kopanska-Brodka1 , Renata Dudzinska-Baryla2 , Ewa Michalska3 Abstract. The coronavirus pandemic has an impact on almost every field of our lives, including investments. More than a year has passed since the outbreak of the pandemic. Therefore, we have enough data to analyse the effects of the coronavirus crisis. The authors of this paper present the analysis of the impact of C O V I D - 1 9 pandemic on the investment portfolio of risky assets that are traded on the various markets, such as stock market, currency market and commodity market. Such components are supposed to be statistically independent. Our results show how portfolio responds to coronavirus pandemic and how this response changes over time with regard to investors with various preferences. The preferences at the moments of distribution of the portfolio's rate of return are expressed in the scenario form. In our study we apply the polynomial goal programming model to construct the investment portfolio in each month, taking into account the period from the pandemic outbreak. Keywords: goal programming, investor's preferences, investment portfolio, meanvariance-skewness portfolio, C O V I D - 1 9 J E L Classification: G i l A M S Classification: 91G10 1 Introduction Historically, epidemics, both global and regional, are most often accompanied by studies and empirical analyses regarding their impact on the economy and human behaviour. Various restrictions, limitations and sanitary regimes introduced during a pandemic distract decision-makers and the decisions they make are often motivated by other factors (emotional factors) instead of rationality. The outbreak of a pandemic can be a shock to the global economy but its economic consequences vary widely and some industries become financial beneficiaries. The coronavirus pandemic spreading globally from 2020 is seen as a real threat to the world economy, financial markets and commodity markets. 
The impact of the C O V I D - 1 9 pandemic on economies and financial markets was already noticed in the first months of the global spread of the virus. The paper [12] shows empirical studies supporting the thesis of the immediate reaction of global financial markets to the unexpected outbreak of epidemic. Because of the pandemic, some international institutions such as the International Monetary Fund (IMF) and the Organisation for Economic Cooperation and Development (OECD) made radical reductions in global economic growth forecasts [ 12]. A research regarding 30 countries conducted by Fernandez [11] showed that in 2020 there will be a decline in G D P of 2.8% on average. The quotations of most financial market indices reached very low levels in March 2020 [3]. The link between pandemic outbreak and stock market risk was examined in the paper of Zhang, H u and Ji [24], among others. The studies on the economic impact of previous epidemics such as S A R S in 2003 in Taiwan [4] and Ebola in 2014 [7] also shown that these outbreaks had a significant impact on financial markets. At the end of the first quarter of 2020, global stock market indices such as the Dow Jones Industrial Average (USA), the SSE Composite Index (China) or the Euronex 100 (Europe) reacted with strong declines. On the U S market, the three main stock market indices DJIA, N A S D A Q and S & P 500 recorded the biggest falls, with declines of 37.1%, 30.1% and 31.9%, respectively. Such a strong financial market reaction resulted in a decline of more than 10 trillion U S D in trading value [9]. Moreover, on 23 March 2020, the global share market index (MSCI A C W I ) hit its lowest level and declined by 32% compared to 2 January 2020. Also in March 2020, the major stock exchanges around the world experienced the largest declines but already five months later, despite the pandemic, there was a rebound leading to high quotation levels. Strong falls were again seen at the turn of September and October as well as October and November 2020. 1 University of Economics in Katowice, 1 Maja 50,40-287 Katowice, Poland, donate.kopanska-brodka@ue.katowice.pl. 2 University of Economics in Katowice, 1 Maja 50,40-287 Katowice, Poland, renata.dudzinska-baryla@ue.katowice.pl. 3 University of Economics in Katowice, 1 Maja 50, 40-287 Katowice, Poland, ewa.michalska@ue.katowice.pl. 261 Since the beginning of the global spread of the coronavirus in 2020, there was no significant impact of the pandemic on efficiency on the forex market. Changes in quotations in the area of the three currencies of interest: E U R , U S D and P L N were not as abrupt as they were on the share markets and the upward trend of E U R quotes against U S D and P L N was stable throughout the period. The reaction of the cryptocurrency market at the start of the pandemic was similar to that seen on global stock markets. The price of B I T C O I N reached its lowest level since April 2019 on 12 March 2020, thus violating the myth of cryptocurrencies as a "safe haven" for times of crises and stock market crashes [2]. The rapid spread of confirmed cases of coronavirus infection significantly affected the demand and supply of commodities thus contributing to declines in commodity trading [ 19]. Since January 2020, prices of major commodities have indicated a downward trend. The demand for oil collapsed, which contributed to the fall in its price on the stock exchanges [23]. 
Following the W H O announcement of the C O V I D - 1 9 pandemic in March 2020, the commodity market also experienced short-term significant fluctuations in precious metal prices but the following months already showed an upward trend. Due to the uncertainty surrounding W H O reports on the global spread of the coronavirus, gold prices continued to rise until March 2020. A significant but short-term reduction in the gold price occurred on 16 March 2020 when the amount of $1451.5 was paid for an ounce of gold. The impact of the epidemic can be transferred to financial markets through various channels. Not only through a decrease in the number of economic actors, high level of market linkages or financial integration but also through investors' decisions on the structure of their investment portfolio. According to Ramelli and Wagner [20], the investors are paying more attention to the economic and financial impact of the C O V I D - 1 9 pandemic. The problems addressed in this article focus on the decisions of the investor with fixed preferences regarding the parameters of rate of return of an optimal portfolio whose components are assets from different independent markets. The potential components of the investment portfolio are three assets treated as a "safe haven" (gold, currency and cryptocurrency), oil which is an important raw material determining the development of the global economy and investments in shares of companies on the domestic and foreign markets (represented by stock market indices). Optimal portfolios are constructed using methods of polynomial goal programming for different periods of development of the C O V I D - 1 9 pandemic, while the reaction of the decision-maker to various events accompanying the pandemic is described by the structure of the obtained portfolios. 2 Polynomial goal programming models in investment decisions In the classical portfolio selection problem proposed by Markowitz [16], we minimize the portfolio variance for a fixed level of rate of return. Scott and Horvath [22] showed that if the assumptions of the Markowitz model are not satisfied (i.e. the distribution of random rates of return is asymmetric or the utility function is not a quadratic function), the evaluation of investments should be based on central moments of at least the third or fourth order. Portfolio selection models which take into account higher order moments are classified as multi-criteria optimisation problems. Various techniques for solving such problems are considered in the literature, including the approximation of expected utility with higher order moments proposed on the grounds of utility theory or reduction of the problem to a goal programming approach with a linear or nonlinear criterion function [1, 8, 13, 15, 21]. Lai [14] was a precursor who used polynomial goal programming in optimal portfolio selection. This approach had many imitators [5, 6] but the assumptions they make about the model could lead to solutions that are not feasible from a practical point of view. The studies that have been conducted for years on distributions of random rates of return confirm the failure of the basic assumptions of the Markowitz model concerning the normality of the distribution of random rates of return [10, 18]. Thus, there is a growing interest in models that are extensions of the classical two-criteria model of optimal portfolio selection. 
The literature of the last decade abounds in modifications of the Markowitz model involving the inclusion of higher order central moments as additional criteria as well as in methods for solving the resulting problem. A multi-criteria model of selecting a share portfolio (without short selling) taking into account the expected value EP, variance VP and the third central moment as a skewness measure SP is formulated as fol- lows: 262 max(Ep) min(yP) max(SP) (1) xt > 0,i = 1, ...,JV where the quantities x ; for i = 1,..., N denote the shares in the portfolio. The preference for the value of parameters of the rate of return portfolio distribution occurring in model (1), i.e. the expected value EP, variance VP and third central moment SP are justified by expected utility theory [17, 8]. The EP maximisation represents the preference for higher expected benefits, VP minimisation corresponds to risk aversion, SP maximisation on the other hand refers to a preference for a positive skewness of the portfolio distribution that guarantees a lower probability of very low portfolio rates of return. Model (1) is a multi-criteria non-linear problem and its solution depends on the assumptions made and the choice of method. The simplest technique for solving such a problem is reduction to the problem with a single objective function. Many solutions of this type are proposed in the literature, one of the possibilities is the use of goal programming. The formal model of selecting an optimal stock portfolio using goal programming is as follows: min(z(d)) EP + de = E0 VP-dv = V0 SP + ds = S0 (2) 2i=ix i = i xt > 0,i = 1, ...,N de, dv, ds > 0 The minimised criterion function z(d) is defined as a function of deviations that depends additionally on the investor's preferences with respect to the moments of the distribution. These preferences are expressed by the ranks (a, B, y) assigned to undesired deviations (de, dv, ds) from the aspiration levels regarding the portfolio distribution parameters. The desired levels (E0, V0,S0) can be assumed or determined as optimal solutions of other models. Among the forms of the z(d) function considered in the literature, there is the polynomial form . This can be the polynomial form of the absolute deviations z(d) = (de)a + (dvY + (ds)y (3) or polynomial form of the relative deviations z(d) = de a dv ß ds — + — + — E0 + V0 + So (4) The rank triple of (a, B, y) represents the structure of the decision-maker's preferences regarding the distribution parameters. The procedure for determining the optimal multi-criteria portfolio proposed in this work consists of two stages and the potential components of the portfolio are selected assets whose rates of return are statistically independent random variables. In the first stage the reference values of portfolio parameters (E0, V0,S0) are determined. The expected rate of return E0 is equal to the maximum value from among the expected rates of return of all assets considered in a given period, the variance V0 is equal to the variance of the global minimum risk portfolio, and the skewness 5 0 value is equal to the maximum value from among the skewness of all assets in the given period. 
In the second stage, the previously determined values (E_0, V_0, S_0) are used as reference values (aspiration levels) in the polynomial goal programming model of the form⁴
min [(d_e / E_0)^α + (d_v / V_0)^β + (d_s / S_0)^γ]
subject to  E_P + d_e = E_0,
            V_P - d_v = V_0,
            S_P + d_s = S_0,                                             (5)
            Σ_{i=1}^N x_i = 1,
            x_i ≥ 0,  i = 1, ..., N,
            d_e, d_v, d_s ≥ 0,
where the values x_i for i = 1, ..., N denote the shares of assets in the portfolio and the values d_e, d_v, d_s represent deviations from the desired values. The parameters of a multi-criteria portfolio of assets that are independent random variables are determined as follows:
• expected value: E_P = Σ_{i=1}^N x_i E(R_i),
• variance: V_P = Σ_{i=1}^N x_i² V_i,
• skewness: S_P = Σ_{i=1}^N x_i³ S_i,
where E(R_i) denotes the expected value of the i-th asset, V_i its variance, and S_i the third central moment treated as a measure of skewness. The rank triple (α, β, γ) describes the considered scenario of preferences with respect to the (E_P, V_P, S_P) parameters, where α, β, γ ∈ {1; 2; 3}. For example, the scenario (2, 1, 3) corresponds to the situation when V_P ≻ E_P ≻ S_P, which means that achieving the aspiration level for the portfolio variance is preferred to achieving the aspiration levels for the expected value and the skewness.⁵

3 Portfolio structure during the COVID-19 pandemic - empirical study
The aim of our study is to analyse the structure of optimal portfolios determined for investors with different preferences regarding expected value, variance and skewness in subsequent months of the COVID-19 pandemic. The potential components of the portfolios were selected from various independent markets: cryptocurrencies (BITCOIN quoted on BitStamp), commodities (GOLD, OIL), the US stock exchange (DJIA index), currencies (USD/EUR quoted on Forex) and the Polish stock exchange (WIG index). All assets are quoted in USD. Additionally, for the DJIA index one index point corresponds to 1 USD, and the WIG index quotations (one index point corresponds to 1 PLN) are expressed in USD using the average USD/PLN exchange rate provided by the National Bank of Poland. All quotations are from www.biznesradar.pl. In order to capture changes in the structure of the optimal portfolios over successive periods of the spread of the coronavirus pandemic, the analysis covered the period from October 2019 to March 2021, in which 16 three-month sub-periods were identified. The first portfolios (for the preference scenarios considered) were determined as at 1 January 2020 on the basis of the logarithmic daily rates of return from the last quarter of 2019; these were therefore portfolios corresponding to the pre-pandemic period. Subsequent portfolios were determined on a monthly basis (on the first day of the month) based on data from the preceding three months (58-66 observations). The last group of portfolios was calculated on 1 April 2021. In each period, the obtained optimal portfolios are a solution of the polynomial goal programming model (5). Due to the non-linearity of this model, the calculations were made in the SAS software using the NLP solver and self-prepared programs.

⁴ The polynomial model proposed by Lai [14] considers the minimisation of deviations from aspiration levels defined only for the expected value and skewness of the portfolio, while the variance of the optimal portfolio satisfies rigid constraints and takes the value of one.
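The authors solved model (5) with the SAS NLP solver; purely as an illustration of the same formulation, a sketch using scipy (an assumption, not the authors' code) could look as follows, with the deviations kept as explicit decision variables and the moments computed under the independence assumption.

```python
import numpy as np
from scipy.optimize import minimize

def pgp_portfolio(returns, ranks=(1, 2, 3)):
    """Polynomial goal programming portfolio, model (5), relative-deviation form.

    returns -- array (T, N) of asset return scenarios (assets assumed independent)
    ranks   -- (alpha, beta, gamma) preference ranks for (E_P, V_P, S_P)
    """
    a, b, g = ranks
    mean = returns.mean(axis=0)
    var = returns.var(axis=0)
    third = ((returns - mean) ** 3).mean(axis=0)
    N = returns.shape[1]

    # first-stage reference values (E0, V0, S0)
    E0 = mean.max()
    x_min = (1.0 / var) / (1.0 / var).sum()
    V0 = float((x_min ** 2 * var).sum())
    S0 = third.max()

    EP = lambda x: x @ mean
    VP = lambda x: (x ** 2) @ var
    SP = lambda x: (x ** 3) @ third

    def objective(z):
        de, dv, ds = z[N], z[N + 1], z[N + 2]
        # relative deviations; abs() only guards this toy example against a negative reference value
        return (de / abs(E0)) ** a + (dv / V0) ** b + (ds / abs(S0)) ** g

    cons = [
        {"type": "eq", "fun": lambda z: z[:N].sum() - 1.0},
        {"type": "eq", "fun": lambda z: EP(z[:N]) + z[N] - E0},
        {"type": "eq", "fun": lambda z: VP(z[:N]) - z[N + 1] - V0},
        {"type": "eq", "fun": lambda z: SP(z[:N]) + z[N + 2] - S0},
    ]
    bounds = [(0.0, 1.0)] * N + [(0.0, None)] * 3
    z0 = np.concatenate([np.full(N, 1.0 / N), np.zeros(3)])
    res = minimize(objective, z0, method="SLSQP", bounds=bounds, constraints=cons)
    return res.x[:N]

# toy example with random scenarios (hypothetical data)
rng = np.random.default_rng(1)
print(np.round(pgp_portfolio(rng.normal(0.0005, 0.01, size=(60, 4)), ranks=(2, 1, 3)), 3))
```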
The desired values of the parameters of the distribution of the portfolio's rates of return were determined (separately for each of the sixteen sub-periods) according to the previously presented method of determining the reference values (E_0, V_0, S_0). The investors' preferences with respect to the parameters of the distribution of the portfolio rate of return were modelled by means of an ordered rank triple (α, β, γ). If the most important thing for investors is the expected value of the portfolio's rate of return, then their preferences are represented by the rank triples (1, 2, 3), (1, 3, 2), (1, 2, 2). If the variance of the portfolio rates of return is the most important for investors, their preferences are represented by the rank triples (2, 1, 3), (3, 1, 2), (2, 1, 2). On the other hand, if the skewness of the distribution of the portfolio rates of return is the most important for investors, then the preferences are represented by the rank triples (2, 3, 1), (3, 2, 1), (2, 2, 1).
The obtained optimal portfolios were analysed both in terms of changes in their structure and in terms of the dynamics of changes in the shares of individual assets in subsequent months of the pandemic. Figure 1 shows the structure of the portfolios determined for the 16 sub-periods and two selected preference scenarios, (1, 2, 3) and (2, 1, 3).⁶ The dominant asset in all portfolios, regardless of preference and sub-period, is the EUR currency. The announcement of the coronavirus pandemic by the WHO on 11 March 2020 caused panic among investors and the abandonment of riskier assets. As of 25 March 2020, all the European Economic Area countries and more than 150 countries worldwide had been affected. For sub-periods 4 and 5 (covering data from January to April 2020), the optimal portfolios include only two components: EUR and GOLD, which is traditionally seen as a safe investment. The shares of EUR and GOLD in the portfolios in all 16 sub-periods, under the assumption that the expected rate of return on the portfolio is preferred over the variance and skewness (scenario (1, 2, 3)), are presented in Figure 1a. In the period from May to mid-September, the infection curve in Poland flattened and the daily number of positive tests remained at the level of several hundred. Information in the media about the end of the first wave of the pandemic and the relaxation of restrictions during the holiday period caused people to return to normality also in the area of investments: in sub-periods 8, 9 and 10 (covering data from May to September 2020) we observe a situation analogous to that before the pandemic, i.e. full diversification of the portfolios. A similar situation is observed in periods 13-15, but this time the reasons can be attributed to the start of mass vaccination and the prospect of a return to "normality".

[Figure 1 Structure of optimal portfolios (shares of BITCOIN, GOLD, OIL, DJIA, EUR and WIG in sub-periods 1-16): (a) portfolios for preference scenario (1, 2, 3), (b) portfolios for preference scenario (2, 1, 3)]

The main asset constantly present in the portfolios is EUR. During the first wave of the pandemic (Figure 2), we observe an increase in the share of the European currency in the portfolios, up to 80%.

⁶ For the other preference scenarios, the portfolio structures are similar.
During the holiday season, EUR is no longer as attractive as before and its share in the portfolios returned to pre-pandemic levels (around 50%). The next wave of the pandemic brings a renewed increase in the share of EUR in the portfolios.

[Figure 2 EUR and GOLD shares in portfolios for the preference scenario (1, 2, 3): (a) EUR, (b) GOLD; shares over sub-periods 1-16]

The investment in BITCOIN comes as a surprise: the strongly rising quotations of this cryptocurrency as of December 2020 did not translate into significant shares in the portfolio (Figure 2). Throughout the period, regardless of the pandemic situation in the country and the world, the shares of BITCOIN in the optimal investment portfolios are negligible and do not exceed 7%. The cryptocurrency, compared to the EUR currency dominating the portfolios in different sub-periods of the ongoing COVID-19 pandemic and under different scenarios of investor preferences regarding profit, risk and skewness, is not a safe investment. The quotation charts of the EUR currency and the BITCOIN cryptocurrency are presented in Figure 3.

[Figure 3 Quotations of BITCOIN and EUR in the period October 2019 - March 2021]

The shares of OIL in the portfolios do not exceed 10% throughout the period under review. Regardless of the investor preference scenario and the pandemic period, this asset is perceived as unattractive. Only during the periods of easing restrictions (periods 7-10) and of a positive investor attitude due to mass vaccinations (periods 12-16) are the shares of OIL in the portfolios non-zero. A similar situation is observed for the domestic investment (WIG), for which the highest shares, similar to those before the pandemic but not exceeding 14%, are observed during the periods of withdrawal of the restrictions imposed on society and of an upward trend in the financial markets (periods 7-10), as illustrated by the chart in Figure 4.

[Figure 4 Quotations of DJIA and WIG in the period October 2019 - March 2021]

The US market started to react to the pandemic situation in China already in the first months of 2020, which is reflected in the rapidly declining DJIA index quotations (Figure 4). In the portfolios determined for 1 March 2020, the shares of this index are zero. In the following months the situation stabilises; the shares increase and are no less than 7%, despite the successive waves of the pandemic. The events surrounding the spread of the COVID-19 coronavirus and its mutations generate fear and uncertainty, i.e. conditions in which gold always appreciates in value: gold retains the purchasing value of capital over time. Since August 2020 we have been observing a decline in gold prices, which may be explained by a familiarisation
In this paper the authors investigate how C O V I D - 1 9 affects the structure of an optimal investment portfolio renewed monthly. Portfolios with components from different markets were analysed, starting with a portfolio determined on the basis of data from the quarter preceding the C O V I D - 1 9 outbreak through the subsequent months of the pandemic. On the basis of the results obtained, the structure of the optimal portfolios was found to depend on events related to the spread of the pandemic. During the period of relaxation of restrictions, the shares of assets were similar to those before the pandemic. The scenarios considered in the proposed model regarding preferences over expected value, variance and skewness did not affect the structure of the optimal portfolios. Currency and gold have proven to be a safe haven for equity investors amid the economic and financial market turmoil associated with the C O V I D - 1 9 pandemic at various stages of its development. The allocation of B I T C O I N with zero or marginal share in optimal portfolios supports the view that this cryptocurrency is not a "safe haven" for investors during the pandemic. Lack of investments in domestic and US-listed stocks during periods of pandemic outbreak turbulence regardless of the preference scenario underline the investor distrust towards these se- curities. References [I] Arditti, F.D. & Levy, H . (1975). Portfolio Efficiency Analysis in Three Moments: The Multiperiod Case. Journal of Finance, 30(3), 797-809. [2] Baur, D . G & Hoang, L.T. (2021). A crypto safe haven against Bitcoin. Finance Research Letters, 38, 101431, https://doi.Org/10.1016/j.frl.2020.101431. [3] Cheema, M . , Faff, R. & Szulczyk, K . (2020). The influence of the C O V I D - 1 9 pandemic on safe haven assets. https://voxeu.org/article/influence-covid-19-pandemic (access 17.04.2021). [4] Chen, M.P., Lee, C.C., Lin, Y . H . & Chen, W Y . (2018). D i d the S A R S epidemic weaken the integration of Asian stock markets? Evidence from smooth time-varying cointegration analysis. Economic Research Ekonomska Istrazivanja, 31(1), 908-926, https://doi.org/10.1080/1331677X.2018.1456354. [5] Chunhachinda, P., Dandapani, K . , Hamid, S. & Prakash, A.J. (1997). Portfolio Selection and Skewness: Evidence from International Stock Markets. Journal of Banking and Finance, 21, 143-167. [6] Davies, R.J., Kat, H . M . & L u , S. (2009). Fund of Hedge Funds Portfolio Selection: A Multiple-Objective Approach. Journal of Derivatives and Hedge Funds, 15(2), 91-115. [7] Del Giudice, A . & Paltrinieri, A . (2017). The impact of the Arab Spring and the Ebola outbreak on African equity mutual fund investor decisions. Research in International Business and Finance, 41, 600-612. [8] Eichner, T. & Wagener, A . (2011). Increases in skewness and three-moment preferences. Mathematical Social Sciences, 61(2), 109-113. [9] Gao, X . , Ren, Y . & Umar, M . (2021). To what extent does C O V I D - 1 9 drive stock market volatility? A comparison between the U.S. and China. Economic Research - Ekonomska Istrazivanja, https://doi.org/10.1080/1331677X.2021.1906730. [10] Fama, E.F. (1965). The Behavior of Stock Market Prices. Journal of Business, 38(1), 34-105. [II] Fernandez, N . (2020). Economic effects of coronavirus outbreak (COVID-19) on the world economy. IESE Business School Working Paper No. WP-1240-E, http://dx.doi.org/10.2139/ssrn.3557504. [12] Khatatbeh, I.N., Bani Hani, M . & Abu-Alfoul, M.N.(2020). The Impact of C O V I D - 1 9 Pandemic on Global Stock Markets. 
International Journal of Economics and Business Administration, VIII(4), 505-514. [13] Kopariska-Brodka, D., Dudzihska-Baryla, R. & Michalska, E . (2019). The Investor's Preferences i n the Portfolio Selection Problem Based on the Goal Programming Approach. In: W . Tarczyhski & K . Nermend (Eds.) Effective Investments on Capital Markets. Series: Springer Proceedings in Business and Economics (pp. 151-163). Springer. [14] Lai, T Y . (1991). Portfolio Selection with Skewness: A Multiple-Objective Approach. Review of Quantitative Finance and Accounting, 1, 293-305. [15] Levy, H . (1969). A utility function depending on the first three moments. Journal of Finance, 24, 715-19. [16] Markowitz, H . (1952). Portfolio Selection. Journal of Finance, 7(1), 77-91. 267 [17] Menezes, C , Geiss, C. & Tressler, J. (1980). Increasing downside risk. American Economic Review, 70(5), 921-932. [18] Piasecki, K . & Tomasik, E. (2013). Rozktad stop zwrotu z instrumentöw polskiego rynku kapitaiowego. Krakow-Warszawa: Edu-Libri (in Polish). [19] Rajput, H . , Changotra, R., Rajput, P., Gautam, S., Gollakota, A . R . K . & Arora A.S. (2021). A shock like no other: Coronavirus rattles commodity markets. Environment, Development and Sustainability, 23, 6564- 6575, https://doi.org/10.1007/sl0668-020-00934-4. [20] Ramelli, S. & Wagner, A . F . (2020). Feverish Stock Price Reactions to C O V I D - 1 9 . Review of Corporate Finance Studies, 9(3), 622-655, https://dx.doi.org/10.2139/ssrn.3550274. [21] Samuelson, P.A. (1970). The Fundamental Approximation Theorem of Portfolio Analysis in Terms of Means, Variances and Higher Moments. Review of Economic Studies, 37(4), 537-542. [22] Scott, R. & Horvath, P. (1980). On the Direction of Preference for Moments of Higher Order than the Variance. Journal of Finance, 35, 915-919. [23] Sharif, A., Aloui, C. & Yarovaya, L . (2020). C O V I D - 1 9 pandemic, oil prices, stock market, geopolitical risk and policy uncertainty nexus in the U S economy: Fresh evidence from the wavelet-based approach. International Review of Financial Analysis, 70, https://doi.Org/10.1016/j.irfa.2020.101496. [24] Zhang, D., Hu, M . & Ji, Q. (2020). Financial markets under the global pandemic of C O V I D - 1 9 . Finance Research Letters, 36, https://doi.Org/10.1016/j.frl.2020.101528. 268 Robust First Order Stochastic Dominance in Portfolio Optimization Karel Kozmfk1 Abstract. We use modern approach of stochastic dominance in portfolio optimization, where we want the portfolio to dominate a benchmark. Since the distribution of returns is often just estimated from data, we look for the worst distribution that differs from empirical distribution at maximum by a predefined value. First, we define in what sense the distribution is the worst for the first order stochastic dominance. We derive a robust stochastic dominance test for the first order stochastic dominance and find the worst-case distribution as the optimal solution of a non-linear maximization problem. We apply the derived optimization programs to real life data, specifically to returns of assets captured by Dow Jones Industrial Average, and we analyze the problems in detail using optimal solutions of the optimization programs with multiple setups. Keywords: portfolio optimization, stochastic dominance, robustness J E L Classification: G i l A M S Classification: 91G10 1 Introduction The problem of portfolio optimization is a typical problem in economics and finance. 
Modern methods that deal with portfolio optimization include stochastic dominance, e.g. [7] and [4]. The concept of stochastic dominance allows us to compare two random variables, which in this case represent the return of our final portfolio and a benchmark portfolio. In this work we explore the resistance of the optimal portfolio to changes of the distribution of the returns. To get the optimal portfolio, historical observations are usually used, which represent the empirical distribution. To present the motivation: let us suppose that the investor's preferences can be represented by a utility function u (a non-decreasing function of money) and that he makes choices based on the expected utility of the investment E u(R), where the random variable R denotes the return of the investment. Then we are able to order possible portfolios based on their expected utility. But when we do not know the utility function, or we want the portfolio to be acceptable for a large group of people (for example pension funds etc.), we can use the concept of stochastic dominance. In this work, we first present the concept of stochastic dominance and define the distributionally robust version of the first order stochastic dominance. Then we present the Wasserstein distance and, based on it, we derive a reformulation of the robust stochastic dominance conditions of the first order. We also derive a program to find the worst case distribution. The derived programs are then tested on real-life data in the empirical analysis section.

2 Robust stochastic dominance

Assumptions and notation

Let us have a portfolio of N stocks with weights w = (w_1, ..., w_N), where sum_{i=1}^{N} w_i = 1 and w_i >= 0, i = 1, ..., N. The no short selling assumption is not really needed, but it was used when deriving the tested portfolios. In this work, we often talk about random returns, which are represented by a random variable, and about observed returns, which are represented by scenarios (r_is denotes the return of asset i in scenario s). Benchmark portfolio weights are denoted τ.

2.1 Theoretical background

Let us define the first order stochastic dominance (FSD) in accordance with [6].

Definition 1. First we define the set U as the set of all utility functions. Let X and Y be random variables; we say that X first order stochastically dominates Y (X >=_FSD Y) if E[u(X)] >= E[u(Y)] for all u in U.

Theorem 1. X_1 >=_FSD X_2 if and only if F_{X_1}(x) <= F_{X_2}(x) for all x in R.

Now we define the robust stochastic dominance in accordance with [1].

Definition 2. We say that a random variable X robustly dominates a random variable Y in the first order over a set of probability measures Q (X >=_FSD^Q Y) if

E_P[u(X)] >= E_P[u(Y)] for all u in U and all P in Q.    (1)

We understand X and Y as random variables denoting returns of some portfolios, which consist of stocks, and the joint distribution of the random returns of the underlying stocks is given by the distribution P, which we allow to change slightly. The set Q has not been specified yet: we want our portfolio to be prepared for slight changes in the distribution. For this purpose, we select a suitable measure of distance between two distributions and we bound the change of the distribution by a constant, which defines the set Q.

2.2 The Wasserstein distance in a discrete framework

We use the definition and computation procedures from [8]. We want to define a distance between two probability distributions on R^N (N is the number of stocks). We adjust the general Wasserstein distance definition to our case, where both distributions have the same number of atoms.

Definition 3.
Let us have two discrete distributions withfinite support. Pi attains values x i , x j with probabilities p i , p r and P2 attains values yi, ...,yr with probabilities q i , q j . The Wasserstein distance of order r (r > 1) corresponds to solving the following linear program: T T m i n Y,Y,&sdr ts & s t=i s=\ T subject to ^ %ts = pt, t = 1 , T 5=1 (2 ) T = qs, s=i, ...,T r=l ? „ > 0 , t,s=\,...,T. where dts - d(xt, ys) denotes a distance between the points xt and ys. For our purposes, we need the flexibility both in values and in probabilities. The first distribution represents our estimate of the distribution based on the observations r9, i = 1 , N ; t = 1 , T and pt - 1 /T, t - 1,.., T, and the second represents the changed distribution for robustness purposes. As for the value of r r = 1 is usually chosen. 2.3 Robustfirstorder stochastic dominance In theorem 1 we did state equivalent conditions for the first order stochastic dominance. We can extend this for the first order robust stochastic dominance X > ? c r ) Y: X >%D Y <=> sup FX,P(X) - Fr,P(x) < 0. (3) PeQ,x€R The condition is understood in the way that the distribution functions of both portfolios depend on a distribution of underlying assets which can vary and is defined by P (we stress this dependence in the subscript of F). Now we can think of the worst case distribution for a given x as the one when in which the supremum is attained or some limit probability distribution i f it is not attained. 270 Definition 4. We say that distribution P is the worst-case distribution for the FSD from Q if the supremum of the LHS in problem (3) is attained for this P for some Now let us have N stocks/assets and let the random return rates have a discrete joint distribution P with realizations Tjt, t = 1 , T ; j = 1 , N , attained with probabilities pt, t = 1 , 2 , T . Let our portfolio have weights w and our benchmark portfolio has weights r . Then we can rewrite the difference of the distribution functions in x as: T T Z P , I [ 5 £ , w,ru , has probability atoms. Using again the Wasserstein distance we try to evaluate the supremum by maximizing: t=i t=i N subject to xk - Y^w ir ik, k=l,...,T T T t=\ s=l T (4) t = 1,...,T t=\ T (=1 n, > - l , Sts > o , pt > o , i = i , N ; t , s - i , r . Now we use the big M to rewrite the indicators using binary variables. We deal with the problem max I r^jv w . r . f < X k ^ , we reformulate it using u,k representing the indicator: max u,k N subject to Z w ir it ^ x k + (1 - utk)M, t,k = \,...,T (5) i=i u,k £ {o,i}, f,fe = i , . . . , r , where M is sufficiently large constant. For the case of the second indicator, there is minus in front of it, which makes it much more difficult to handle. We need to use the inverse inequality, which in this case is sharp inequality. We get a reformulation for max - I ^ ^ T>.r>.( x k ~ vtkM, t,k = I , T (6) (=i vtk€{Q,l}, t,k = \,...,T. The problem is that for software implementation, sharp inequality cannot be used, so we approximate it by adding a very small term, for example 10~5 (we represent it as pt > m, where m (margin) is a small positive constant). So we get a reformulated constraint ZZ{L\T ir it > m+Xk - vtkM, t, k = 1 , T . We use Euclidean norm squared and objective function and the first constraint are merged for implementation purposes. B y using a sharp inequality, then maximum might not be attained, so we approximate it. m basically sets the minimal recognizable difference between the returns of our and the benchmark portfolios. 
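To make the transport formulation in Definition 3 concrete, the following minimal sketch solves the linear program (2) for two small discrete distributions; it assumes Python with NumPy and SciPy, the toy atoms and probabilities are made up, and the order is r = 1 (for r > 1 one would use d^r as the cost and take the r-th root of the optimal value).

import numpy as np
from scipy.optimize import linprog

# Two toy discrete distributions on R^2 with T = 3 atoms each.
x = np.array([[0.00, 0.01], [0.02, -0.01], [-0.03, 0.04]])   # atoms of P1
y = np.array([[0.01, 0.01], [0.00,  0.00], [-0.02, 0.03]])   # atoms of P2
p = np.array([1/3, 1/3, 1/3])                                 # probabilities of P1
q = np.array([0.5, 0.25, 0.25])                               # probabilities of P2
T = len(p)

# Cost matrix d_ts = d(x_t, y_s); here the Euclidean distance.
d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

# Decision variables xi_ts flattened row-wise; constraints: row sums equal p_t,
# column sums equal q_s, xi_ts >= 0.
A_eq = np.zeros((2 * T, T * T))
for t in range(T):
    A_eq[t, t * T:(t + 1) * T] = 1.0          # sum_s xi_ts = p_t
for s in range(T):
    A_eq[T + s, s::T] = 1.0                   # sum_t xi_ts = q_s
b_eq = np.concatenate([p, q])

res = linprog(c=d.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("Wasserstein distance of order 1:", res.fun)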
B y setting m to zero, we would consider the same returns (imagine xt - ys for some s, t) actually being higher for the benchmark 271 (even though they are the same), this is the limit version for m —» 0+. We can do this and bear in mind that the returns of benchmark that are the same as some returns of our portfolio actually mean that they are infinitesimally higher for the actual worst case distribution. k = \,...,T t = \,...,T s=l,...,T (7) t,k= l,...,T t,k= l,...,T t,k= l,...,T i= l,...,N;t,s,k = l,...,T We can understand the optimal values of ru and pt defining a distribution as the worst case distribution in the sense of definition 4. We formulate the derived results in the following theorem. Theorem 2. Let X and Y be random variables denoting returns of a portfolio defined by weights w and r . Let us have observed historical returns r9 i = 1, ...,N;t = 1, ...,T. Let us have Q a set of probability measures defined on R w with T atoms: determined by returns rn,i — 1 , N ; t = 1 , T and probabilities pt, t — 1 , T defined as a neighborhood of the empirical distribution. Let the neighborhood be defined with the use of the Wasserstein distance and let the distance on M.N be defined as Euclidean norm squared, i.e. let x, y e JLN , then d(x,y) = (x{ - yi)2 . Then X dominates robustly Y in the first order over the set of probability measures Q (X >pSD Y) if and only if there exists a right open neighborhood of 0 such that for each value m from this neighborhood the optimal value of the problem (7) is less than or equal to zero. We can also use the program with m — 0 and then optimal value being less than or equal to zero is a sufficient condition for the robust stochastic dominance. B y setting m — 0 we enlarge the set of feasible distributions so if robust F S D holds for even larger set, then it certainly holds for the smaller set. Also i f we find m for which there is a feasible solution and the objective value is positive, then the robust stochastic dominance cannot hold. We can generalize the test for probabilities different from i/r. Let us have general probabilities p®, t = 1, ...,T satisfying YJJ=I Pt = 1> then i f we look at the program, the only place we use the observed distribution is in the Wasserstein distance, so we can easily generalize the program by replacing the constraint £ J=i fts = f, t = 1 , T by the constraint 2 j = i fts = t = 1 , T . 3 Empirical analysis 3.1 Data We used stock prices of assets covered by the Dow Jones Industrial Average (DJIA) index, which consists of 30 largest and most traded American companies. The dataset used consists of observations from April 2008 to July T I T \ max V V pt {u,k - v,k) • bk €U,ru,Pt,utk,vtk,bk J N subject to xk = ^ Wirik, T T N t=\ s=l (=1 = J, s=l T J]&s = Ps, t=\ N ^ WirU < xk + (1 - Utk)M, =1 N Y^ru >m+xk- v,kM, Utk € {0,1}, Vtk € {0,1} rit > - 1 , tjts > 0, bk > 0, pt > 0, T k = \ 272 2018. Quarterly returns were used and because the complexity of the problem was very high, we used only first 5 observations for the final experiments. In the dataset, Apple has the highest mean, followed by V (Visa) and U N H (United Health). The lowest mean return had X O M (Exxon Mobil). For the optimization, G A M S software ([2]) was used. 3.2 Worst case distribution In this section, we analyzed the worst case distribution using (7) for the 2 following portfolios. The first maximizes mean return and stochastically dominates the benchmark portfolio in the first order. 
How to achieve the portfolio can be found in [5], the portfolio contained V (0.354), M C D (0.391), A A P L (0.255), where the numbers in brackets represent weights. The second one is a benchmark portfolio with weights 1 /N strategy - in our notation T i = 1/30,1= 1, ...,30. We test whether this portfolio also robustly dominates the benchmark and observe the worst case distribution. We analyzed the development of the worst case distribution in dependence on the value of e, which defines the radius of the neighborhood of the empirical distribution. Investor with such portfolio wants to know which is the worst possible distribution for him. The robust F S D test given by (7) faces great challenges given that the problem is not only non-linear and nonconvex, but also mixed integer. We used B O N M I N solver for mixed integer non-linear programming. Using more than 5 scenarios caused computational problems, solver converging to an infeasible solution. We chose the values of e as 0 for the first case and then starting with 0.001, we doubled the value of epsilon, ending with 0.128, when the when the difference of the distribution functions was maximal (value 1). The computation took around 2 minutes for all the considered values of e (PC with 16 G B R A M and Intel core i5 6500 Skylake). The constraint for the distance of distributions was fulfilled as equality, the value was slightly lower on the left side of the equality for the last value of e (0.127 < 0.128). This does makes sense, there was no option for the objective to further improve as it was already 1. The probabilities were changed in the worst case scenario for the lower values of e, because the change of probability could make the difference between cumulative distribution functions larger. A s the program was able to make the difference close to 1, there was no need to adjust the probabilities as all the values of benchmark had to be larger than the values of our portfolio. We can see multiple cumulative distribution functions plotted in fig. 1. For e — 0 it is the problem under the empirical distribution, then with the increasing e we can see the distribution function plotted under the worst case distribution. We can see the rise of the C D F for our portfolio in one part of the graphs, because the F S D condition is strictly the difference between CDFs. The fact that the lowest return is actually where the difference is the highest is connected to the fact that the portfolio was created such that the conditions are fulfilled as an equality on the left tail (lower returns) so it is easier (in terms of distance of distributions) to violate those conditions. Even though it is not visible from the graph, the F S D condition is violated for e - 0.001, the difference in the returns is so small it cannot be captured by the graph, the critical points are return -0.13538 for our portfolio and -0.13536 for the benchmark. This is the m (margin) we used for the computation, this difference can be made arbitrarily small and the F S D conditions would be violated. The situation is the same for other depicted e, for example for e - 0.064 the returns are -0.10312 for our portfolio and -0.10311 for the benchmark. 4 Conclusion In this work we introduced the concept of stochastic, robust optimization and the Wasserstein distance. We defined the worst case distribution for the first order stochastic dominance. 
Based on the equivalent conditions, we derived test of robust stochastic dominance of the first order and program to find the worst case distribution. In the empirical part, we tested all the derived program on real life data, specifically on returns of assets captured by Dow Jones Industrial Average. We analyzed the development of the worst case distribution with increasing value of the radius of the neighborhood around the empirical distribution. The results for all the tests were confronted with intuition and expectations. The main features were captured graphically and by running the programs with multiple set ups, we were able to understand the behavior in detail. A l l of the tests showed some level of numerical instability because the programs are non-linear, non-convex and mixed integer. Even though the programs faced numerical challenges, we were able to get results, which made good sense, followed the definitions and used the strictest parts of the dominance conditions. 273 Figure 1 C D F s for empirical distribution and for the worst case distribution for different values of e CDF for the empirical distribution,e=0 our portfolio benchmark ~ I -0.15 ~ I -0.05 CDF for worst case distribution,£=0.001 CDF for worst case distribution,£=0.004 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 Acknowledgements This work was partially supported by Czech Science Foundation (grant 19-2823IX) and S V V project of Charles University n. 260580. References [1] Dentcheva, D., & Ruszczyhski, A . (2010). Robust stochastic dominance and its application to risk-averse optimization. Mathematical Programming, 123(1), 85-100. [2] G A M S Development Corporation. General Algebraic Modeling System ( G A M S ) Release 24.7.4. Washington, D C , U S A , "http://www.gams.com". [3] Hanoch, G . , & Levy, H . (1969). The efficiency analysis of choices involving risk. The Review of Economic Studies, 36(3), 335-346. [4] Kopa, M . , Kabasinskas, A., & Sutiene, K . (2021). A stochastic dominance approach to pension-fund selection. IMA Journal of Management Mathematics. [5] Kuosmanen, T. (2004). Efficient diversification according to stochastic dominance criteria. Management Science, 50(10), 1390-1406. [6] Levy, H . (2015). Stochastic dominance: Investment decision making under uncertainty. Springer. [7] Moriggia, V., Kopa, M . , & Vitali, S. (2019). Pension fund management with hedging derivatives, stochastic dominance and nodal contamination. Omega, 87, 127-141. [8] Pflug, G . C , & Pichler, A . (2014). Multistage stochastic optimization. Switzerland: Springer International Publishing. 274 System Dynamic Model of Beehive Trophic Activity Kratochvílová Hana1 , Rydval Jan2 , Bartoška Jan3 , Chamrada Daniel4 Abstract. The article proposes the system dynamic model of beehive trophic activity, including the influence of weather and daily rhythm. The proposed model is based on a stock and flow diagram (SFD). B y designing a S F D model, the authors define the dynamic behavior of key aspects of bee swarm prosperity, distinguishing between the internal and external activity of bees during continuous consumption of stocks in the beehive. The article will also outline the procedure on how to estimate the vitality of bee swarm i n the long term, because due to frequent bee mortality, bee vitality has become one of the key topics in the field of nature conservation and agricultural production. The results presented i n the paper are intended to help verify the hive weights and data collected from field research. 
The authors' suggestions are based on long-term field research using hive weights as well as individual beekeeping observations. The proposals are based on causal relationships between beehive weight, humidity and temperature i n the beehive, outdoor temperature, and et al. Keywords: Bee Breeding; Honey and Nectar Stocks; Landscape Fertility; Mathematical model; Temperature, Weight and Humidity of Beehive; Stock and FlowDiagram; System Dynamic Model. J E L Classification: C44 A M S Classification: 90C15 1 Introduction International academic and professional literature does not provide very extensive information on the issue of beehive trophic activity. Only fragments that deal with terms as fertility or trophies can be traced, and only a small number of authors focuses on sophisticated mathematical models, so any relevant informations are very rare. Nevertheless, the vitality of bee colonies can be described as an important determinant in the area of nature protection and agricultural production at the global level, which are topics that are often mentioned in many countries around the world. The term "landscape fertility" itself has been present i n the literature i n the observed context since the beginning of the 21st century. It usually deals with the explanation of this concept [8] i n the biological and environmental context and also [13] in connection with fertilization, biochemistry, urban management and the functions of the natural environment. [13] builds presented ideas on several sources of earlier literature, i n which the effects of soil chemical composition on landscape fertility are most often discussed. [1] then focuses directly on the behavior of bee colonies and i n the proposed mathematical model compares the usability of the landscape fertility i n relation to the behavior and performance of bees. His research work examines and evaluates the effect of pollen on the of the honey bee colonies dynamics in the context of the possible impact of adverse events (e.g. pesticides, parasites, nutritional stress). The use of system dynamics models i n living nature is rather a new scientific discipline, that develops especially due to computer technology since 1990s. [10] presents a model that focuses on the analysis of the growing mortality of bee colonies i n the observed colonies. In his study, he creates a dynamic model to identify key factors that have a major impact on the growth and survival of hives. Its analysis is based on a three-year follow-up, on the basis of which a simulation of possible population growth or decline is created. The author concludes that the fluctuations were caused by the high sensitivity of the bee colony to the composition of food sources and sharp changes i n temperature during the changing seasons. [4] addressed the issue of modeling the dynamics of biological systems in his article. Its outputs can be used as a basis for the general application of system dynamics models i n this area. The author presents the possibility of using selected types of organic, genetic, population or biochemical models. To display the dynamic complexity of system and understand animal behavior, Krejci et al. [7] uses a system dynamics model, especially a stock and flow diagram to capture a livestock system behavior. 1 Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, ha.kratochvilova@gmail.com. 2 Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, tydval@pef.czu.cz. 
' Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, bartoska@pef.czu.cz. 4 Czech University of Life Sciences Prague, Department of Systems Engineering, Kamýcká 129, Prague, chamrada@pef.czu.cz. 275 The aim of this paper is to create and present a model of trophic activity of bees, including the influence of weather and circadian rhythms on bee activity. This model will provide an idea of the short-term trophic activity of bees, including honey production, including the effect of outdoor temperature and the length of sunlight. The model will be designed by using the basic elements of system dynamics and it will be verified using the date obtained from beehive weights and data collected from field research. 2 Material and methods 2.1 System Dynamics and Stock and Flow Diagram System Dynamics is a discipline that uses modeling and computer simulation to analyze, understand, and improve complex dynamic systems [5], [11] and [14]. The main idea of system dynamics is, that the system behavior is determined mainly by its own structure, structure elements and by the interconnections between them [9], [11], System dynamics methodology is based on the feedback concepts of control theory [5], the principles of cognitive limitations, mental modelling [6] and bounded rationality [12]. System dynamics is an appropriate technique to handle complex systems, to understand them and to improve system thinking and system learning. To describe and define a system using system dynamics, a Causal Loop Diagram (CLD) is used firstly, and subsequently, a Stock and Flow Diagram (SFD) is created to enable mathematical modelling of the system. The basic building blocks used in the S F D with icons and their interpretation are shown i n Table 1. Symbol Mathematics Interpretation Stock X Flow o Variable dY/dX > 0 In the case of accumulations, Y=JC t(X + -)dt + Yto dY/dX < 0 In the case of accumulations, Y=JC t(-X + -)dt + Yto Stocket) = f Onflow - Outflow)dt "'to + 5tocfc(t0) All else equal, if X increases (decreases), thenY increases (decreases) above (below) what it would have been. In the case of accumulations, X adds to Y. All else equal, ifXincreases (decreases), thenY decreases (increases) below (above) what it would have been. In the case of accumulations, X subtracts from Y. Delay mark Stock variable Flow variavle Stocks outside model boundary Variable Causal loop Reinforcing (+) Causal loop Balancing (-) Table 1 Symbols of the stock and flow diagram (based on [14]) The Stock and Flow diagram represents the structure of the system i n terms of stock and flow, and usually follows the Causal Loop Diagram. The stock represents the state (or condition) of thy system and the flow is changed by decisions based on the condition of the presented system [2]. The S F D is the structure of the system and can be simulated to generate the dynamic behavior of the presented system [14], it is represented by integral finite difference equations involving the variables of the presented feedback loop structure of the presented system and simulates its dynamic behavior. When a model is developed and represented by the SFD, model validation is conducted to develop confidence in the model. The validity and usefulness of a dynamic model should be assessed. Testing means comparing the model with empirical reality for accepting or rejecting the model, and validation means to create confidence in the usefulness of the created model. 
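Before turning to the validation tests, the stock equation from Table 1 can be made concrete. The following minimal sketch (in Python; the inflow and outflow rates are hypothetical and it is not the authors' Vensim model) integrates a single stock with a simple Euler scheme, Stock(t) = Stock(t0) + integral of (inflow - outflow) dt.

# Euler integration of one stock with a constant time step, mirroring
# Stock(t) = Stock(t0) + integral(inflow - outflow) dt from Table 1.
dt = 1.0                      # time step in hours
stock = 20.0                  # initial stock of honey and sugar solution (kg), hypothetical
history = [stock]
for hour in range(168):       # one simulated week
    inflow = 0.05 if 8 <= hour % 24 <= 18 else 0.0   # nectar brought in during daylight (kg/h)
    outflow = 0.01                                    # continuous consumption by the colony (kg/h)
    stock += (inflow - outflow) * dt
    history.append(stock)
print(f"stock after one week: {history[-1]:.2f} kg")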
Sterman [14] lists many different tests for testing the created model. For example, the Test of Model Structure and Dimensional Consistency are the most basic tests. Dimensional Consistency; it should be always specified the units of measure for each variable, of course with real world meaning. The Behavior Reproduction Test should assess a model's ability to reproduce the dynamic behavior of a real system. The Mean Absolute Percent Error ( M A P E ) is one of the commonly used tools for measuring the average error between the simulated and actual real values. 276 1 \ 1 \ A m — A x MAPE = - > — — ; (multiply by 100 for %) (1) where: ^represents the data series, Xm represents the model outputs and n is the length of the data series. 2.2 Beehive weighing-machines on the Včelstva Online web portal Beehive weighing-machine is an autonomic electronic mechanism under the beehive. It reads weight, inside temperature, outside temperature, and humidity. Beekeepers use a beehive weighing-machine for online observation of bee colony condition at the station of bee colony. The authors of this paper are involved in a research project aimed at the operation and development of online beekeeping web portal Včelstva online (https://vcelstva.czu.cz/) using bee scales for beekeepers from the public. (No. 2019B0001, Internal Grant Agency, F E M C U L S Prague). This paper follows this previous research [3], The Včelstva Online web portal provides basic user functions for beekeepers: hive diary, records of locations with bee habitats, treatment records, etc. It also provides functions for beekeeping associations: evidence of beekeepers, treatments reports, etc. For citizen science offers functions [3]: collecting phenological records, collecting data from beehives (weight, inside temperature, outside temperature, humidity). Records from the beehive weighingmachine and phenological records from beekeepers are collected on the web portal - paired data can be used for monitoring the landscape fertility. 3 Results and Discussion 3.1 System Dynamic Model of Beehive Trophic Activity The S F D of the trophic activity of bees was created based on C L D , results of our previous research [3] and continuous results from the Včelstva Online web portal (the research project No. 2019B0001, Internal Grant Agency, F E M C U L S Prague). This model represents the bees' behavior during a nectar collection season i n the spring and it simulates the bees' behavior for one week. The key point of this simplified model of the beehive trophic activity (Figure 1) is the weight of the beehive, which consists of a stock of honey and sugar solution, the weight of bees, and the weight of beehive construction. The weight of beehive is related to another monitored variables - inside temperature of beehive, outside temperature outside of the beehive and beehive's humidity. These variables are collected by beehive weighing-machines. These variables are closely connected and indirectly describe the trophic activity of bees. (Figure 1). The S F D of the trophic activity of bees represents simplification and abridgment of biologic process, which takes place in a beehive with distinction of day and night time in the time during the main beekeeping season. Very important i n the model presented by the S F D are two main feedback loops. The reinforcing feedback loop (in the model displayed by R) describes nectar collection and honey production. 
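The MAPE criterion in formula (1) is straightforward to reproduce. Below is a small sketch assuming Python with NumPy; the observed and simulated weight series are hypothetical stand-ins for the hive-scale data and the model output.

import numpy as np

def mape(actual, simulated):
    """Mean absolute percentage error, formula (1): mean(|A_t - Xm_t| / A_t) * 100."""
    actual = np.asarray(actual, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return np.mean(np.abs((actual - simulated) / actual)) * 100.0

observed_weight = [41.2, 41.5, 41.9, 42.4, 42.1, 42.6, 43.0]   # hypothetical hive weight (kg)
modelled_weight = [41.0, 41.6, 42.1, 42.2, 42.3, 42.4, 43.3]   # hypothetical simulation output (kg)
print(f"MAPE = {mape(observed_weight, modelled_weight):.2f} %")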
With the suitable outside temperatures, healthy bees are physically active in the landscape, collect nectar and process it into honey. The bees collect nectar i n the area until the stock of nectar i n the landscape is used up, which describes the balancing loop (B). The health of the bees is affected not only by the stock of honey, but also by humidity inside the beehive and by a current temperature. The bees tend to balance a difference between outside and inside temperature in order to preserve a usual (optimal) inside temperature. In summer they cool themselves by increasing the humidity inside the hive and when it is cold outside, they are moving (physically active inside the beehive) to raise the inside temperature. Increased physical activity causes excessive honey consumption, which can decrease the vitality of bees. When their vitality decreases, their death rate increases. To help them survive winter, the beekeeper can feed them with sugar solution, which supplements the honey. The honey production is affected by the landscape fertility, which is determined by the species diversity of plants, vegetation and crops in the landscape, stock of nectar and stock of pollen. The production is also influenced by the physical activity of bees i n the landscape, which is caused by warm outside temperature and light through a day. Cold outside temperature causes low physical activity of bees in the landscape and thus low nectar collection and honey production. In this simplified model representing the behavior of bees, a Test of Model Structure was performed by comparing the structure of the model with knowledge of the real system, simplified for the situation represented by the SFD. The Dimensional Consistency was also conducted. After inserting the real data of outside temperature into the model, the development of the weight values was determined and i n comparison with the real data, M A P E = 5.49 277 % was calculated according to formula 1. The highest deviation from the real data was 13.80 %. The whole model including simulation runs was developed in Vensim software (industrial strength simulation software for improving the performance of real systems). Figure 1 Stock and Flow Diagram - Beehive Trophic Activity 3.2 Trophic Activity of Bees For beekeepers is very important to monitor the trophic activity of bees (i.e. activities aimed at procuring food supplies for bees) continuously for procuring prospering bee breeding. The trophic activity of bees during the season consists of gathering food supplies from the beehive vicinity (radius 5 km) i n the form of nectar and pollen. Bees make the gathering of food supplies every day i n relation to outside temperature, when they leave the beehive (the weight of beehive decreasing) and with the time lag bees return (the weight of beehive increasing). Mainly, the observation of the outside temperature and inside temperature of the beehive is crucial because the fluctuation of temperature could endangered bees. The too low or too hight outside temperature could cause impossible or decreased gathering of food supplies, too low or to hight inside temperature could be endangered beehive (potential death of bees i n the consequence of freeze or overheating). The trophic activity of bees, especially the physical activity of bees i n the landscape, is influenced by outdoor weather, especially outside temperature, and the length of sunlight. The physical activity of bees in the landscape is directly affected by the outside temperature. 
If the outside temperature is low, the physical activity of bees also decreases, and if the outside temperature is too low (less than 12 °C), the bees do not fly outside at all. The real data were used to verify the functionality of the model by conducting one simulation run (the data describe weekly observations, i.e. 168 hours; the one-week period is the usual period of monitoring beehives on the Včelstva Online web portal; the weekly collection of the observations took place at selected bee habitats during the beginning of the beekeeping season). After entering the real outside-temperature data into the model, the model simulates the physical activity of bees in the landscape (Figure 2). Based on the model outputs, it is evident that at lower temperatures there is lower activity of bees. To simplify, only outdoor temperature was included as a weather factor in this model. But of course, other weather factors such as rainfall, sunshine intensity and others also influence the activity of the bee colony. For simplicity's sake, the model does not operate with the inside humidity of the hive as a stock variable. This simplification is expected to be removed in the further development of the model.

Figure 2 Physical Activity of Bees (outside temperature and the physical activity of bees in the landscape plotted over time in hours)

Of course, the collection of pollen and nectar needed for honey production also depends on the physical activity of the bees in the landscape. If the environment outside is cold, then with lower physical activity of the bees there is a reduced collection of nectar and thus a lower production of honey. Even this situation can be simulated by the presented model, and it is clear from the graph (Figure 3) that in cold weather the production of honey is very low, while at higher outside temperatures the production of honey is higher.

Figure 3 Honey Production of Bees (outside temperature and the stock of honey and sugar solution plotted over time in hours)

The outside temperature does not only affect the physical activity of the bees in the landscape; it also affects the activity of bees inside the beehive. If the outside temperature is too low, the activity of the bees inside the beehive increases, and thus the inside temperature also increases to be as close as possible to their conformal zone around 24 °C. In fact, no matter how the temperature fluctuates outside, the average internal temperature of the beehive should stay around this conformal temperature.

Figure 4 Outside and Inside Temperature of Beehive (outside temperature, average outside temperature, inside temperature of the bee cluster and average inside temperature plotted over time in hours)

Measuring the weight and temperature of beehives has been a common research and beekeeping practice for many decades. The contribution of the presented research consists of data collected online via the web portal Včelstva Online and of the proposed SFD for the trophic activity of bees. The presented real data, which were used for model verification, could be interpreted in a much wider context. Their course gives evidence of the function and usability of the model for potential prediction of the trophic activity of bees at the real beehive station.
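The temperature dependence just described can be written as a simple activity rule. The sketch below is only an illustration consistent with the text (no flight below roughly 12 °C, activity saturating towards the conformal zone around 24 °C); the linear form and the daylight flag are assumptions, not part of the authors' model.

def flight_activity(outside_temp_c, daylight):
    """Relative physical activity of bees in the landscape (0 = none, 1 = full)."""
    if not daylight or outside_temp_c < 12.0:    # no foraging at night or below ~12 degC
        return 0.0
    # activity rises with temperature and saturates around the conformal zone (~24 degC)
    return min(1.0, (outside_temp_c - 12.0) / (24.0 - 12.0))

for temp in (8, 14, 20, 26):
    print(temp, "degC ->", round(flight_activity(temp, daylight=True), 2))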
4 Conclusion The paper deals with creation and application of System Dynamic Model in the form of Stock and Flow Diagram. The proposal follows previous works of authors, especially the model i n the form of Causal Loop Diagram [3], The S F D of the trophic activity of bees is designed and verified on the basis of real data and knowledge obtained from long-term research of authors of Citizen Science type (beekeepers from the public could use beehive weighing-machines and web portal Včelstva Online). The results provide a partial point of view that applies to the field of breeding and protection of bees i n the case of prediction of elementary parameters of beehive of the beehive station. The presented SFD is possible to use as a basic concept for further development of web portal Včelstva Online (e.g. for prediction of beehive condition for beekeepers regards to expected weight) or for further research i n the field of fertility landscape for bees - for different types of landscape can be expected different trophic activity of bees. The result is necessary to put to professional discussion and further verification against real data. Plans for further extensions and development of the model are mainly to include other weather conditions influencing bees' activity, such as rainfall, sunshine intensity, possibly windiness, and scaling of colonies by population size, among others. Furthermore, modelling the effect of nectar and pollen supply in the surrounding area could prove helpful. 5 Acknowledgements This research is supported by the grant No. 2019B0001 "Monitoring a modelování trofické aktivity včeľ of the Internal Grant Agency of the Faculty of Economics and Management, Czech University of Life Sciences Prague. References [I] Bagheri, S. (2019). A mathematical model of honey bee colony dynamics to predict the effect of pollen on colony failure. PLoS ONE 14(11): e0225632. [2] Bala, B . K . , Arshad, F . M . & N o h K . M . (2017). System Dynamics: Modelling and Simulation. Springer Texts in Business and Economics, I S B N 978-981-10-2043-8. [3] Bartoška, J., Šubrt, T., Rydval, J., Kazda, J., Stejskalová, M . (2020). SystemDynamic Conceptual Model for Landscape Fertility of Bees. In: Proceedings of the 38th International Conference On Mathematical Methods in Economics. Brno: Mendel University. I S B N 978-80-7509-734-7. [4] Bruce (2014). Modeling Dynamic Biological Systems. Springer International Publishing, 2nd Edition. I S B N : 978-3-319-05614-2. [5] Forrester, J.W. (1968). Principles of system dynamics. M I T Press, Cambridge, M A . [6] K i m , D . H . & Senge, P . M . (1994). Putting systems thinking into practice, Syst. Dyn. Rev., 10 (2-3), 277-290. [7] Krejci, I., Moulis, P., Pitrova, J., Ticha, L , Pilar, L . and Rydval, J. (2019). Traps and Opportunities of Czech Small-Scale Beef Cattle Farming. Sustainability. 11 (15), ISSN 2071-1050, D O I : 10.3390/sul 1154245. [8] Lechmere-Oertel, R. G. (2005). Landscape dysfunction and reduced spatial heterogeneity i n soil resources and fertility i n semi-arid succulent thicket, Austral Ecology. 30 (6) 615-624. [9] Meadows, D . H . (2008). Thinking in Systems. A Primer. Chelsea Green Publishing Company. [10] Russel (2013). Dynamic modelling of honey bee (Apis mellifera) colony growthand failure. Ecological Modelling 265 (2013) 158-169. [II] Senge, P . M . (2006). Fifth Discipline: The Art and Practice of the Learning Organization. Random House Business Books. [12] Simon, H . A . (1979). Rational decisionmaking in business organizations, Am. 
Econ. Rev., 69(4), 493-513. [13] Smith, K. T. (2010). Biogeochemistry and landscape fertility. The Landsculptor, February: 43-45. [14] Sterman, J.D. (2000). Business Dynamics: System Thinking and Modelling for a Complex World. Irwin/McGraw-Hill, Boston.

An Analysis of Dependence between German and V4 Countries Stock Market

Radmila Krkoskova1

Abstract. The topic of relations between individual markets has been frequently discussed recently. Especially on the stock markets, we can observe a tendency of the more developed markets to affect developments on the less developed ones. This is also valid for the V4 stock markets, where a strong influence of the German stock market can be anticipated. Granger causality analysis was used. Quarterly data for the period from 2005/Q1 to 2020/Q4 was used for the analysis. This period has been selected because all of the V4 countries have been members of the European Union since 2004. The EViews software version 11 was used for the calculations. The variables used in this research are the stock exchange indices of the countries. The PX, SAX, BUX, WIG 20 and DAX stock indices are considered to be the crucial representatives of the individual stock markets in this work. The results show that the German DAX stock index Granger-caused the development of the Czech (PX), Hungarian (BUX) and Polish (WIG 20) stock indices.

Keywords: ADF test, Granger causality, stock market, V4

JEL Classification: C19, F65, G15
AMS Classification: 62P20, 91B28

1 Introduction

The Visegrad Four (V4), an informal grouping of the Czech Republic, Slovakia, Poland and Hungary, commemorates 30 years since its inception. According to recent research the biggest economic leap in three decades was made by Poland, but the economic leader of the group is still Czechia. Some analysts also believe that today Visegrad has a rather political significance that outweighs the economic one. As far as the economic level is concerned, the V4 countries have, above all, a very strong individual economic connection with Germany, both in terms of investment and foreign trade. Economic flows within the V4 region are weaker than between the individual V4 countries and Germany. For example, in 2020 the Czech Republic's foreign trade in goods with the other V4 countries amounted to less than two-thirds of its trade with Germany. Although the starting positions of the individual states were relatively different in the early 1990s and the elements of economic transformation also varied considerably, all four countries managed to reach comparable values of macroeconomic indicators within thirty years. Selected macroeconomic indicators of the V4 countries and Germany are shown in Table 1.

Indicator          GDP per capita, PPP (U.S. dollars)   Unemployment rate (%)   Inflation (%)
Country / Year         1991        2019                    1991      2020        1991    2020
Czech Republic        20962       40836                     4.1       3.1        56.7     3.2
Slovakia              11728       31966                    11.8       7          61.2     1.6
Hungary               16477       32644                    12.3       4.3        35       2.4
Poland                10517       33222                    11.8       3.3        59.4     3.4
Germany               38360       53785                     5.3       4.3         4       0.4

Table 1 Selected macroeconomic indicators

Table 1 shows that in 2020 Slovakia reached the highest unemployment rate, 7%, whereas the values of the other countries were lower. As for inflation, Poland has 3.4%, which is the highest value among these countries. Germany has the highest value of GDP per capita.
1 School of Business Administration in Karviná, Silesian University in Opava, Department of Informatics and Mathematics, Univerzitní náměstí 1934/3, 733 40 Karviná, Czech Republic, e-mail: krkoskova@opf.slu.cz 281 2 Literature Review and Data 2.1 Literature Review From the point of view of the markets of the V 4 countries (Czech Republic, Hungary, Poland, Slovakia), it is possible to assume a significant influence of the German stock market in particular. This is due not only to geographical proximity but also to strong economic ties, as shown by several studies (Baláž and Hamara, [3]; Elekdag et al.,[5]; Komárek et al., [12]; Taušer et al., [20]. In these countries, German capital is strongly active in the form of foreign direct investment, and so this correlation is logical. Developments in the stock markets of Central and Eastern Europe (CEE) are a topic that has been addressed by many authors. Hegerty [ 10] deals mainly with the effects of oil price volatility on their stock markets. He concluded that the impacts vary significantly, depending on the level of the country's economic level. Ison and Hudson [11] addressed the question of whether it is possible to predict developments in the stock market markets of the C E E region by analyzing previous price movements. Arendáš and Chovancová [1] found that the stock markets of the V 4 countries, with the exception of the Slovak stock market, tend to perform significantly better in the winter than in the summer half of the year. In the case of the Slovak stock market, the difference between the two half-years is negligible. Arendáš et al., [2] describe the influence of German stock market on stock market of V 4 countries. Pietrzak et al. [16] and Cevik et al. [4] examined the interdependencies between the individual stock markets of the C E E countries. Reboredo et al. [17] examined the dependencies between the stock markets of the Czech Republic, Hungary, Poland and Romania. They found that there was a strong positive dependence between the stock markets of the Czech Republic, Hungary and Poland. The relationship between these three stock markets and the Romanian stock market is significantly weaker. The effects of the global financial crisis on the stock markets of the C E E region were also addressed by Olbrys and Majewska [15], while Vychytilová [21] focused mainly on the post-crisis development of the stock markets of the V 4 countries. 2.2 Data and Methods The aim of this paper is to examine the existence of a causal relationship between the German stock market and the stock markets of the V 4 countries in the period 2005-2020. Quarterly data for the period from 2005/Q1 to 2020/Q4 was used for the calculations. A l l values were seasonally adjusted and were considered in logarithmic terms. The EViews software version 11 was used for the calculations. Time series were obtained from the Bloomberg database. The German stock market is represented by the D A X stock index, which includes the 30 most important German joint stock companies. The stock markets of the V 4 countries are represented by their main stock indices, namely: B U X (Hungary), P X (Czech Republic), S A X (Slovakia) and W I G 20 (Poland). 
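The testing procedure can be sketched with standard tools. The fragment below is an illustration only, assuming Python with pandas and statsmodels (adfuller for the unit-root test, grangercausalitytests for Granger causality); the two series are randomly generated stand-ins for the Bloomberg data used in the paper.

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, grangercausalitytests

# Hypothetical quarterly index values, 2005Q1-2020Q4 (64 observations).
rng = np.random.default_rng(0)
dax = pd.Series(5000 * np.exp(np.cumsum(rng.normal(0.010, 0.05, 64))))
px = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0.008, 0.06, 64))))

# Work with log series as in the paper; difference them if the ADF test indicates a unit root.
log_dax, log_px = np.log(dax), np.log(px)
for name, series in (("log DAX", log_dax), ("log PX", log_px)):
    stat, pvalue, *_ = adfuller(series)
    print(f"ADF {name}: statistic={stat:.2f}, p-value={pvalue:.3f}")

# Test "DAX Granger-causes PX": the second column is the candidate cause.
data = pd.concat([log_px.diff(), log_dax.diff()], axis=1).dropna()
res = grangercausalitytests(data, maxlag=4)
print("lag-1 F-test p-value:", res[1][0]["ssr_ftest"][1])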
There is tested following hypothesis: There is a Granger causality between the individual stock markets of the V 4 countries and the German stock market, when the German stock market represented by the D A X stock index influences the development of the stock markets of the V 4 countries, represented by the P X , B U X , W I G 20 and S A X stock indices. 350 300 250 200 150 100 50 0 L/l L/l IX) 00 00 c n o e,Vi = 1,2, ...,m, VjH > e,Vj = 1,2, ...,n, where the model is constructed for unit H, which is one of the p units. Input resp. output variable is arranged in matrix X= {xik, i= 1, 2, ... ,m,j= 1, 2, ... ,p} resp. Y= {yik, i = 1, 2, ... , n,j = 1, 2, ... ,p}. The e is the socalled infinitesimal constant. The Malmquist productivity index (MI) is used to calculate the change in efficiency over time. This index allows us to monitor separately the change in technical efficiency (the so-called catch-up effect) and changes in the production frontier (frontier-shift). Values greater than one in the index itself or its sub-components mean an improvement in the given area, values less than one mean a deterioration. The M I can be defined as the geometric mean of two efficiency ratios, where one is the efficiency change measures by the period 1 technology and the other is the efficiency change measured by the period 2 technology: \-lV2 E ~ M . A u . Vu r I " " l U u . V n l Ml = To calculate the M I , it is therefore necessary to solve four linear programs (i.e. four C C R models in this paper), which correspond to the four terms that make up the M I . Technical details about all the above stated procedures can be found in [3]. The calculations are performed in the M A T L A B R2021a computational system and in D E A SolverPro version 15. 3 Results The results of the efficiency scores of individual countries in particular years are recorded in Figure 1. In 2015, there are four fully efficient countries: Austria, Belgium, Netherlands and Spain. In 2016, Denmark joined this group. A l l these results are supported by the European Commission's reports [4]. Austria has earned its privileged position mainly on above-average labor productivity with regard to the size of the working-age population. Furthermore, it has taken several initiatives to support entrepreneurship and, in particular, start-ups. The fact that the production index in the sector increased by 20% (the index is only available for the total period from 2010 to 2019) also corresponds to the full efficiency of Austria, and an increase of 15% in the number of enterprises was also observed here. G D P and volume index of production are increasing in Belgium over the period. Efficiency has also been influenced by the increasing investment ratio and mean equalized net income followed with total population living in cities. In Netherlands the volume index of production since 2013 and the number of enterprises has significantly increased. The Netherlands is among the most productive economies in the E U , with a 29% higher labor productivity rate than the E U average. The unemployment rate is 2.2% and it is below the E U average of 6%. The Netherlands is also 25th out of 190 countries in terms of starting up a business. Denmark is a leader in eco-innovation and sustainable construction. The Danish government is systematically trying to strengthen productivity and employment in the building sector. 
In addition to Denmark, France, Germany, Italy, Sweden, Slovakia and the U K are among the countries with high efficiency scores during the first two years. B y contrast, countries such as Bulgaria, Hungary, Romania and Lithuania have the lowest efficiency scores. In terms of absolute efficiency values, Bulgaria lags the most. In the case of this country, several problem areas can be identified. Examples include: the continuing shortage of a skilled and professional workforce, the business environment remains heavily regulated, the lower digital skills of its workforce, labor productivity significantly below the E U average, late payments and a large budget deficit [4]. 289 In 2017, we have data on only 17 countries and the results of efficiency in this period therefore represent a less comprehensive view of the surveyed sector. However, despite this problem with 2017 data availability, Austria, Belgium and Netherlands are among the top countries. Unfortunately, the last of the fully efficient countries from previous years (Spain) did not have data available. Even in the last monitored year, Bulgaria is in the last position in terms of the order derived from the resulting efficiency. However, Hungary, for example, improved, skipping Greece, which took the second worst place in terms of efficiency in 2017. A detailed view of efficiency changes over time was performed using the Malmquist index. The changes in the efficiency of individual countries calculated on the basis of the breakdown of the Malmquist index are presented in Figure 2. A s it is not possible to use an unbalanced data set when calculating the Malmquist index, the index was calculated only for 17 countries that have values available for the entire reference period. E f f i c i e n c y s c o r e in 2 0 1 7 1.0 I 0.8 0.6 0.4 0.2 1.0 0.8 0.6 0.4 0.2 E f f i c i e n c y s c o r e in 2 0 1 6 wmmm1.0 0.8 0.6 0.4 0.2 E f f i c i e n c y s c o r e in 2 0 1 5 mmmrT1 HP Figure 1 Efficiency scores of individual countries in particular years Countries that have values higher than 1 in Figure 2 were able to increase their efficiency over time. Conversely, for countries with values below 1, efficiency has decreased over time. A value of 1 means that there have been no changes in efficiency in the case of a given country. It was found that the largest efficiency gains from 2015 to 2016 were recorded in Bulgaria and the Netherlands. The government of Bulgaria invested a large amount to renovate and modernize new highways, railway, and metro in this period. Also, the European Structural Invest Fund (ESIF) provided Bulgaria with development finance. When comparing the changes from 2015 to 2016, efficiency dropped dramatically in the case of Lithuania. On the other hand, between 2016 and 2017 Lithuania is one of the countries with the biggest increase in efficiency. In 2016 the consumer confidence indicator in Lithuania was -7.6 which is below the E U average of -6.3. This fact reflects the continuous risk perception in the construction sector. However, all confidence indicators have generally improved later. After 2016, Lithuania also improved in terms of dealing with construction permits and 290 it is considered as a moderate innovator [4]. A similar development in efficiency as for Lithuania has been observed in the Czech Republic. When monitoring the efficiency changes between 2016 and 2017 (upper part of Figure 2), it can be stated that in this period there were no such dramatic changes as in the previous period. 
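For readers who want to reproduce this kind of analysis outside MATLAB or DEA SolverPro, the following sketch shows one way to obtain the four efficiency scores behind the Malmquist index. It uses the input-oriented CCR envelopment form (the dual of the multiplier form described above, without the infinitesimal refinement), Python with SciPy, and made-up input/output data for four units in two periods.

import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(x0, y0, X, Y):
    """Input-oriented CCR efficiency of unit (x0, y0) against the frontier of the
    columns of X (inputs) and Y (outputs): min theta s.t. X@lam <= theta*x0, Y@lam >= y0, lam >= 0."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                       # minimize theta
    A_ub = np.zeros((m + s, n + 1)); b_ub = np.zeros(m + s)
    A_ub[:m, 0] = -x0; A_ub[:m, 1:] = X               # X@lam - theta*x0 <= 0
    A_ub[m:, 1:] = -Y; b_ub[m:] = -y0                 # -Y@lam <= -y0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.fun

# Hypothetical data: 2 inputs and 1 output for 4 units, periods 1 and 2.
X1 = np.array([[4.0, 6.0, 8.0, 5.0], [2.0, 3.0, 5.0, 2.5]]); Y1 = np.array([[10., 12., 11., 9.]])
X2 = np.array([[3.8, 6.2, 7.5, 5.1], [1.9, 3.1, 4.8, 2.4]]); Y2 = np.array([[11., 12., 12., 9.5]])
u = 0                                                  # evaluated unit

e11 = ccr_efficiency(X1[:, u], Y1[:, u], X1, Y1)       # period-1 unit, period-1 frontier
e22 = ccr_efficiency(X2[:, u], Y2[:, u], X2, Y2)       # period-2 unit, period-2 frontier
e12 = ccr_efficiency(X2[:, u], Y2[:, u], X1, Y1)       # period-2 unit, period-1 frontier
e21 = ccr_efficiency(X1[:, u], Y1[:, u], X2, Y2)       # period-1 unit, period-2 frontier
mi = np.sqrt((e12 / e11) * (e22 / e21))                # Malmquist index as a geometric mean
print(f"catch-up={e22/e11:.3f}, frontier-shift={np.sqrt((e12/e22)*(e11/e21)):.3f}, MI={mi:.3f}")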
In general, it can be stated that the efficiency of countries increased more often than decreased. The median and average value of the change in efficiency is equal to 1.01 in this period. Efficiency change from 2016 to 2017 1.10 I 1 1 1 1 1 — i 1 1 1 1 1 1 1 1 1 1 Figure 2 Change in the efficiency of individual countries based on the decomposition of the Malmquist index 4 Discussion When evaluating the efficiency of the construction sector, it was found that most countries were identified as inefficient. Only around seven countries maintained a high efficiency score during the whole period. The construction sector is currently still a relatively little explored area and there are very few studies focusing on the efficiency of this sector. However, it can be stated that the results of this study are in agreement with the analyzes performed on the microdata in the study [15]. Study [15] showed that companies within the Czech construction industry were not doing their best during 2013 to 2015. Even in the case of our evaluation at the level of individual E U countries, the Czech Republic is among the ten worst countries in 2015 (as in 2016 and 2017). The reliability of the performed analyses is also supported by the results of study [11], where multiple problems were identified in the case of the U K . Although it was not possible to fully analyse the U K due to the unavailability of data, from 2015 to 2016 the U K was always classified as an inefficient country. Due to the poor availability of data for 2017, it is necessary to take the resulting values of efficiency in this period only as a basic view of the issue. For a more detailed description of construction sector, it would be better to do a deeper survey in the construction sector by extending the timeline and including all E U countries. Using a longer period of time, the plausibility of D E A models could be verified using competitive (parametric) meth- ods. 291 5 Conclusion In this article, we focused on evaluating the efficiency of the construction sector at the level of individual E U countries. The results showed that Austria, Belgium, Denmark, the Netherlands and Spain are among the most powerful countries. B y contrast, Bulgaria lags behind the others the most. Based on the analysis of changes in efficiency, it can be stated that over time, changes (either in the positive or negative direction) in efficiency have decreased. In general, it can be said that the selected countries have succeeded in improving the efficiency of the sector over the years. Governments are striving for higher company competitiveness and sustainable and ecological development of the construction sector. Among other things, the pressure to invest in equipment innovation and modernization is increasing. However, the importance of construction in the economy is still overlooked from a research perspective. Acknowledgements This article was supported by the grant N o PEF/TP/2021002 of the I G A P E F M E N D E L U Grant Agency . References [I] Ahmad, T. & Thaeheem, J. M . (2018) Economic sustainability assessment of residential buildings: A dedicated assessment framework and implications for B I M . Sustainable Cities and Society, 38, 476—491. [2] Banker, R. D., Charnes, A . , Cooper, W . W . (1984). The Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management science, 30(9), 1078-1092. [3] Cooper, W . W., Seiford, L . M . & Tone, K . (2007). 
Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA/Solver Software. New York: Springer Science & Business M e ­ dia. [4] European Commission. (2021). Country fact sheets [online]. Available at: https://ec.europa.eu/growth/sectors/construction/observatory/country-fact-sheets_en. [Accessed: 29 January 2021]. [5] Farrell, M . J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, 120(3), 253-290. [6] Gaebert, T. & Staňková, M . (2020). Efficiency Development in the German Pharmaceutical Market. Acta Universitatis agriculturae et silviculturae Mendelianae Brunensis, 68(5), 877-884. [7] Galitskova, Y . & Mikhasek, A . (2017). Efficiency of construction waste recycling. MATEC Web of Conferences, 117(47), 00055. [8] Charnes, A . , Cooper, W . W . & Rhodes, E. (1978). The measurement of productive efficiency. Journal of the Royal Statistical Society, 2(6), 429—444. [9] Klepeis, N . E . et al. (2001). The National Human Activity Pattern Survey (NHAPS): A resource for assessing exposure to environmental pollutants. Journal of Exposure Analysis and Environmental Epidemiology, 11(3), 231-252. [10] Langston, C . (2014). Construction efficiency: A tale of two developed countries. Engineering, Construction & Architectural Management, 21(3), 320-335. [II] Meng, X . (2012). The effect of relationship management on project performance in construction. International Journal of Project Management, 30(2), 188-198. [12] Staňková, M . (2020). Efficiency comparison and efficiency development of the metallurgical industry in the E U : Parametric and non-parametric approaches. Acta Universitatis agriculturae et silviculturae Mendelianae Brunensis, 68(4), 765-774. [13] Staňková, M . & Hampel, D . (2020). Efficiency Assessment of the U K Travel Agency Companies - Data Envelopment Analysis Approach. Mathematical Methods in Economics 2020: Conference Proceedings. Brno: Mendelova univerzita v Brně, 550-556. [14] Staňková, M . & Hampel, D . (2019). Bankruptcy Prediction Based on Data Envelopment Analysis. Mathematical Methods in Economics 2019: Conference Proceedings. České Budějovice: Jihočeská univerzita v Českých Budějovicích, 31-36. [15] Staňková, M . & Hampel, D . (2018). Efficiency Comparison in the Development of Building Projects Sector. Mathematical Methods in Economics 2018: Conference Proceedings. MatfyzPress: Praha, 503-508. 292 The path-relinking based search in unit lattice of m-dimensional simplex Marek Kvet1 , Jaroslav Janáček2 Abstract. Path-relinking based searches proved to be very powerful optimization tools, when applied to the problems, solutions of which can be represented as a subset of m-dimensional unit hypercube vertices. The efficiency of the searching strategies can be considerably improved, if the starting set of solutions is uniformly deployed over the set of feasible hypercube vertices. The corresponding /^-location problems solvable by the original path-relinking based methods assume that exactly p locations must be chosen from a set of m possible places. This way, each solution can be described by a zero-one m-dimensional vector, which coincides to a hypercube vertex. If the /^-location problem is generalized so that its formulation admits to locate more than one facility at one place, then the set of feasible /^-facility location solutions extends from the subset of hypercube vertices to a subset of integer point belonging to a facet of the simplex. 
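The role of the probabilities q_1, ..., q_r can be illustrated with a short fragment: for one demand location, the expected response time is the q_k-weighted travel time to the k-th nearest located center. The sketch below is a hypothetical illustration in Python, not the authors' implementation, and all numbers are made up.

# Expected response time for one demand node j under the first generalization:
# with probability q_k the k-th nearest service center is the first one available.
q = [0.8, 0.15, 0.05]                                    # hypothetical probabilities q_1, ..., q_r
travel_time_to_j = {2: 12.0, 5: 7.0, 7: 20.0, 9: 9.5}    # minutes from candidate center nodes to node j
located = [2, 5, 9]                                      # nodes where a center is actually placed

nearest = sorted(travel_time_to_j[i] for i in located)   # travel times to located centers, ascending
expected_time = sum(qk * nearest[k] for k, qk in enumerate(q))
print(f"expected response time at node j: {expected_time:.2f} minutes")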
Within this contribution, we suggest a pathrelinking method, which is able to cope with the generalized set of feasible solutions of the /^-facility location problem with multiple locating. Furthermore, we provide the generalized path-relinking search with an extension of a uniformly deployed set to obtain a starting set suitable for a search in a simplex facet. The appended computational study proves that the generalized path-relinking based search can reach a near-to-optimal solution of the real-world location problem. Keywords: location problems, multiple facility location, path-relinking method, generalized set of solutions J E L Classification: C44 A M S Classification: 90C06, 90C10, 90C27 1 Introduction The recent model of the Emergency Medical Service (EMS) system design problem reflects two substantial generalizations. The first of them takes into account the fact that demands for service emerge randomly and the ability of the service centers are limited. Consequently, the generalized approach has to model the cases, when a current demand cannot be served from the geographically closest service center, but from the nearest available one. This generalization is based on introducing a series qi, qj, qr of probability values, where qt is the probability of the case, in which the fe-th nearest center to a demand location j is the first available one, because the nearer centers are busy with servicing previously occurred demands [6, 7, 10, 13]. The second generalization of originally binary mathematical programming model is connected with the usually used data model of a serviced region. The serviced region is described by a road network graph with finite set of nodes, at which demands for service occur and the service centers are located at. The original binary models enable only to choose a given network node for a center location or leave it unused [1, 3, 11, 12]. Here, it is not possible to equip a service center with more than one facility, such as an ambulance vehicle, without having to create a more complex model with an enormous number of allocation variables and capacity constraints. The first generalization of the emergency system model enabled to perform the second model generalization, in which more than one facility can be assigned to the same network node in order to locate a service center [5]. Obviously, the more complex the associated mathematical models are, the more difficult is the challenge to find the optimal solution of the problem. Common exact methods [1,3] usually suffer from unpredictable time demands, because a big part of total computational time is spent by verifying the optimality of the best-found solution. Therefore, emphasis is currently being placed on the development of effective heuristic and metaheuristics methods [2, 4, 14, 15]. Among all the directions that research in this area takes, we will limit ourselves only to 1 University of Žilina, Faculty of Management Science and Informatics, Univerzitná 8215/1, 010 26 Žilina, Slovakia, marek.kvet@fri.uniza.sk 2 University of Žilina, Faculty of Management Science and Informatics, Univerzitná 8215/1, 010 26 Žilina, Slovakia, jaroslav.janacek@fri.uniza.sk 293 the algorithms based on processing a population of feasible solutions. In this contribution, we suggest a pathrelinking method, which is able to cope with the generalized set of feasible solutions of the /^-facility location problem with multiple center locating. 
Our suggestions are experimentally verified using real data obtained from the existing EMS system operating on the road network of Slovakia.

2 Generalized Emergency System Design Model

The emergency system design problem can be formulated as a task to deploy p facilities in a set of network nodes so as to minimize the average response time of the system to the users' demands. It is assumed that the demands can emerge only at the nodes 1, ..., n of the network and that the frequency of the randomly occurring demands at the node j is estimated by a constant b_j. Let a set I of m network nodes be specified as the set of possible service center locations. The network time-distance necessary for traversing the network from a possible center location i ∈ I to a demand location j will be referred to as d_ij. It is also assumed that the probability values q_1, ..., q_r are at disposal. The substantial decisions on the facility deployment will be modelled by a vector y ∈ Z_+^m of nonnegative integer components, where y_i for i = 1, ..., m gives the number of facilities located at the possible service center location i. After these preliminaries, the following expressions (1) and (2) formulate the combinatorial model of the generalized EMS system design problem.

Minimize f(y) = Σ_{j=1}^{n} b_j Σ_{k=1}^{r} q_k d_{e(y,k,j),j}    (1)

Subject to: y ∈ Y = { x ∈ {0, ..., p}^m : Σ_{i=1}^{m} x_i = p }    (2)

The mapping e: {0, ..., p}^m × {1, ..., r} × {1, ..., n} → {1, ..., m} is defined according to the following course of actions.

e(y, k, j)
0. Order the subscripts i ∈ K = {i = 1, ..., m: y(i) ≥ 1} into a sequence i(1), ..., i(|K|) according to increasing distances of the node i from the node j, i.e. d_{i(1),j} ≤ d_{i(2),j} ≤ ... ≤ d_{i(|K|),j}.
1. Define e(y, k, j) = i(t) so that the inequalities y(i(1)) + ... + y(i(t−1)) < k ≤ y(i(1)) + ... + y(i(t)) hold.

The generalized path-relinking procedure processes two input solutions represented by the pairs L1, y1 and L2, y2, where L1 and L2 are the sets of used center locations and y1 : L1 → {1, ..., p} and y2 : L2 → {1, ..., p} give the number of facilities located at the individual centers of the considered solution. The procedure is described below.

PathRelinkingGP(L1, y1, L2, y2)
0. Initialize L*, y* by argmin {f(L1, y1), f(L2, y2)} and initialize the following sets of location subscripts: L= = {i ∈ L1 ∩ L2 : y1(i) = y2(i)}, L+ = {i ∈ L1 ∩ L2 : y1(i) < y2(i)}, L− = {i ∈ L1 ∩ L2 : y1(i) > y2(i)}, Ln = L1 − L2 and Le = L2 − L1.
1. If M(L1, y1, L2, y2) > 2, perform step 2; else return L*, y* and stop the run of the procedure.
2. Find i*, j* = argmin{f(Relocation(L1, y1, i, j)): i ∈ L− ∪ Ln, j ∈ L+ ∪ Le} and update L1, y1 = Relocation(L1, y1, i*, j*). Update L*, y* by argmin{f(L1, y1), f(L*, y*)}. Swap L1, y1 with L2, y2. Redefine L= = {i ∈ L1 ∩ L2 : y1(i) = y2(i)}, L+ = {i ∈ L1 ∩ L2 : y1(i) < y2(i)}, L− = {i ∈ L1 ∩ L2 : y1(i) > y2(i)}, Ln = L1 − L2 and Le = L2 − L1. Go to step 1.

Comments: The function Relocation(L, y, i, j) returns the solution L, y changed according to the following rule.

Relocation(L, y, i, j)
y(i) = y(i) − 1; if y(i) = 0, then L = L − {i}. If j ∉ L, then L = L ∪ {j} and y(j) = 1, else y(j) = y(j) + 1. Return L, y.

The symbol M(L1, y1, L2, y2) stands for the Manhattan distance of the solutions L1, y1 and L2, y2 computed by (4).

M(L1, y1, L2, y2) = Σ_{i∈L+} (y2(i) − y1(i)) + Σ_{i∈L−} (y1(i) − y2(i)) + Σ_{i∈Ln} y1(i) + Σ_{i∈Le} y2(i)    (4)

It must be noted that the complexity of obtaining i*, j* in step 2 is |L− ∪ Ln| × |L+ ∪ Le|, while the complexity of the associated step of the original path-relinking method is only |Ln| × |Le|. This may influence the efficiency of approaches based on the path-relinking method when applied to p-location problems with multiple facility location.
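To make the procedure above easier to follow, here is a minimal Python sketch (not the authors' implementation) of the Relocation rule, the Manhattan distance (4) and the main loop of PathRelinkingGP; it assumes a solution is stored as a dictionary mapping used center locations to facility counts, and that the objective f, e.g. the generalized disutility (1), is supplied as a callable. The function names are illustrative only.

def relocation(sol, i, j):
    # Relocation(L, y, i, j): move one facility from location i to location j.
    new = dict(sol)
    new[i] -= 1
    if new[i] == 0:
        del new[i]                 # location i is no longer used
    new[j] = new.get(j, 0) + 1     # open location j if needed, else add a facility
    return new

def manhattan(sol1, sol2):
    # Manhattan distance (4): sum of absolute differences of facility counts.
    locations = set(sol1) | set(sol2)
    return sum(abs(sol1.get(i, 0) - sol2.get(i, 0)) for i in locations)

def path_relinking_gp(sol1, sol2, f):
    # Generalized path-relinking between sol1 and sol2 (procedure PathRelinkingGP).
    best = min(sol1, sol2, key=f)
    while manhattan(sol1, sol2) > 2:
        # donors correspond to L- u Ln, receivers to L+ u Le
        donors = [i for i in sol1 if sol1[i] > sol2.get(i, 0)]
        receivers = [j for j in sol2 if sol2[j] > sol1.get(j, 0)]
        candidates = [relocation(sol1, i, j) for i in donors for j in receivers]
        sol1 = min(candidates, key=f)          # best relocation found in step 2
        best = min(best, sol1, key=f)
        sol1, sol2 = sol2, sol1                # swap the roles of the two solutions
    return best

Because every relocation moves one facility from a donor location to a receiver location, the distance (4) decreases by two in each iteration, which guarantees termination of the sketch above.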
4 One-To-All Strategy with Extension

The one-to-all strategy proved to be a very efficient way of search when used for the original emergency service system design modelled by means of binary linear programming [8]. This original approach starts with a so-called uniformly deployed set of feasible solutions, which was considered an initial swarm of particles. The solution with the minimal objective function value was used as the leader of the particle swarm optimization process. Contrary to other particle swarm optimization strategies, the positions of the particles were not changed, with the exception of the leader's position, which was updated after each inspection of the shortest path connecting the current leader position to a particle position. The inspection of the path was performed by the original path-relinking method adjusted for the problem (1), (3). A direct application of the above strategy together with the usage of the uniformly deployed set to the p-location problem with multiple facility location is not possible due to an inborn property of the path-relinking method. The method is unable to inspect any solution lying outside the sub-space restricted by the differing components of the input solutions. To employ the benefit of uniformly deployed sets, an extension of them has been suggested to enable the search in the neighborhood of a promising solution outside the m-dimensional unit hypercube. We start from the obvious premise that the model (1), (2) enables locating at most r facilities at one service center location. Furthermore, we made the heuristic assumption that the most important possible service center locations are those which have the biggest population. Then, the extension algorithm, which adjusts the solutions of a uniformly deployed set for the one-to-all strategy solving the p-location problem with multiple facility location, can be constituted in the following way. The input of the algorithm is represented by the parameter noPref and a uniformly deployed set S = {P^1, P^2, ..., P^|S|} of p-tuples of service center locations. Each p-tuple P^i ∈ S corresponds to one feasible solution of the problem (1), (3), i.e. it is a vertex of an m-dimensional unit hypercube. The denotation "uniformly deployed" means that the Hamming distance between each pair of vertices of S is bigger than a given threshold, the value of which is usually near the value 2p. The output of the algorithm is a set E of feasible solutions W^i, y^i of the problem (1), (2), where some of the solutions lie outside the unit hypercube.

Extension(S, noPref)
0. Compute noO(j) for each possible center location j = 1, ..., m, where noO(j) is the number of occurrences of j in the elements of S. Determine the list LPref of the noPref most populated possible center locations.
1. Process the p-tuples P^1, P^2, ..., P^|S| according to step 2 and, after having processed all P^i, go to step 3.
2. Find a location k ∈ P^i which belongs to LPref. If there is no such k, then define W^i = P^i and y^i(j) = 1 for each j ∈ W^i. Otherwise, initialize W^i = {k} and y^i(k) = 1 and perform the following decisions for each other element j ∈ P^i − {k}: If y^i(k) < r and noO(j) > 1, then set y^i(k) = y^i(k) + 1 and noO(j) = noO(j) − 1; otherwise set W^i = W^i ∪ {j} and y^i(j) = 1.
3. Return E = {W^1, y^1, ..., W^|S|, y^|S|}.
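The extension algorithm can be sketched in Python as follows; the representation of S as a list of p-tuples of location indices and the dictionary of location populations are assumptions made only for this illustration, not part of the original description.

from collections import Counter

def extension(S, no_pref, population, r):
    # Sketch of Extension(S, noPref): adjust a uniformly deployed set of p-tuples
    # so that some solutions locate several facilities at a preferred center.
    noO = Counter(j for P in S for j in P)                       # step 0: occurrence counts noO(j)
    LPref = set(sorted(population, key=population.get, reverse=True)[:no_pref])

    E = []                                                        # resulting set of solutions
    for P in S:                                                   # step 1: process each p-tuple
        preferred = [k for k in P if k in LPref]
        if not preferred:                                         # step 2: no preferred location in P
            E.append({j: 1 for j in P})
            continue
        k = preferred[0]
        y = {k: 1}
        for j in P:
            if j == k:
                continue
            if y[k] < r and noO[j] > 1:                           # shift one facility to the preferred center
                y[k] += 1
                noO[j] -= 1
            else:                                                 # otherwise keep location j with one facility
                y[j] = 1
        E.append(y)
    return E                                                      # step 3

Each returned dictionary maps used locations to facility counts, i.e. a feasible solution of (1), (2), and can be passed directly to the path-relinking sketch given above.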
5 Computational Study The further reported numerical experiments were suggested to prove or disprove the hypothesis that the pathrelinking method adjusted for processing solutions of the /--facility location problem with multiple facility location is able to reach as excellent results as the algorithms destined for problems defined only on a unit hypercube in m-dimensional space. Used benchmarks were derived from real emergency health care system, which operates in eight regions of Slovak Republic. For each self-governing region, i.e. Bratislava (BA), Banská Bystrica (BB), Košice (KE), Nitra (NR), Prešov (PO), Trenčín (TN), Trnava (TT) and Žilina (ZA), all cities and villages with corresponding number of inhabitants bj were taken into account. The coefficients b, were rounded to hundreds. The set of communities represents both the set J of users' locations and the set / of possible center locations as well. The coefficients qk for k=l...3 were set at the values: qi = 0,77063, q2 = 0,16476 and qs = 1 - qi - qj. These values were obtained from a simulation model of existing emergency medical system in Slovakia [10]. The optimal solutions of all studied problem instances are available and the associated objective function values can be found in [5]. We report them also in the column of Table 1 denoted by OptObjF. Table 1 contains also the problem sizes expressed by the values of m and p respectively. Table 1 Basic benchmarks characteristics and the optimal objective function values Region m P OptObjF B A 87 25 18450 BB 515 46 38008 K E 460 38 40711 NR 350 36 40987 PO 664 44 46884 TN 276 26 31260 TT 249 22 36401 Z A 315 36 36929 Since the suggested path-relinking method processes a set of solutions, the uniformly deployed set can serve as a source of necessary data for the algorithm. The process of uniformly deployed set construction and usage are 296 reported in [8] and [9]. The common property of a uniformly deployed set consists in the fact that an arbitrary permutation of the locations generates a new uniformly deployed set with the same characteristics. We used this property to obtain ten different starting sets for each self-governing region. Therefore, we report the average results of ten runs. For completeness, the numerical experiments reported in this paper were performed on a notebook equipped with the Intel® Core™ i7 3610QM 2.3 G H z processor and 8 G B of memory. The presented algorithms were implemented in the Java language making use of the NetBeans I D E 8.2 environment. Let us now focus on performed numerical experiments and on discussing the obtained results. As the usage of the generalized path-relinking method was conditioned by an extension of the input uniformly deployed set, a special attention was devoted to the setting of the parameter noPref. This number of the most populated service centers was determined proportionally to the number m of the possible service center locations. We report results obtained by performing three series of experiments for various percentages of m, which were used for noPref determination. The percentages were 1.5%, 10% and 20% subsequently. The results are summarized in the following Table 2, Table 3 and Table 4, which have the same structure. Each row of the table corresponds to one problem instance. The left part of each table contains the optimal objective function value OptObjF taken from [5]. This value is reported to make the path-relinking method results quality evaluation more convenient. 
The results obtained by suggested heuristic approach are reported by the following five values: The column denoted by ObjF contains the objective function value of the resulting solution, which was obtained in the computational time in seconds denoted by CT. The symbol noY denotes the average resulting number of service centers. Finally, Count_2 and Count_3 denote the average number of centers, where two or three facilities were located. Table 2 Results of numerical experiments for noPref = 1% of m Region OptObjF ObjF CT noY Count_2 Count_3 BA 18450 18752 0.68 24.70 0.30 0.00 BB 38008 38106 52.54 43.40 1.60 0.50 K E 40711 40734 28.80 37.00 1.00 0.00 NR 40987 41061 10.70 34.00 2.00 0.00 PO 46884 47012 65.80 41.10 2.90 0.00 TN 31260 31545 6.31 25.00 1.00 0.00 TT 36401 36780 3.67 21.20 0.80 0.00 Z A 36929 37030 8.78 35.00 1.00 0.00 Table 3 Results of numerical experiments for noPref = 10% of m Region OptObjF ObjF CT noY Count_2 Count_3 BA 18450 18717 0.69 24.60 0.40 0.00 BB 38008 38193 53.64 44.40 1.40 0.10 K E 40711 40761 29.22 37.30 0.70 0.00 NR 40987 41109 10.78 34.20 1.80 0.00 PO 46884 47132 66.93 42.00 2.00 0.00 TN 31260 31550 6.31 25.00 1.00 0.00 TT 36401 36801 3.66 21.30 0.70 0.00 Z A 36929 37070 8.82 35.20 0.80 0.00 Table 4 Results of numerical experiments for noPref = 20% of m Region OptObjF ObjF CT noY Count_2 Count_3 BA 18450 18717 0.70 24.60 0.40 0.00 BB 38008 38255 53.96 45.20 0.80 0.00 K E 40711 40809 29.12 37.70 0.30 0.00 NR 40987 41351 10.95 35.20 0.80 0.00 PO 46884 47230 67.53 42.80 1.20 0.00 TN 31260 31556 6.33 25.40 0.60 0.00 TT 36401 36807 3.66 21.40 0.60 0.00 Z A 36929 37133 8.84 35.60 0.40 0.00 297 6 Conclusions The main research topic of this paper was focused on extending the path-relinking method to be able to comply with the /^-location problem with multiple facility location. Mentioned algorithm adjustment could change its previous characteristics. Therefore, the performance of the algorithm was studied and the obtained results were analyzed from the point of solution accuracy. Furthermore, we provided the readers with the generalized pathrelinking search with an extension of a uniformly deployed set to obtain a starting set suitable for a search in a simplex facet. Since the performance and resulting solutions may be affected by algorithm parameter, we performed a case study to study mentioned possible impact. The obtained results reported in a separate section showed that the generalized path-relinking based search can reach a near-to-optimal solution of the real-world location problems in a acceptably short computational time. Thus, we can conclude that suggested heuristic method combined with the usage of uniformly deployed sets of solutions brings excellent results and may be practically used by the operations researchers and other professionals responsible for decision-making. Acknowledgements This work was supported by the research grants V E G A 1/0089/19 "Data analysis methods and decisions support tools for service systems supporting electric vehicles", V E G A 1/0689/19 "Optimal design and economically efficient charging infrastructure deployment for electric buses in public transportation of smart cities", and V E G A 1/0216/21 "Design of emergency systems with conflicting criteria using artificial intelligence tools". This work was supported by the Slovak Research and Development Agency under the Contract no. APVV-19-0441. References [I] Avella, P., Sassano, A . & Vasil'ev, I. (2007). Computational study of large scale p-median problems. 
Mathematical Programming, 109, pp. 89-114. [2] Doerner, K . F., Gutjahr, W . J., Hartl, R. F., Kárali, M . & Reimann, M . (2005). Heuristic Solution of an Extended Double-Coverage Ambulance Location Problem for Austria. Central European Journal of Operations Research, 13(4), pp. 325-340. [3] Garcia, S., Labbé, M . & Marín, A . (2011). Solving large p-median problems with a radius formulation. INFORMS Journal on Computing, 23(4), pp. 546-556. [4] Gendreau, M . & Potvin, J. (2010). Handbook of Metaheuristics, Springer Science & Business Media. [5] Janáček, J. (2021). Multiple p-Facility Location Problem with Randomly Emerging Demands. In: Strategic Management and its Support by Information Systems 2021, Technical University of Ostrava, in print [6] Janáček, J. & Kvet, M . (2021). Efficient incrementing Heuristics for Generalized p-Location Problems. In: Central European Journal of Operations Research, Springer, in print [7] Janáček, J. & Kvet, M . (2020). Discrete self-organizing migration algorithm. In: Croatian Operational Research Review 11(2), pp. 241-248. [8] Janáček, J. & Kvet, M . (2020). Uniform Deployment of the p-Location Problem Solutions. In: Operations Research Proceedings 2019: Selected Papers of the Annual International Conference of the German Operations Research Society, Dresden, Germany, September 4-6, 2019: Springer, pp. 315-321. [9] Janáček, J. & Kvet, M . (2019). Usage of Uniformly Deployed Set for p-Location Min-Sum Problem with Generalized Disutility. In: SOR 2019 proceedings, pp. 494-499. [10] Jankovič, P. (2016). Calculating Reduction Coefficients for Optimization of Emergency Service System Using Microscopic Simulation Model. In: 17th International Symposium on Computational Intelligence and Informatics, pp. 163-167. [II] Jánošíkova, Ľ. & Zarnay, M . (2014). Location of emergency stations as the capacitated p-median problem. In: Quantitative Methods in Economics (Multiple Criteria Decision Making XVII), pp. 117-123. [12] Karatas, M . & Yakicia, E. (2019). A n analysis of p-median location problem: Effects of backup service level and demand assignment policy. European Journal of Operational Research, 272(1), pp. 207-218. [13] Kvet, M . (2014). Computational Study of Radial Approach to Public Service System Design with Generalized Utility. In Digital Technologies 2014, pp. 198-208. [14] Kvet, M . & Janáček, J. (2020). Spider network search strategy for p-location problems. In: CINTI 2020: 20th International Symposium on Computational Intelligence and Informatics, Budapešť, 2020, pp. 49-54. [15] Rybičková A . , Mocková D. & Teichmann D . (2019). Genetic Algorithm for the Continuous LocationRouting Problem, Neural Network World 29(3), pp. 173-187. 298 Portfolio discount factor evaluated by oriented fuzzy numbers Anna Lyczkowska-Hanckowiak1 , Krzysztof Piasecki2 Abstract In financial portfolio management, utilizing oriented fuzzy numbers is more useful than utilizing fuzzy numbers. Moreover, a portfolio analysis based on fuzzy discount factor is simpler than portfolio analysis based on return rate. For this reason, we consider here a discount factor evaluated by oriented fuzzy number. The main goal of our paper is to find an analytical formula describing portfolio expected discount factor as a function of expected discounts factors of portfolio components. In our considerations, we take into account the fact that addition of oriented fuzzy numbers is not associative. 
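As an informal illustration of Definition 1 (anticipating the trapezoidal special case defined next), the following Python sketch evaluates the membership function and the orientation when the starting and ending branches are linear; the linearity is an assumption made here only for concreteness, and the function names are illustrative, not taken from the cited works.

def trapezoid_membership(x, a, b, c, d):
    # Membership value for an oriented fuzzy number with monotonic sequence (a, b, c, d),
    # assuming linear starting/ending branches; support is [a, d] = [d, a].
    lo, hi = (a, d) if a <= d else (d, a)
    if x < lo or x > hi:
        return 0.0
    if min(b, c) <= x <= max(b, c):
        return 1.0                               # kernel [b, c] = [c, b]
    if (a <= x < b) or (b < x <= a):             # starting branch between a and b
        return (x - a) / (b - a)
    return (x - d) / (c - d)                     # ending branch between c and d

def orientation(a, d):
    # Positive if a < d, negative if a > d, a crisp real number if a == d.
    return "positive" if a < d else "negative" if a > d else "crisp"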
Therefore, we propose to calculate separately the weighted sum of positively oriented discount factors and the sum of negatively discount factors. Then the portfolio discount factor is obtained by weighted addition of these sums. Such a procedure for determining the discount factor of the portfolio is justified by economic premises. Keywords: discount factor, portfolio, oriented fuzzy number J E L Classification : C44, G i l A M S Classification : 03E72 1. Introduction Imprecision of financial data is usually modelled by fuzzy numbers (FNs). In [9] it is demonstrated that for a portfolio analysis, oriented FNs are more useful than FNs. Moreover, in the case of financial data imprecision the expected discount factor is a more convenient portfolio analysis tool than the expected return rate [10, 11]. For these reasons, the main aim of our paper is to study portfolio discount factor for the case when portfolio assets are evaluated by trapezoidal oriented FNs. 2. Trapezoidal oriented fuzzy numbers - basic facts The symbol T(M) denotes the family of all fuzzy subsets in the real line UL Fuzzy number (FN) is usually defined as a fuzzy subset of the real line UL The most general definition of F N was formulated by Dubois and Prade [1]. The set of all F N we denote by the symbol IF. A n y F N may be represented in following way Theorem 1 [2]. For any F N L there exists such a non-decreasing sequence (a, £>, c, d) c R that L(.a,b,c,d,LL,RL) = L E TÍM) is determined by its membership function •íi (- la, b, c,d,LL,RL) E [ 0 , l ] R described by the identity r 0, x £ [a,d], LL(x), x E [a,b[, 1, xe[b,c], ( 1 ) 0. (2) The notion of ordered F N is introduced by Kosiňski et al [3, 4]. From formal reasons, the Kosihski's theory is revised in [7]. In revised theory, the notion of ordered F N is narrowed down to the notion of oriented F N (OFN) defined as follows: [iL(.x\a,b,c,d,LL,RL) =< 1 WSB University inPoznari, Institute of Economy and Finance, ul. Powstaricow Wielkopolskich 5, 61-895 Poznari, Poland: anna.lyczkowska-hanckowiak@wsb.poznan.pl 2 WSB University inPoznah, Institute of Economy and Finance, ul. Powstaricow Wielkopolskich 5, 61-895 Poznari, Poland: krzysztof.piasecki@wsb.poznan.pl. 299 Definition 1 [7]. For any monotonic sequence i.a,b,c,d) c E , O F N L(.a,b,c,d,SL,EL) = L is the pair of orientation a,d = (a,d) and F N £ £ IF described by membership function pL(- l a , b , c , d , 5 i , £ ' i ) £ [ 0 , l ] R g i v e n by the identity 0, x£ [a, d] = [d, a], pL(x\a,b,c,d,SL,EL) = < SL(x), x E [a,b[ = ]b,a], 1, x e [fc,c] = [c,Z>], (3) • EL(x), x e]c,d] = [d,c[, where the starting function SL E [0,l[^a,b ^ and the ending function EL E [0,l\}c,d ^ are upper semi-continuous monotonic ones meeting the condition (2) Remark: The identity (3) additionally describes such modified notation of intervals which is used in the O F N theory. The notation 3 = 3C means that "the interval 3 may be equivalently replaced by the interval K". The relationships between FNs, ordered FNs, and OFNs are discussed in detail in [9]. The symbol IK denotes the space of all OFNs. If a < d then O F N £ ( a , b,c,d,SL,El) has the positive orientation a, a which informs us about possibility of an increase in approximated number. If a > d, then O F N L\.a,b,c,d,SL,EL) has the negative orientation a,d which informs us about possibility of a decrease in approximated number. If a = d, then O F N £ ( a , a, a, a,SL,EL) = [a] describes the real number a e l . A special case of OFNs are trapezoidal fuzzy numbers (TrOFNs). 
Definition 2. [6] For any monotonic sequence (a, b, c, d) c E , TrOFN Tr(a,b,c,d) = f is O F N f € K pT(x) = pTr(x\a,b,c,d) = 0, x g [a, d] [d,a], x—a b-a' i e [a,b[ ]a,b], 1, i e [b, c] [c,b], x-d c-d ' i e ]c,d] = [c,dl (4) The symbol KTr denotes the space of all TrOFNs. The space of all positively oriented TrOFNs is denoted by the symbol K*r. The space of all negatively oriented TrOFNs we denote by the symbol K f r . Let symbol * denotes any arithmetic operation defined in E . By symbol Ff] we denote an extension of arithmetic operation * to IK. Kosinski has defined arithrretic operators on ordered FNs in an intuitive way. The addition and dot product extended to the space LK have a very high level of formal complexity [9]. Therefore, in many applications researchers limit their calculations to arithmetic operations determined on the space LKr r . In line with the Kosinski's approach, we can extend basic arithrretic operators to the case of LKr r in such way that for any pair (rr(a,b,c,d),Tr(p — a,q — b,r — c,s — d ) ) E IK|r and B E US, arithmetic operations of extended sum EE3 and dot product • are defined as follows [6]: Tr(.a,b,c,d) EE3 7V(p - a,q - b,r - c,s - d) = 7>(min{p, q} , q, r, max{r, s}), (q < r) V (q = r A p < s), ^ . 7>(max{p, q] , q,r, min{r, s}), (q > r) V (q = r A p > s). B • fria.b.c.d) =fr(B -a.B-b.B • c,B -d). (6) In general, the TrOFNs addition is not associative [9]. Moreover, for any pair {Tr(a,b,c,d)jr(e,f,g,ti)) E (lKjr U EJ2 u (LKfr u E ) 2 we have [9] fr(a,b,c,d) EB fr(e,f,g,h) = fria + e,b + f,c + g,d + h). (7) A n y monotonic unary operator A —> E may be extended to TrOFN case in the following way. Using the Kosinski's approach, we define an extended unary operator G: KTr 3 HI —» IK as follows: KG(a),G(b),G(c),G(d),SL,EL) = G^r(a,b,c,d)), (8) where the starting function and the ending function are given by formulas V yE[G(a),G(b)[ SL(y) = ^ ^ , (9) V yE]G(c),G(d)] E L ( y ) = ^ ^ . (10) 3. Expected discount factor Let us assume that the time horizon t > 0 of an investment is fixed. Then, the asset considered here is determined by two values: • Anticipated future value (FV) Vt, • Assessed present value (PV) V0 . 300 The basic characteristic of benefits from owning this asset is a simple return rate rt given by the identity: In [11], it is justified that F V is a random variable Vt: H —> R+ . The set, H , is a set of elementary states, a>, of the financial market. In a classical approach to a return rate estimation, PVis identified with the observed quoted price P. Thus, the return rate is a random variable determined by identity: rtGu) . (12) Uncertainty risk is a result of a lack of knowledge about the future state of affairs. In practice of financial markets analysis, the uncertainty risk is usually described by the probability distribution of return rate (12) which may be given by a cumulative distribution function Fr (• If): R —» [0,l]. W e assume that the expected value, r, of this distribution exists. Then also the expected discount factor (EDF) v exists and is determined by the dependency v = ( l + f ) - 1 . (13) If we take together (11) and (12), then we obtain a following formula describing the return rate rt = rt (y0,ai) = , / - 1 . (14) It implies that the expected return rate may be expressed in a following way *fro> = - W Fr (yIf) = 1. (15) Thanks to that we detemiine the imprecise EDF V : R+ -> H&+ as a unary operator transforming P V as follows V(K0 ) = ( ^ ) - 1 = f . i { ; . 
(16) If P V is imprecisely estimated then it may be evaluated by TrOFN PV =fr(Vs,Vf,VL,Ve) (17) where the monotonic sequence (l^, V,, P, Vt, 14) is determined in following way • [V^., c ^ + is a n interval of all possible values of PV, , Ve\ is an interval of all prices which do not noticeably differ from a quoted price P. If we predict a rise in price then P V is described by a positively oriented TrOFN. If we predict a fall in price, then P V is described by a negatively oriented TrOFN. Then using the Kosinski's approach, we define the imprecise EDF by an extension V: KTr -» IK of a unary operator (68). The identities (8), (9), (10), and (17) imply that the imprecise EDF v(W) is given by TrOFN V(m = T r ^ f ^ , V f , ^ ) = V^r{vs,Vf,Vlx)\ (18) In [8] it shown that analogous extension "R ( P V ) is such O F N which is not TrOFN. In considered case it causes that for portfolio analysis, imprecise EDF is more useful than imprecise expected return rate. 4. Expected discount factors for portfolio By a financial portfolio we will understand an arbitrary, finite set of assets. A n y asset is determined as fixed security in long position. On the other hand, any portfolio also is a security. Let us consider the case of a multiasset portfolio n*, built of securities Yt. W e describe this portfolio as the set n* = [Yt: i = 1,2, ...,n}. A n y security Yt is characterized by its price PL e R+ , by its imprecise P V evaluated by TrOFN PVt = Triy^.V^ ,V^) ' (19) and by its EDF vt determined by (13). Taking into account all the above, we evaluate any security Yt by its imprecise EDF % =rr(D?,D?,D?,D?) =rr(v?.f,V?.f,V? -f,V? •»). ( 2 0 ) A portfolio P V is always equal to the sum of its components' PVs. In the case where the components' PVs are estimated by TrOFNs, addition is not associative. Then multiple addition depends on the order of the summands. This implies that a portfolio's PV, given as any sum of its components' PVs, is not explicitly determined. Therefore, in the considered case, any method of calculating the portfolio P V should be supplemented with a reasonable method for the ordering of the portfolio components. We will use a method of ordering the assets proposed and justified in [6]. A t the outset, we distinguish the portfolio of rising securities 7 r + = {YiEn*:PVi E K*r} and the portfolio of falling securities n~ = n*\n+. Then, using (7) we calculate the P V of portfolio n+ , denoted by the symbol P F+ ,and the P V of portfolio n~, denoted by the symbol PV + . W e have W* = rr(Vs M ,Vf M ,VL M ,Ve M ) = Tr(£Yi-n+ Vs (0 X Y m + i f V® . ^ V ? ) . (21) 301 PV- = Tr{Vs { ~\v^,v;-\ve^) = rr{Y.Yl,n- VS U) , 2 ^ - V® Xyl&r V® .Zy^-V®). (22) Finally, we calculate the P V of portfolio n*, denoted by the symbol PV* .We get PV* = PV+ mPV' =fr(Vs M ,Vf M ,Vl M ,Ve M ) mfr(Vs ( -~\vf~\vi ( ~) ,Ve ( ~) )= fr(Vs U) ,Vf U) ,Ve U) ). (23) Now we can start calculating EDFs of considered portfolios. The values of portfolios n+ ,n~, n* are respectively calculated in following way M + = I Y i e n + P i , M~ = IYien-Pt, M*=M+ + M~. (24) The share of the asset Yt e n+ in the portfolio n+ and the share p{~ of the asset Yt e n~ in the portfolio n~ are given by formulas The share q + of portfolio 7T+ in the portfolio n* and the share q~ of portfolio n~ in the portfolio n* are given by formulas + M+ M~ 1 = —. 
1 =— • (26) The EDF v + of portfolio 7 r + , the EDF v~ of portfolio 7r~ and the EDF v* of portfolio n* are calculated as follows Due results obtained in [8,9,10] and (7) we have that: • the imprecise E D F V+ of portfolio n+ is given by the formula _ K Y n ( + ) n ( + ) n ( + ) n ( + ) ^ - .r ^ v * = • (ar i e j r + g£ • 1?)) = the imprecise EDF V ~ of portfolio 7T~ is given by the formula v- = R(DJ ( -) .D/-) >DI ( -) >D- ( -) ) = * - • ( 0 r ; e j r + ( | • 1?)) = = Tr\hY.€n-——-Ds ,LY.^-——-Df ,LY.^-——-Dl ,LY.^-——-De J, (29) the imprecise EDF V* of portfolio n* is given by the formula V* = Tr(D?,D^,D?,Di ;) )= • V + ) EB • V " ) . (30) 5. Case study Our case study builds on data already discussed in a different context [5]. W e observe the portfolio it* composed of company shares included in WIG20 quoted on the Warsaw Stock Exchange (WSE). The portfolio TT* contains the following securities: 1 share Y1 issued by the stock company A L R , 1 share Y2 issued by the stock company CCC, 1 share Y3 issued by stock company C D R 1 share Y4 issued by the stock company CPS, 1 share Y5 issued by stock company DNP, 1 share Y6 issued by the stock company LTS, and 1 share Y7 issued by stock company M B K . Based on a session closing on the W S E on January 28, 2020, for each observed share we assess its P V equal to TrOFN PVT describing its Japanese candle [8]. Shares' PVs, obtained in such a manner, are presented in Table 1. For each portfolio component Yt, we determine its quoted price Pt as an initial price on 29.01.2020. A l l considerations in the paper are run for the quarterly period of the investment time. Therefore, each security V^is tentatively characterized by its quarterly EDF vt of portfolio components. Using (30), for each security Yt we calculate its imprecise EDF V*. A l l these valuations are presented in Table 1. The shares Y2,Y3,Y4,Y5 belong to portfolio n+ of rising securities. The shares Y1,Y6,Y7 belong to portfolio n~ of falling securities. Respectively using (21), (22), and (23) we calculate PVs of portfolios n+ , n~, n* PV+ = fr(536.27,541.10, 546.82,541.20), PV~ = 7Y(478.30,476.70,467.96,464.10), PV* = 7V(1017,80,1017.80,1014.78,1005.30). In the next step, using (27) we calculate EDFs of portfolios n+ , n~, n* v + = 0.866306, v~= 0.93368, v* = 0.896089. Finally, respectively using (28), (29), and (30) we calculate imprecise EDFs of portfolios n+ ,n~, n* V+ = fr(0.848537, 0.856179,0.865230, 0.870578), 302 y- = 7Y(0.955379,0.952183,0.934726, 0.927016), V* = 7V(0.898614, 0.898614,0.895948,0.895524). Share P V Price 11)1 Imprecise E D F Yi 7V(27.42; 27.30; 27.00; 26.84) 27.00 0.9744 7V(0.9896 0.9852; 0.9744; 0.9686) Y2 7V(83.35; 88.00; 88.00; 89.65) 88.00 0.9646 7V(0.9136 0.9646; 0.9646; 0.9827) Y3 7V(271.50; 271.50; 276.30; 276.30) 277.00 0.8006 7V(0.7847 0.7847; 0.7986; 0.7986) Y4 7V(26.42; 26.60; 27.04; 27.34) 27.20 0.9439 7V(0.9168 0.9231; 0.9384; 0.9488) Ys 7V(155.00; 155.00; 155.10; 157.30) 155.30 0.9370 7V(0.9352 0.9352; 0.9358; 0.9491) Y6 7V(83.88; 83.40; 81.16; 80.26) 81.44 0.9047 7V(0.9318 0.9265; 0.9016; 0.8916) Y7 7V(367.00; 366.00; 359.80; 357.00) 359.00 0.9369 7/r(0.9578; 0.9552; 0.9390; 0.9317) Table 1. 
Evaluations of portfolio components Comparing dependences (28) and (29) with dependency (30) raises the following question: whether there are constants a and B satisfying the conditions a • 0.848537 + B • 0.955379 = a • Ds + + B • D~ = D* = 0.898614, (31) a • 0.856179 + B • 0.952183 = a • Df + + B • Df = Df = 0.898614, (32) a • 0.865230 + B • 0.934726 = a • + B • D{~ = D{ = 0.895948, (33) a • 0.870578 + B • 0.927016 = a • D+ + B • D~ = D*e = 0.895524. (34) Using (7) and (30), we get the unique solution a = 0.557987 and B = 0.442013 of the system of equations (32) and (33). Then, by checking equation (31), we get a- Ds + +B - A T = 0.557987 • 0.848537 + 0.442013 • 0.955379 = 0.895763 * 0.898614 = Ds*. This shows that linear portfolio analysis is not possible for considered here portfolio n*. It is obvious, that this conclusion can be generalized for any portfolio n*. 6. Final remarks Obtained results may provide theoretical foundations for portfolio analysis of securities described with use TrOFNs. Then the criterion of maximization expected return rate is replaced by criterion on muiimalization expected discount factor. The proposed portfolio analysis can be fully used for portfolios n+ of rising securities and n~ of falling securities. This is sufficient to manage portfolio risk, because only rising securities can get B U Y or A C C U M U L A T E recommendations and only falling securities can get SELL or REDUCE recommendations. In considered case study, we show that the dependence (30) is not linear. Such form of the relationship (30) allows us to use them only for evaluating an already constructed portfolio n*. Such evaluation may be carried out using the analytical tools described in [9]. A l l results obtained with use imprecise expected discount factor may be applied as input data for robo-advice systems described in [5]. The obtained results may as well be useful for a future research on the impact of the P V imprecision and orientation on portfolio analysis. References [1] Dubois, D.; Prade, H . (1978) Operations on fuzzy numbers. International Journal of System Science 9, 613- 629. https://doi.org/10.1080/00207727808941724 [2] Delgado, M . ; Vila, M . A . ; Voxman, W. (1998) On a canonical representation of fuzzy numbers. Fuzzy Sets and Systems 93(1), 125-135. https ://doi.org/10.1016/S0165-0114(96100144-3 [3] Kosinski, W.; Prokopowicz, P.; Slezak, D. (2002) Fuzzy numbers with algebraic operations: algorithmic approach. In Proc. IIS '2002; Klopotek, M . , Wierzchoh, S.T., Michalewicz, M . , (Eds), Sopot, Poland, Physica Verlag, Heidelberg, 2002, pp. 311-320 [4] Kosinski, W. (2006) On fuzzy number calculus. Int. J. Appl. Math. Comput. Sei., 16(1), 51-57 [5] Lyczkowska-Hanckowiak A . (2020) On Application Oriented Fuzzy Numbers for Imprecise Investirent 303 Recommendations, Symmetry 12(10), https://doi.org/10.3390/svml2101672 [6] Lyczkowska-Hanckowiak, A . Piasecki, K . (2018) The Present Value of a Portfolio Of Assets With Present Values Determined by Trapezoidal Ordered Fuzzy Number, Operations Research and Decisions 28(2), 41- 56, https ://doi.org/ 10.5277/ord 180203 [7] Piasecki, K . (2018) Revision of the Kosiriski's Theory of Ordered Fuzzy Numbers. Axioms 7(1), https://doi.org/10.3390/axioms7010016 [8] Piasecki, K . ; Lyczkowska-Hanckowiak, A . (2019) Representation of Japanese Candlesticks by Oriented Fuzzy Numbers. Econometrics8(1), 523. https://doi.org/10.3390/econometrics8010001 [9] Piasecki, K . ; Lyczkowska-Hanckowiak, A . (2021) Oriented Fuzzy Numbers vs. Fuzzy Numbers. 
Mathematics 9(3), 523. https://doi.org/10.3390/math9050523 [10] Piasecki K . , Siwek J., (2018a), Two-Asset Portfolio with Triangular Fuzzy Present Values - A n Alternative Approach, In T. Choudhry, J. Mizerka (Eds.) Contemporary Trends in Accounting, Finance and Financial Institutions, Springer Proceedings in Business and Economics. Springer, Cham, 11-26, https://doi.org/10.1007/978-3-319-72862-9 2 [11] Piasecki K., Siwek J., (2018b), Multi-asset portfolio with trapezoidal fuzzy present values, Przeglqd Statystyczny/ Statistical Review LXV(2),183-199. 304 Comparing TV advertisement in the year 2019 using DEA models 1 2 Jan Malý , Petra Zýková Abstract This paper focuses on the T V advertisement sector in the Czech Republic in the year 2019. The article aims to see how efficiency changes during the year and whether or not it is more suitable to choose less or more T R P for the campaign. This paper analyses the T V advertisement by the Data Envelopment Analysis (DEA). D E A models compute the relative efficiency of decision-making units, which transform multiple inputs into multiple outputs. The B C C input-oriented model (variable return to scale) is used for the analysis. This paper aims to identify optimal level of T R P for various target audiences across five channel mixes and compare some discovered trends with traditional views and metrics from advertisement. Keywords: T V advertisement, D E A models, efficiency analysis J E L Classification: C44 A M S Classification: 90C15 1 Introduction This paper compares commonly used parameters using D E A models during the year 2019 to show how viewership, parameters derived from it and its efficiency change throughout the year. In the Czech T V , advertising environment consists of two main groups, Media Club and Nova Group, who mediate advertising time. Nova Group [7] is responsible for managing sales of Nova stations. Media Club [6] coveres stations of Prima, Barrandov, Očko and ťhematical stations under Atmedia. Stations under Česká Televize are not mentioned here because these are limited by [8], so their impact on the advertising environment is low. However, there is one big difference between Nova Group and Media Club. Nova group targets an audience A15 - A54 (both sexes), and Media Club targets an audience A15 - A69 (both sexes). In T V advertising four main parameters are observed according to [1]: G R P (Gross Rating Point), T R P (Target Rating Point), Affinity, and Reach. G R P is the cumulative percentage of viewership in buying audience. T R P measures cumulative viewership percentage in the target audience. This paper used target audiences for men, women, and all in three age intervals of 18 - 35, 20 - 50 and 40 - 60. Affinity is the ratio of T R P and G R P . Affinity describes how the target audience (TA) watches T V compared to the population represented by G R P . Reach represents the percentage of T A who saw corporate advertising campaign (spot). There was used one input: G R P (Gross Rating Point), and two outputs: Affinity and Reach. The paper is organised as follows. The following section presents the definition of D E A models generally. Section 3 contains the analysis of T V advertisement in the Czech Republic in 2019. The last section of the article concludes the results and discusses future research. 2 DEA models D E A models have been first developed by Charnes, Cooper and Rhodes [1] based on the concept introduced by Farrell [4]. 
DEA models are a general tool for efficiency and performance evaluation of a set of homogeneous DMUs that spend multiple (w) inputs and transform them into multiple (t) outputs. The measure of efficiency (efficiency score) of this transformation is one of the main results of applying DEA models. Let us denote Y = (y_rj, r = 1, ..., t, j = 1, ..., n) a non-negative matrix of outputs and X = (x_kj, k = 1, ..., w, j = 1, ..., n) a non-negative matrix of inputs. The efficiency score of the unit under evaluation DMU_J0 is derived as follows:

1 Prague University of Economics and Business, Department of Econometrics, W. Churchill Sq. 4, 13067 Prague 3, Czech Republic, mali09@vse.cz
2 Prague University of Economics and Business, Department of Econometrics, W. Churchill Sq. 4, 13067 Prague 3, Czech Republic, petra.zykova@vse.cz

Maximise U_J0 = Σ_{r=1}^{t} u_r y_{rJ0} / Σ_{k=1}^{w} v_k x_{kJ0}
subject to
Σ_{r=1}^{t} u_r y_{rj} / Σ_{k=1}^{w} v_k x_{kj} ≤ 1, j = 1, ..., n,    (1)
u_r ≥ ε, r = 1, ..., t,
v_k ≥ ε, k = 1, ..., w,

where u_r is a positive weight of the r-th output, v_k is a positive weight of the k-th input, and ε is an infinitesimal constant. Model (1) is not linear in its objective function but may easily be transformed into a linear program. The linearised version of the input-oriented model (often called the CCR model) is as follows:

Maximise U_J0 = Σ_{r=1}^{t} u_r y_{rJ0}
subject to
Σ_{k=1}^{w} v_k x_{kJ0} = 1,    (2)
Σ_{r=1}^{t} u_r y_{rj} − Σ_{k=1}^{w} v_k x_{kj} ≤ 0, j = 1, ..., n,
u_r ≥ ε, r = 1, ..., t,
v_k ≥ ε, k = 1, ..., w.

The model (2) assumes a constant return to scale (CRS). There are other types of returns to scale: variable return to scale (VRS), non-increasing return to scale (NIRS) and non-decreasing return to scale (NDRS). For this article, the variable return to scale is appropriate. Thus, a free variable μ is added to the model (2). The input-oriented model with a variable return to scale (often called the BCC model) is as follows:

Maximise U_J0 = Σ_{r=1}^{t} u_r y_{rJ0} + μ
subject to
Σ_{k=1}^{w} v_k x_{kJ0} = 1,    (3)
Σ_{r=1}^{t} u_r y_{rj} − Σ_{k=1}^{w} v_k x_{kj} + μ ≤ 0, j = 1, ..., n,
u_r ≥ ε, r = 1, ..., t,
v_k ≥ ε, k = 1, ..., w, μ free.

The model (3) is a multiplicative BCC model. From the model (3) is derived the dual version of the model (3), called the envelopment model, which is as follows:

Minimise Z_J0 = θ_J0 − ε (Σ_{k=1}^{w} s_k^− + Σ_{r=1}^{t} s_r^+)
subject to
Σ_{j=1}^{n} λ_j x_{kj} + s_k^− = θ_J0 x_{kJ0}, k = 1, ..., w,    (4)
Σ_{j=1}^{n} λ_j y_{rj} − s_r^+ = y_{rJ0}, r = 1, ..., t,
Σ_{j=1}^{n} λ_j = 1,
λ_j ≥ 0, j = 1, ..., n, s_k^− ≥ 0, k = 1, ..., w, s_r^+ ≥ 0, r = 1, ..., t,

where λ = (λ_1, ..., λ_n), λ ≥ 0, is a vector of weights assigned to particular DMUs, and s^− = (s_1^−, ..., s_w^−) and s^+ = (s_1^+, ..., s_t^+) are vectors of slack/surplus variables. Efficient units identified by this model have efficiency scores equal to one, and all slack/surplus variables are equal to 0. Inefficient units have efficiency scores lower than 1. The model is not able to rank efficient units because of their identical efficiency scores. The model (4) is used in the calculations further in this paper. All mentioned models are described in [3].

3 TV advertisements

The dataset analysed in this paper was obtained from [5]. Data were collected for each month of the year 2019, and then they were divided into channel mixes based on their share of TRP between Media Club and Nova Group, as shown in Table 1.

Name of channel mix              Share of Nova Group (%)   Share of Media Club (%)
MC (Media Club)                  (0, 3)                    (100, 97)
NG LOW (Nova Group low)          (3, 35)                   (97, 65)
BOTH (Nova Group & Media Club)   (35, 65)                  (65, 35)
MC LOW (Media Club low)          (65, 97)                  (35, 3)
NG (Nova Group)                  (97, 100)                 (3, 0)
Table 1 Definition of channel mixes

The dataset was further specified. There are several types of viewership: live viewership, viewership with deferred viewing of up to 7 days, and also viewership with or without guests.
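Before turning to the results, the following minimal Python sketch (not the authors' implementation) shows how the envelopment model (4) could be computed with scipy; the ε-weighted slack term of (4) is omitted for brevity, so only the radial efficiency score θ is returned, and the data at the end are purely hypothetical.

import numpy as np
from scipy.optimize import linprog

def bcc_input_efficiency(X, Y, j0):
    # Radial input-oriented BCC (VRS) efficiency of unit j0.
    # X: (w inputs x n units), Y: (t outputs x n units).
    w, n = X.shape
    t = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                      # minimise theta over z = [theta, lambda_1..lambda_n]
    A_in = np.hstack([-X[:, [j0]], X])               # sum_j lam_j x_kj - theta x_k,j0 <= 0
    A_out = np.hstack([np.zeros((t, 1)), -Y])        # -sum_j lam_j y_rj <= -y_r,j0
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(w), -Y[:, j0]]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)     # convexity constraint sum lam_j = 1 (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.fun                                   # efficiency score theta in (0, 1]

# Hypothetical toy data: one input (GRP) and two outputs (Affinity, Reach) for four campaigns.
X = np.array([[120.0, 150.0, 90.0, 200.0]])
Y = np.array([[1.10, 0.95, 1.25, 1.05],
              [45.0, 52.0, 40.0, 60.0]])
scores = [bcc_input_efficiency(X, Y, j) for j in range(X.shape[1])]

For each evaluated unit, the optimal value θ is its BCC efficiency score; efficient units obtain θ equal to one.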
In this paper, the data is specified as live viewership plus 3-day deferred viewership, including guests. As the GRP definition, the market definition for both TV groups is used, as mentioned in Section 1. TRP is used with target audiences of men, women, and all viewers in three age intervals of 18-35, 20-50 and 40-60, with Reach 2+. TV campaigns are planned using all the parameters mentioned before (TRP, GRP, Affinity, and Reach). The efficiency obtained from DEA models can be used to select the right combination of these parameters so that the campaign is as successful as possible. The course of the dataset used for the analysis is shown in Figure 1. GRP grows linearly with TRP levels. However, Affinity is almost independent of TRP levels. Instead, Reach depends on TRP levels logarithmically. Therefore, the data is analysed at a given level of TRP. The data is divided into nine groups based on TRP levels from 200 TRP to 600 TRP with 50 TRP steps.

Figure 1 Course of GRP, Affinity and Reach depending on TRP levels

Firstly, the model (4) was computed for TRP equal to 400 for all channel mixes. The average efficiency scores are shown for every month of the year 2019 according to gender in Figure 2. Figure 2 also shows that the efficiency is higher for men than for women during 2019 at TRP 400. The audience watches TV more (more frequently, for a longer period of time, or both) during winter than during summer.

Figure 2 Monthly average efficiency computed by the model (4) for TRP equal to 400 for all channel mixes

Longer days decrease the viewership, and for the younger audience even more, as shown in Figure 3 for the All target audiences; for the male and female audiences the curves look almost the same. The fact that the average BCC efficiency is not the lowest for the All and male target audiences in the summer months does not correspond with Figure 3, which shows that the average time spent (ATS) in front of the TV is lower in the summer months (June, July, August), so lower BCC efficiency would be expected; this is in line only with the women target audiences. The low BCC efficiency in November and December, despite the highest ATS in those months, could be caused by the high interest of companies in placing their adverts at those times because of Christmas. The relatively high BCC efficiency in the winter months is in accordance with Figure 3, where ATS is high; in those months there is also more free space for adverts, and therefore companies and advertising agencies can plan more effectively. ATS also shows that, despite the traditional view that men watch TV more than women, in reality the situation is reversed. For the target audience M20-50, the ATS for the year 2019 is 2h18m, while for W20-50 it is 2h52m.
There are relatively significant differences between minimal and maximal efficiency scores across the level of TRP, which indicates that for the specific target audience in a particular channel mix, different amount of T R P for the campaign should be chosen to increase efficiency. There are results for the target audience A18 - 35 in Table 2. A n A18 - 35 audience has shown interesting results in efficiency with set levels of TRP, especially the alternation of maximal and minimal efficiency score between the channel mixes. When starting a campaign in channel mixes M C , N G L O W or N G , the target level of T R P is 500 - 600 as it leads to increased or even maximal efficiency. On the other hand, lower levels of T R P significantly decrease efficiency. On the contrary, in channel mixes B O T H and M C L O W , the most suitable T R P levels are 300 - 350 since higher levels of T R P cause efficiency score to drop. Level of T R P M C N G L O W B O T H M C L O W N G 200 0,691 0,732 0,844 0,883 0,926 250 0,686 0,744 0,847 0,877 0,925 300 0,693 0,772 0,873 0,893 0,930 350 0,702 0,789 0,874 0,885 0,934 400 0,702 0,778 0,857 0,876 0,933 450 0,701 0,783 0,851 0,869 0,931 500 0,703 0,780 0,844 0,881 0,941 550 0,699 0,791 0,864 0,890 0,935 600 0,700 0,776 0,821 0,876 0,943 Table 2 Yearly average efficiency scores computed by model (4) for target audience A 1 8 - 35 Conclusions of target audience A 2 0 - 50, in Table 3, are analogous with A18 - 35. Conversely, A 4 0 - 60 shows specific changes, see in Table 3. For instance, in all channel mixes except M C L O W , a bigger campaign from 450 to 600 T R P has a higher efficiency score. Only for M C L O W , it is more advantageous to choose a campaign between 250 - 350 TRP. A20-50 A40-60 Level of T R P M C N G L O W B O T H M C L O W N G M C N G L O W B O T H M C L O W N G 200 0,753 0,785 0,885 0,923 0,984 0,800 0,863 0,916 0,955 0,962 250 0,755 0,816 0,912 0,942 0,985 0,813 0,882 0,938 0,969 0,969 300 0,749 0,809 0,910 0,935 0,983 0,823 0,886 0,943 0,970 0,972 350 0,753 0,814 0,912 0,937 0,987 0,824 0,887 0,936 0,965 0,967 400 0,755 0,818 0,908 0,932 0,985 0,829 0,888 0,948 0,962 0,971 450 0,758 0,822 0,908 0,937 0,988 0,831 0,894 0,949 0,963 0,971 500 0,756 0,816 0,908 0,924 0,984 0,832 0,888 0,939 0,959 0,970 550 0,758 0,809 0,907 0,920 0,985 0,832 0,889 0,937 0,956 0,970 600 0,758 0,823 0,908 0,923 0,986 0,836 0,891 0,938 0,958 0,970 Table 3 Yearly average efficiency scores computed by model (4) for target audiences A20-50, A40-60 309 There are results for the target audience M l 8 - 35 in Table 4. Target audience M l 8 - 35 shows a completely different situation as any campaign with more than 500 T R P is insufficient. For channel mixes B O T H , M C L O W and N G , the most convenient range of T R P is from 250 to 400, while for M C and N G L O W 400 - 500 T R P is more efficient. Level of T R P M C N G L O W B O T H M C L O W N G 200 0,867 0,856 0,861 0,885 0,905 250 0,883 0,866 0,883 0,902 0,904 300 0,887 0,860 0,874 0,889 0,905 350 0,896 0,858 0,879 0,895 0,905 400 0,903 0,864 0,874 0,881 0,909 450 0,917 0,884 0,868 0,894 0,854 500 0,882 0,898 0,870 0,770 0,843 550 0,867 0,822 0,870 0,713 0,841 600 0,836 0,819 0,841 0,694 0,826 Table 4 Yearly average efficiency scores computed by model (4) for target audience M l 8 - 3 5 In men, audiences M 2 0 - 50, a 450 - 600 T R P campaign in channel mixes M C , B O T H , and N G is more suitable since efficiency is at a peak as is shown in Table 5. 
Whereas for M C L O W and N G L O W a campaign between 300 and 400 T R P provides higher efficiency. For target audience M 4 0 - 60, in Table 5, the efficiency score decreases together with lower T R P levels and lower share of Media Club. Hence, in the M C channel mix, a 500 - 600 T R P campaign is the most suitable of the other channel mixes, where the most effective campaign's T R P drops by 100 - 150 from the previous channel mix. This trend terminates with channel mix N G in which a 250 - 400 T R P is more efficient. M20-50 M40-60 Level of T R P M C N G L O W B O T H M C L O W N G M C N G L O W B O T H M C L O W N G 200 0,806 0,834 0,879 0,910 0,939 0,823 0,889 0,930 0,964 0,964 250 0,811 0,839 0,885 0,923 0,944 0,832 0,882 0,936 0,955 0,971 300 0,818 0,849 0,894 0,924 0,951 0,825 0,880 0,922 0,950 0,963 350 0,828 0,853 0,895 0,926 0,956 0,832 0,893 0,935 0,952 0,970 400 0,827 0,854 0,886 0,927 0,946 0,833 0,886 0,937 0,946 0,967 450 0,838 0,848 0,889 0,916 0,954 0,830 0,887 0,932 0,953 0,957 500 0,834 0,849 0,893 0,918 0,956 0,834 0,895 0,933 0,949 0,961 550 0,836 0,847 0,903 0,919 0,965 0,838 0,888 0,928 0,955 0,958 600 0,840 0,846 0,881 0,920 0,957 0,836 0,892 0,933 0,944 0,962 Table 5 Yearly average efficiency scores computed by model (4) for target audiences M20-50, M40-60 W 20 - 50 target audience, in Table 6, follows the same pattern as the M 40-60 audience where the efficiency score is dropping with lower levels of T R P and decreasing M C share, but by lower levels of TRP. Moreover, this trend does not affect channel mix N G as the most suitable T R P levels are between 500 and 600. Generally, T R P of 450 - 600 in W 4 0 - 60 audience shows the highest performance success except for M C L O W channel mix, which has the highest efficiency score in the range of 300 - 350 TRP, see in Table 6. W20-50 W40-60 Level of T R P M C N G L O W B O T H M C L O W N G M C N G L O W B O T H M C L O W N G 200 0,668 0,735 0,859 0,906 0,970 0,750 0,823 0,880 0,931 0,911 250 0,673 0,745 0,876 0,918 0,970 0,767 0,840 0,909 0,935 0,928 300 0,678 0,779 0,892 0,931 0,976 0,788 0,864 0,932 0,954 0,940 350 0,681 0,770 0,888 0,920 0,975 0,794 0,868 0,936 0,957 0,939 400 0,685 0,779 0,904 0,929 0,976 0,809 0,870 0,933 0,948 0,949 450 0,683 0,777 0,891 0,919 0,975 0,812 0,863 0,928 0,947 0,952 500 0,684 0,788 0,907 0,923 0,981 0,811 0,877 0,937 0,952 0,949 550 0,683 0,782 0,889 0,915 0,981 0,812 0,876 0,933 0,946 0,952 600 0,686 0,782 0,887 0,905 0,980 0,813 0,875 0,940 0,951 0,955 Table 6 Yearly average efficiency scores computed by model (4) for target audiences W20-50, W40-60 There are results for the target audience W18 - 35 in Table 7 where on the other hand, in W18 - 35 audience, T R P of 300 - 350 shows high efficiency through all channel mixes, but the efficiency drops with higher T R P levels, while the decrease stops at 500 - 550 T R P where the efficiency increases again. 310 Level of T R P M C N G L O W B O T H M C L O W N G 200 250 300 350 400 450 500 550 600 0,581 0,586 0,591 0,589 0,584 0,576 0,587 0,592 0,578 0,664 0,798 0,861 0,717 0,819 0,867 0,740 0,834 0,888 0,726 0,837 0,891 0,747 0,822 0,872 0,678 0,818 0,859 0,726 0,801 0,874 0,731 0,833 0,863 0,711 0,822 0,848 0,943 0,933 0,937 0,944 0,927 0,937 0,945 0,953 0,936 Table 7 Yearly average efficiency scores computed by model (4) for target audience W18-35 4 Conclusion The paper dealt with efficiency analysis based on Data Envelopment Analysis. 
D E A models can be used in the T V advertisement field only as one of several indicators of campaign evaluation and planning since the evaluation serves only with final data of the specific campaign. That is due to a possibility of misinterpretation caused by evaluating campaign progress with all T R P levels included together in a model. Possibly, a more sophisticated model could operate with data as complex as described. This paper has provided an insight into the tendencies of men and women of watching T V throughout the year, which could be beneficial for companies' ability to choose the proper periods for their campaigns. O f course, having a campaign as efficient as possible depends on the right combination of both the chosen T R P and channel mix levels for the specific target audience. Nonetheless, there is no general rule that would provide companies with complete certainty that their campaign is going to be successful and efficient. The paper has shown that B C C efficinency sometimes goes against traditional view, especially in summer months where despite low A T S , relatively great values of efficiency were obtained for all and male audiences, particularly for male audiences was B C C efficiency one of the greatest throughout the year. On other hand, the use of lower levels of A T P has been acknowledged as suitable for younger audiences (18-35) since younger people tend not to watch the T V as frequently, thus aiming for higher T R P is wasteful. In the majority of cases in older target audiences (40-60), the rule "the more the better" has been found suitable. For women target audiences a higher share of Nova Group has been found more efficient in spite of Media Club having more channels that broadcast women-oriented films and shows. A l l computations were done in the L I N G O modelling system. Used data was from [5] a they were collected by Adwind Kite software with the consent of A T O . Acknowledgements This work was supported by the Internal Grant Agency of the Faculty of Informatics and Statistics, Prague University of Economics and Business, project F4/29/2020 (Dynamic data envelopment analysis models in economic decision making). References [1] Asociace televizních organizací. (2021). [Online]. Available: https://www.ato.cz/tv-vyzkum/terminologie/ [2] Charnes, A . , Cooper, W . and Rhodes, E . (1978). Measuring the efficiency of decision-making units. European Journal of Operational Research, 2(6), p. 429-444. [3] Dlouhý, M . , Jablonský, J. and Zýková P. (2018). Analýza obalu dat. Praha: Professional Publishing. [4] Farrell, M . (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society. Series A (General), 120(3): p. 253-290. [5] Malý, J. (2020). Využití DEA modelů v oblasti TV reklamy, Prague University of Economics and Business. [6] Media Club. (2021). [Online]. Available: https://media-club.tv/zastupovana-media-2/. [7] Nova Group. (2021). [Online]. Available: https://www.novagroup.cz/nase-znacky/televize. [8] Zákon c. 231/2001 Sb. (2017). [Online]. Available: https://www.zakonyprolidi.cz/cs/2001 -231. 311 Efficiency of tertiary education in EU countries 1 2 Klára M a š k o v á , Veronika B l a š k o v á Abstract. Modern societies place ever-increasing demands on the level of education of the population. This article focuses on the issue of evaluating the efficiency of tertiary education in E U countries. The evaluation was performed using the data envelopment analysis method. 
The model was built based on two input and two output variables. Public expenditure on tertiary education and the number of teachers in tertiary education represent the input variables. The employment rate of graduates of tertiary education and the number of graduates in tertiary education represent the outputs. The radial input-oriented model with constant returns to scale was selected to calculate the efficiency in 2016, 2017 and 2018. The results show that countries such as Austria and Germany are in the lowest positions throughout the period under review and lag far behind other countries. In contrast, some economically weaker countries, such as the Czech Republic, have been identified as fully efficient. Keywords: data envelopment analysis, efficiency, education, competitiveness, linear programming J E L Classification: C44, H52,121 A M S Classification: 90B50, 90C08 1 Introduction Currently, the situation on the tertiary education market is changing dramatically. This process of change began almost ten years ago, when there was a large increase in the number of educational institutions and thus a boom in private education, but on the other hand, demographic changes (declines) began to show in this market ten years ago. Due to the fight for students, tertiary education is becoming much more accessible than in the past [20]. It is also necessary to highlight the significant decline in the number of traditional and highly motivated students and conversely the influx of various students who have neither the prerequisites nor the motivation to study at university [17]. Another problem which higher education is currently facing is the fact that there is a shortage of students in some sectors and a surplus in others [9]. For example, there is a great demand for graduates in science and technology and in foreign languages [ 1 ]. Higher education is expanding a lot today, as mentioned above, and it is therefore necessary to maintain quality. For this reason, it is important to increase spending on education both on the part of the state and on the part of entrepreneurs or the customers for educational services [4]. According to [1] increasing investment in tertiary education can improve the quality of education and the same time increase the number of graduates. For the tertiary level of education the size of the investment is as important as the structure and efficiency of its allocation. It should also be noted that investment in education is influenced by several factors. For example, it could be the size of tuition fees, the demographic structure of the population or the level of the economy [9]. Universities are under great pressure for the above reasons because of a steady decline in public support. Because of this, universities must look for external funding, and provide the best and most interesting teaching and research. Otherwise, they will not survive current global competition [22]. The funding of higher education is itself a much-discussed topic, see [24]. Today, tertiary education gives a great advantage when entering the labour market. Unfortunately, there is the problem that some graduates will not succeed in the labour market, even though the demand for universityeducated employees is still growing [18]. According to [21] this failure is mainly associated with the absence of work experience and work habits. Another problem is poor awareness of the value of salaries. 
Conversely, employers appreciate graduates with a knowledge of foreign languages, communication skills and excellent computer skills. They also appreciate the potential of graduates for further training and self-development. In many 1 Mendel University in Brno, Department of Statistics and Operation Analysis, Zemědělská 1, 613 00 Brno, Czech Republic, xmaskovl @node.mendelu.cz. 2 Mendel University in Brno, Department of Statistics and Operation Analysis, Zemědělská 1, 613 00 Brno, Czech Republic, veroni- ka.blaskova@mendelu.cz. 312 studies, for example [20], [21], the authors deal with the fact that computer skills are a great advantage for graduates who are just entering the labour market. A t the time of the Industry 4.0 technical revolution, demand for good computer skills is becoming more common. However, the shortcomings are complemented by low work responsibility and insufficient active knowledge of a foreign language. The employability of graduates is also currently a much-discussed topic, on which [16] also worked. Graduates of tertiary education in 2016-2018 are included in generation Y . In [15], among other things, it was found that most respondents from generation Y think that they will succeed in the labour market i f they focus on the natural or technical sciences, especially the currently evolving field - IT. Even this finding is consistent with the current industrial revolution. Given the limited possibilities for countries regarding the financing of education and the growing fight for students, the need to evaluate the efficiency of entities in this sector comes to the fore. Within the methods used to evaluate efficiency, the non-parametric method of data envelopment analysis ( D E A ) appears most often. This method is popular in all areas. It is possible to find applications in healthcare [3], construction [11] or in the evaluation of manufacturing companies [12]. The D E A method is also widely used in the field of education, see [6], [7] and [8]. Thanks to the high homogeneity of the subjects analyzed, it is possible to perform the analysis at the level of whole countries. The main aim of this article is to evaluate the efficiency of individual E U countries in the field of tertiary education from 2016 to 2018. The efficiency of education can be derived from different perspectives. In this article we solve the efficiency of tertiary education in terms of graduate employability in conjunction with the financing of tertiary education. Countries should strive to maximize the employment of graduates. 2 Material and Methods Regarding the availability of data and the number of observations, a total of four variables were selected for the efficiency analysis. Public expenditure on tertiary education in millions of euros and the number of teachers in tertiary education represent the input variables. The employment rate (%) of graduates of tertiary education and the number of graduates in tertiary education represent the outputs. The basic characteristics of the data set are in Table 1. Due to the unavailability of certain data, 6 countries had to be excluded from the analysis, these are Bulgaria, Cyprus, Estonia, Greece, Malta and Latvia. When we look at the values shown in the Table 1, Luxembourg had the lowest public expenditure, number of teachers and the number of absolvents in all years. On the contrary, Germany had the highest public expenditure, the most teachers and graduates. In 2016, the employment rate was the lowest in Slovakia and the highest in Lithuania. 
In 2017 and 2018, the employment rate was lowest in Italy and still highest in Lithuania.

Year  Variable             Min     Max       Mean     Lower quartile  Median   Upper quartile
2016  Public expenditure   193.5   29800.0   4561.9   547.9           2280.5   3828.0
2016  Employment rate      77.3    90.4      83.5     82.2            83.7     85.8
2016  Number of teachers   0.8     402.4     60.4     14.9            30.5     65.0
2016  Number of graduates  6.6     2729.6    682.3    205.3           347.4    770.0
2017  Public expenditure   206.1   30500.0   4671.5   710.0           2253.0   3937.0
2017  Employment rate      78.2    90.0      84.4     82.9            84.3     86.8
2017  Number of teachers   1.0     407.1     60.3     14.9            26.6     68.7
2017  Number of graduates  6.5     2779.4    688.7    213.8           329.2    813.3
2018  Public expenditure   232.8   31100.0   4818.9   759.2           2344.0   4141.0
2018  Employment rate      78.7    90.5      85.1     83.4            85.5     88.0
2018  Number of teachers   1.3     416.2     61.7     15.1            26.7     69.8
2018  Number of graduates  6.6     2805.8    694.8    219.6           314.3    824.6
Table 1 Basic descriptive characteristics of the variables used in individual years. Public expenditure on tertiary education in millions of euros, employment rate of graduates of tertiary education in %, number of teachers in tertiary education in thousands, number of graduates in tertiary education in thousands.

The DEA method was chosen to calculate the efficiency. Given the selected variables, the input orientation of the model was selected, and the model was estimated assuming constant returns to scale (i.e. the CCR model), as in [11] and [13]. Based on the input matrix $X$ and the output matrix $Y$, the efficiency of each country $c$ can be calculated by solving the following model $n$ times:

$\min_{\theta, \lambda} \ \theta \quad \text{subject to} \quad \theta x_c - X\lambda \ge 0, \quad Y\lambda \ge y_c, \quad \lambda \ge 0. \qquad (1)$

Similar to [10] and [13], we used the Malmquist index to calculate the change in efficiency, more precisely its partial component, the so-called catch-up effect. The Malmquist index can be defined as the geometric mean of two efficiency ratios, where one is the efficiency change measured by the period 1 technology and the other is the efficiency change measured by the period 2 technology:

$MI = \left[\frac{\delta^{1}\!\left((x_c, y_c)^{2}\right)}{\delta^{1}\!\left((x_c, y_c)^{1}\right)} \cdot \frac{\delta^{2}\!\left((x_c, y_c)^{2}\right)}{\delta^{2}\!\left((x_c, y_c)^{1}\right)}\right]^{1/2} \qquad (2)$

According to Formula (2), the Malmquist index consists of four terms: $\delta^{1}((x_c, y_c)^{2})$, $\delta^{2}((x_c, y_c)^{2})$, $\delta^{1}((x_c, y_c)^{1})$ and $\delta^{2}((x_c, y_c)^{1})$. For each of these terms it is necessary to solve a linear program, and in this paper the calculation is based on the CCR model in Formula (1). Technical details about the DEA method and the Malmquist index can be found in [2]. The calculations are performed in the MATLAB R2021a computational system and in DEA SolverPro, Version 15.
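To make the computation behind model (1) concrete, the following minimal sketch solves the input-oriented CCR envelopment program with scipy. It is an illustration only, not the authors' MATLAB or DEA SolverPro implementation, and the four data rows are hypothetical values that merely echo the magnitudes reported in Table 1.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_input_oriented(X, Y, c):
    """Model (1): min theta  s.t.  theta*x_c - X'lam >= 0,  Y'lam >= y_c,  lam >= 0.
    X has one row of inputs per country, Y one row of outputs per country."""
    n, m = X.shape                                      # n countries, m inputs
    s = Y.shape[1]                                      # s outputs
    obj = np.r_[1.0, np.zeros(n)]                       # variables: [theta, lam_1..lam_n]
    A_in = np.hstack([-X[c].reshape(m, 1), X.T])        # X'lam - theta*x_c <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])         # -Y'lam <= -y_c
    res = linprog(obj,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[c]],
                  bounds=[(0, None)] * (n + 1),
                  method="highs")
    return res.fun                                      # theta* in (0, 1]

# Hypothetical data: 4 countries, inputs = (expenditure, teachers),
# outputs = (employment rate, graduates); values are illustrative only.
X = np.array([[4561.9, 60.4], [547.9, 14.9], [2280.5, 30.5], [29800.0, 402.4]])
Y = np.array([[83.5, 682.3], [85.8, 205.3], [83.7, 347.4], [77.3, 2729.6]])
scores = [ccr_input_oriented(X, Y, c) for c in range(len(X))]
# The catch-up component of the Malmquist index for a country would then be the ratio
# of its period-2 score (period-2 frontier) to its period-1 score (period-1 frontier).
```

A full replication would loop this calculation over all analysed countries and the three years and feed the period-specific scores into Formula (2).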
3 Results and Discussion
Figure 1 shows the efficiency score in % for each country in the individual years (2016 is marked in blue, 2017 in red and 2018 in orange). The average efficiency of tertiary education in 2016 was approximately 74%. The median value was slightly higher, at around 77%, meaning that half of the countries had an efficiency score above 77%. In the euro area countries, the average efficiency was also 74%; in the V4 countries it was higher, at 86%. The full efficiency score in this year was achieved by five countries, namely Czechia, Ireland, Lithuania, Luxembourg and Romania. By contrast, the lowest efficiency scores can be seen in Germany, Austria and Denmark; only these three countries had an efficiency score under 50%. Close to the 50% border we can find Sweden and Spain. In 2017, five countries again achieved full efficiency, the same states as in 2016. We also calculated the average and median values for this year. The average efficiency of the European countries was almost 77%. The median value was almost the same as in 2016, also 77%, so in 2017 the average and the median were nearly equal. In the euro area countries, the average efficiency score was lower, 75%; in the V4 countries the efficiency score was again higher, almost 84%. Only two countries were below 50% efficiency, namely Germany with a score of 34% and Austria with a score of 38%. The 50% value was also approached by Sweden with a score of 51% and Spain with a score of 56%. In 2018, only four countries had the full efficiency score, namely Ireland, Lithuania, Luxembourg and Romania. In this year, the average European efficiency was almost the same as in 2017, approximately 77%. However, the median value was higher than the average: half of the countries were above an 83% efficiency score. In the euro area countries, the average efficiency was at approximately the same level, and in the V4 countries the efficiency was again higher, 82%. Among the countries with the lowest efficiency of tertiary education were again Germany with a score of 34%, Austria with a score of 36% and Denmark with a score of 49.5%. Sweden (51%) and Spain (55%) were again around the 50% mark. We observed that in the V4 countries the average efficiency was higher than the average for all countries in each year. A common feature was also that economically developed European countries such as Germany and Austria failed in the evaluation of the efficiency of tertiary education and occupied the last two places in our ranking. On the contrary, some economically weaker countries, such as Romania, finished as fully efficient. The "inefficiency" of Germany is mainly caused by the high number of teachers in comparison with other countries, and this fact is also reflected in the value of public expenditure. For example, [14] dealt with the efficiency of tertiary education expenditure in the European Union; their three efficiency models confirm that both Romania and Czechia can be ranked among the most efficient countries. In their work the worst was Estonia, which we had to exclude due to the unavailability of certain data. A similarity with our results can also be found in [23], where Latvia, Lithuania, Romania and Czechia were best placed. Similar results are also confirmed by [19]. In [5] it was also found that more economically developed countries such as Austria are much less efficient than, for example, Hungary.

Figure 1 The resulting efficiency score in % for each country in individual years

Figure 2 shows the efficiency changes for each country in the individual years (the change for the period 2016/2017 is marked in blue, the change for the period 2017/2018 in red). Figure 2 was compiled in the form of a growth coefficient: values greater than zero mean an increase in technical efficiency and, conversely, values less than zero mean a decrease. If there is no change in efficiency over time, the value of the coefficient for the given country is equal to zero. In the first monitored period, the biggest change could be seen in Denmark, where the efficiency score increased by almost 90%. This year-on-year change was mainly due to a change on the input side: in this period the number of teachers in Denmark decreased by approximately 46%. The other variables of the model remained almost at the same level (a change of up to 10%).
We could also observe an increase in another eleven countries, but this increase was not as dramatic (only between 1 and 15%). Some countries also showed a decrease in efficiency scores, namely Hungary, Slovenia, Slovakia, Finland and Sweden. It was only a slight decrease of about 1%. But only Hungary has declined by 13%. Five countries did not change their efficiency scores; these are the countries with full efficiency in Figure 1. In the second monitoring period, changes up to 35% were observed. The highest growths were in the case of Portugal and Croatia - almost 23% and 19%. We observed less significant growth, of up to about 7%, in 5 countries. In the case of 4 countries, the efficiency score did not change, these are Ireland, Lithuania, Luxembourg and Romania - again countries that were fully efficient in Figure 1. We identified falls in the efficiency score in 10 countries. The biggest fall was recorded in Denmark, a drop of almost 35%. In Denmark in 2017, there was a reform of tertiary education. This year-on-year decrease was again caused by a significant change on the input side - in this period, on the contrary, the number of teachers increased by almost 55%. The other variables of the model remained almost the same, these were only minimal changes. The second largest decline was observed in Czechia at around 9%. Other declines were up to about 5%. 315 i 12016/2017 12017/2018 • • 1 I •íl. n n .1 • 1 | • 1 1 • • Figure 2 Efficiency increment of individual countries in individual years 4 Conclusion The results of this article show that some economically stronger countries, such as Germany and Austria, do not perform well in tertiary education compared to other economically weaker countries, such as the Czech Republic. Among the best countries in terms of efficiency derived on the basis of graduate employability in conjunction with the financing of tertiary education, in addition to the already mentioned Czech Republic, were also, for example, Ireland and Lithuania. It was also found that Denmark underwent significant changes in education during the period considered. For other countries, changes in efficiency over time have not been so dramatic. Acknowledgements This article was supported by grant N o . PEF/TP/2021003 of the Grant Agency I G A P E F M E N D E L U . References [1] Aristovnik, A . & Obadič, A . (2011). The Funding and Efficiency of Higher Education in Croatia and Slovenia; A Nonparametric Comparison. Proceedings of the International Scientific Conference, Juraj Dobřila University ofPula. 218-244. [2] Cooper, W . W., Seiford, L . M . & Tone, K . (2007). Data envelopment analysis: A comprehensive text with models, applications, references and DEA-solver software. 2 n d edition. New York: Springer Science & Business Media. [3] Gaebert, T. & Staňková, M . (2020). Efficiency Development in the German Pharmaceutical Market. Acta Universitatis agriculturae et silviculturae Mendelianae Brunensis, 68(5), 877-884. [4] Glushak, N . , Katkow, Y., Glushak, O., Katkowa, E., Kovaleva, N . (2015). Contemporary Economic A s pects of Education Quality Management at the University. Procedia - Social and Behavioral Sciences, 214, 252-260. [5] Jelic, O. & Kedzo, M . (2018). Efficiency vs effectiveness: an analysis of tertiary education across Europe. Public sector economics, 42(4), 381—414. [6] López-Torres, L . & Prior, D. (2020). Long-term efficiency of public service provision in a context of budget restrictions. A n application to the education sector. 
Socio-Economic Planning Sciences, 100946. [7] Melo-Becerra, L . A., Hahn-De-Castro, L . H , Ariza, D. A . & Carmona, C. O. (2020). Efficiency of local public education in a decentralized context. International Journal of Educational Development, 76, 102194. [8] Mikusova, P. (2020). The Efficiency of Public Higher Education Institutions: A Meta-Analysis. Ekonomický časopis, 68(9), 963-977. [9] Rozborilová, D . (2018). Investments i n Tertiary Education in the Context of Labor Market needs at the Beginning of the 21st Century. International Review of Research in Emerging Markets, 4(1), 1248-1264. 316 [10] Staňková, M . (2020). Efficiency comparison and efficiency development of the metallurgical industry in the E U : Parametric and non-parametric approaches. Acta Universitatis agriculturae et silviculturae Mendelianae Brunensis, 68(4), 165-114. [11] Staňková, M . & Hampel, D . (2018). Efficiency Comparison in the Development of Building Projects Sector. Mathematical Methods in Economics 2018: Conference Proceedings. MatfyzPress: Praha, 503-508. [12] Staňková, M . & Hampel, D . (2019). Bankruptcy Prediction Based on Data Envelopment Analysis. Mathematical Methods in Economics 2019: Conference Proceedings. České Budějovice: Jihočeská univerzita v Českých Budějovicích, 31-36. [13] Staňková, M . & Hampel, D . (2020). Efficiency Assessment of the U K Travel Agency Companies - Data Envelopment Analysis Approach. Mathematical Methods in Economics 2020: Conference Proceedings. Brno: Mendelova univerzita v Brně, 550-556. [14] Stefanova, K . & Velichkov, N . (2020). Analysis of the Efficiency of Tertiary Education Expenditure in European Union Member States from Central and Eastern Europe: A n Efficiency Frontier Approach. SouthEastern Europe Journal of Economics 1, 115-128. [15] Stojanová, H . , Blašková, V., Tomšík, P. & Tesařová, E . (2015). Specification and Characteristic of Generation Y in the Sphere of Work Attitude. DIEM 2015: Innovation, Leadership & Entrepreneurship, 565-579. [16] Stojanová, H . & Blašková, V . (2014). The significance of the chosen field of study, depending on the difficulty of finding a job. INTED2014 Proceedings. Valencia, Spain: IATED, 4002-4012. [17] Smejkalová, J. (2016). Proč se zabývat kvalitou vysokoškolského vzdělávání? Ekonomické listy (2), 42-51. [18] Šnýdrová, M . , Šnýdrová, I., Vnoučkova, L . (2017). Vnímání příčin uplatnitelnosti absolventů a jejich závislosti a specifika u dílčích skupin. Ekonomické listy 8(2), 40-55. [19] Šonje, A., Deskar-Škrbič, M . & Šonje, V . (2018). Efficiency of public expenditure on education: comparing Croatia with other N M S . International Technology, Education and Development Conference Valencia, Spain, 12, 1-14. [20] Trčka, L . (2014). Vzdělávací procesy - výzkum potřeb a zkušeností zaměstnavatelů absolventů. Trendy ekonomiky a managementu, 8(19), 63-69. [21] Ulovec, M . (2014). Potřeby zaměstnavatelů a připravenost absolventů škol - komparační analýza. Praha: N Ú V . [22] Wolszczak, J. (2014). A n evaluation and explanation of (in)efficiency in higher education institutions in Europe and the U.S. with the application of two-stage semi-parametric D E A . UC Berkeley; Institute for Research on Labor and Employment. [23] Yotova, L . & Stefanova, K . (2017). Efficiency of Tertiary Education Expenditure in C E E Countries: Data Envelopment Analysis. Economic Alternatives, 3, 352-364. [24] Zámková, M . & Blašková, V . (2013). Comparing the views on tuition fee introduction of Brno university students. 
Efficiency and Responsibility in Education 2013. Praha: CULS, 671-679.

Application of robust efficiency evaluation method on the Czech life and non-life insurance markets
Markéta Matulová1, Lucia Kubincová2

Abstract. The paper presents an analysis of more than forty insurance companies operating in the Czech Republic in the period 2004-2018. The performance of the companies is evaluated by a universal robust Data Envelopment Analysis. The specification of the model applied in our study includes six input variables: the number of employees and intermediaries, operating costs for the life and non-life segments, equity and total liabilities. As outputs, we use three variables: the financial placement and the earned premiums for life and non-life insurance. We compare the efficiency of the companies operating in the individual insurance markets using nonparametric statistical tests. We also try to monitor the dynamics of the performance of individual companies.

Keywords: efficiency, life insurance, non-life insurance, robust DEA
JEL Classification: G22, C67
AMS Classification: 90C05

1 Introduction
Since the re-establishment of the Czech insurance market in the early 1990s, the insurance sector has undergone a number of transformations and changes. Legislative changes, such as the integration of European legislation, and the economic situation in the country (the impact of the economic crisis at the end of 2009) are just some of the many factors that directly affect the activity of the insurance market. In 1991, Act No. 185/1991 on insurance was adopted, opening up the Czech insurance market. In the following years, a number of insurance companies were established and entered the Czech insurance market. By the end of the 20th century, the market had taken shape and a number of laws and legislative regulations had been adopted. The requirements for the entry of insurance companies into the market were tightened (a minimal value of capital for insurance companies, control of owners), and other measures and broader powers of the supervisory authorities in the event of insolvency of insurance companies were set. In 1999, Act No. 168/1999 on public liability insurance was adopted, which abolished the monopoly of Česká pojišťovna on this product. With the accession to the EU in 2004, insurance companies operating in other European countries were able to enter the Czech insurance market. In 2005, the Act on Supplementary Supervision of Banks and Insurance Companies was adopted. In 2006, the Czech insurance market gained the highest profit after tax since the beginning of its existence. Life insurance growth resumed, especially for investment products. At the end of 2008, both the global and the Czech economy began to feel the effects of the economic crisis, with a decline in the performance of the insurance market in the following years. The year 2010 was characterized by a number of natural catastrophes, which was reflected in particular in the high cost of non-life insurance claims. In 2011, the insurance market experienced a slight decline. Negative trends were present even in the life insurance market, which had been relatively stable in recent years thanks to the rise in average rates; however, the number of life insurance contracts was declining. The non-life insurance market continued to stagnate. The crisis reduced rates for the key sector, car insurance, to the minimum possible level. However, non-life insurance continued to lag behind life insurance in 2013.
High claims costs caused by natural catastrophes contributed significantly to this effect. In 2014, the insurance market saw modest growth after years of stagnation. Life insurance experienced losses in premiums, driven by the continuing trend of cancellations of insurance contracts as a consequence of the change in the recognition of tax benefits; this trend peaked at the end of 2014. The year 2015 was affected by the implementation of solvency rules and the preparatory processes of the long-discussed European Solvency II Directive. The trend from 2014 continued, with the non-life segment growing. The growth of the non-life segment and the stagnation of the life sector continued in 2017 and 2018. Act No. 170/2018 Coll., on the distribution of insurance and reinsurance, which implements the so-called IDD Directive of the European Council and Parliament, had a significant impact on the insurance sector. For insurance intermediaries and independent loss adjusters, it introduces changes in the structure and imposes new requirements for business authorizations in the insurance sector. The insurance market grew in 2019, primarily driven by the non-life insurance sector. The ratio of the segments in this year tilted even more strongly towards the non-life insurance market, but this has clearly been influenced by that year's change in risk classification.

1 Masaryk University, Faculty of Economics and Administration, Lipová 41a, 602 00 Brno, Czech Republic, Marketa.Matulova@econ.muni.cz
2 Slovenská Sporiteľňa, Tomášikova 48, 832 37 Bratislava, Slovakia, 451791@mail.muni.cz

We present selected indicators of the insurance market in graphical form in Figure 1. The data are obtained from the website of the Czech National Bank, from the public database ARAD [1]. The coloured bars represent the development of the number of insurance companies on the Czech market. The number of insurance companies stabilised in 2007, after the reopening of the market and EU accession, and has since been oscillating around a value of 53. The largest share in the number of insurers on the market is represented by entities specializing in a particular non-life insurance sector. We can also observe a black line representing annual insurability values, i.e. the ratio of gross premiums written to gross domestic product in a given year. It is an important indicator of the quality of the insurance market. In this context, we see that the highest insurability values were achieved by the Czech insurance market in the pre-crisis period (2003 and 2004) and in the post-crisis period, which is mainly due to the decline in GDP in those years. The comparison of the insurability of the Czech insurance market with the European average is rather difficult and not entirely adequate. In Western European countries, commercial insurers also participate in social, health or pension insurance in different ways, which considerably changes the input values for this indicator. This is also related to the different market structure: for Western Europe or the USA, the ratio of life insurance to the non-life segment is 40:60. For the Czech Republic, this ratio has long been reversed, and in recent years the dominance of the non-life insurance market has even increased to 70:30.

Figure 1 Number of insurance companies and insurability

Important indicators of the insurance market include the level of premiums and the level of claims costs.
In Figure 2, we plot the development of gross written premiums and the value of gross claims costs separately for the life and non-life segment. The values depicted in Figure 2 correspond to the market developments described at the beginning of this chapter. The annual aggregate values of non-life premiums exceed each year the amount of in the life sector. We can see that the gap has widened in recent years, as evidenced by the 70:30 segment comparison for non-life insurance mentioned above. In the post-crisis period, we observe a stagnation of premiums in both segments. Since 2015, we identify a decline in the life segment, with both premiums and the cost of claims. This development is related to the increasing trend of cancellations in the life sector. We observe the highest values of life claims costs in 2014, which is a result of the peak in the life insurance market. We are seeing a huge number of early redemption of policies due to the removal of tax benefits for life insurance products from this year onwards. 2 Literature review There is an increasing interest in assessing efficiency of the insurance sector and hence the amount of published literature is growing as well. 319 Figure 2 Rough premiums and claims costs The often cited article [6] deals with the comparison of the efficiency of insurance companies among 36 countries around the world. It covers both life and non-life insurance companies over the time span 2002-2006. The authors used both the non-parametric D E A approach and the parametric S F A approach. They identify great differences between the countries, with higher efficiency achieved by countries in Europe and Asia - with the best scores for Japan and Denmark. The calculated efficiencies were further examined on the basis of the explanatory variables to identify firm- or country-specific effects. As part of their work, Cummins and Weiss [4] conducted a review of studies on insurance market efficiency. It consisted of 74 studies published between 1983 and 2000 and 37 articles published in leading journals between 2000 and 2011. In 59.5% of the studies reviewed, D E A was applied. The authors state that for the insurance sector, non-parametric methods are an appropriate choice as a tool for efficiency analysis. This is related to the nature of of the insurance sector, which does not produce products but offers services, therefore the use of parametric methods with specific assumptions on the shape of the production function are less appropriate choice. Part of the paper addresses the issue of selection of inputs and outputs. The authors present the main principles for measuring outputs of financial institutions, namely the intermediation approach, the so-called user-cost approach and the value-added approach. The last one, the value-added approach, is considered to be the most appropriate and therefore the most used. Authors state that the most common and also the most appropriate option for outputs are insurance claims together with changes in reserves separately for non-life and life insurance and the value of investments. In the analysed studies, a frequent choice of output was also written or received premiums, a choice not supported by Cummins and Weiss. Considering inputs, labor and various forms of capital dominated. Labor is usually expressed as the number of employees, the number of hours, or, if data are not available, average wage per worker in the industry is used for the calculations. 
Some studies distinguished between employees and intermediaries, and capital was sometimes divided into physical, debt (total) and equity. Other authors used material (which often included physical capital) as input as well. Newer article of Wise [14] compares 6 different approaches to the estimation of the effective frontier in the evaluation of life insurance companies. He provides information on the approaches used in 190 empirical studies and finds that the S F A approach is used less frequently than D E A from year to year. The difference is significant especially in recent years. Another survey that analyzes insurance market efficiency was written by Kaffash [11]. It includes 132 articles dealing with the assessment of the insurance market efficiency using D E A exclusively, covering the period from 1993 to 2018. The paper provides an overview of the choice of variables and D E A models used in measuring efficiency of the insurance sector. The most widely used approach in the choice of variables to model proved to be the value-added approach (applied by 68% of the studies). The following inputs dominated - labor/number of workers (60.72%), capital/debt capital (49.18%), equity (37.70%), and materials and business services (32.79%). 320 For outputs the most frequent choice was insurance premiums (50.82%), personal insurance costs separately for life and non-life insurance (22.95%), incurred losses (22.13%) and investment income (21.31%). When conducting the literature search, we looked for studies that evaluated the effectiveness of using D E A for the Czech insurance market. We found several studies. The first of them, [9], estimates the efficiency of life insurance markets for Poland and the Czech Republic. The aim of the study is to compare the efficiency of 17 Czech and 26 Polish life insurance companies for the year 2014. Furthermore, several studies are comparing Czech and Slovak insurance markets. The study [7] analyzes universal insurance companies from the Czech Republic and Slovakia using the data sample covering year 2007. The following inputs were used for this study - costs of indemnity and operating costs. Outputs included earned written premiums and other income. The level of technical efficiency was calculated on the basis of an input-oriented model with variable returns to scale. When comparing the two countries, the Czech Republic had a higher average of effective scores and less variability. Another conclusion of this paper is that both markets are strongly influenced by the so-called giants, i.e. several large insurance companies. Česká pojišťovna was rated the most efficient of the two countries. The aim of the article [8] was to compare the effectiveness of Slovak and Czech insurance market using network D E A , i.e. a multi-stage D E A model. The study used operating costs and claims incurred as inputs to the first phase that determined the efficiency with respect to minimizing costs. Written premiums were used as the output of the first phase and at the same time as the input entering the second phase. The second phase evaluated the efficiency based on profit maximization, the outputs being a income from financial investments. Masárová et al., [13] evaluated the development of the Slovak and Czech insurance market in the period 2004-2014. They state that life insurance dominates on the Slovak market whereas non-life insurance dominates in the Czech Republic (on the basis of gross written premiums). 
For both countries, the mean value of insurance (gross written premium per capita) is still below the European Union average, which is interpreted by the authors as the growth potential of these markets.

3 Methodology
Many of the studies mentioned in the previous chapter deal with the appropriate choice of inputs and outputs of the production process in the insurance industry. When using financial data, we have to deal with variables taking both positive and negative values, which is not possible in the traditional DEA model. The issue of negative values of inputs or outputs in DEA models is described by Lovell et al. [12]; their study deals with models invariant to transformations of inputs and outputs. Biener et al. [2] applied a transformation to obtain non-negative values on Swiss insurance data. We decided to use the following robust procedure for computing DEA scores, introduced by Hladík [10]. The unit under evaluation, DMU_o, is assigned the efficiency score r = 1 + δ*, where δ* is the optimal solution of the linear program

$\delta^* = \max \delta \quad \text{subject to} \quad y_o^T u - v_o \ge 1 + \delta, \quad x_o^T v \le 1 - \delta, \quad Yu - Xv - \mathbf{1}v_o \le 0, \quad u, v \ge 0, \qquad (1)$

where
• $x_o \in \mathbb{R}^{n_1}$ is the nonnegative input vector for DMU_o,
• $y_o \in \mathbb{R}^{n_2}$ is the nonnegative output vector for DMU_o,
• $X \in \mathbb{R}^{m \times n_1}$ is the nonnegative input matrix for the other DMUs; in particular, the i-th row of X is the input vector of the i-th DMU,
• $Y \in \mathbb{R}^{m \times n_2}$ is the nonnegative output matrix for the other DMUs; in particular, the i-th row of Y is the output vector of the i-th DMU,
• u and v are vectors of variables representing output and input weights, respectively,
• 1 is a vector of ones of convenient dimension.

According to [10], r ∈ [0, 2], and DMU_o is efficient if and only if r > 1, otherwise r ≤ 1. In contrast to the classical DEA approach, the proposed score r can also be used for comparing DMUs from different models, because it is a naturally normalized and universal score. The author also proves that the proposed method does not change the scores of inefficient units obtained by the classical DEA approach. Moreover, for efficient units, the score r shows how close the efficient DMUs are to inefficiency. Below we show another useful property of the score r.

Theorem 1. The score r defined by equation (1) is translation invariant with respect to outputs.

Proof. Let us assume that the i-th output is enlarged by a constant k, so that we have a new output matrix whose i-th column equals $(y_{1i} + k, \ldots, y_{mi} + k)^T$, the other columns being the same as in the original matrix. Written in terms of the original matrix Y, the formulation of model (1) for the new output data is the following:

$\delta^* = \max \delta \quad \text{subject to} \quad y_o^T u + k u_i - v_o \ge 1 + \delta, \quad x_o^T v \le 1 - \delta, \quad Yu - Xv + \mathbf{1}k u_i - \mathbf{1}v_o \le 0, \quad u, v \ge 0. \qquad (2)$

The constraints can be rewritten, using the substitution $\tilde{v}_o = v_o - k u_i$, into

$y_o^T u - \tilde{v}_o \ge 1 + \delta, \quad x_o^T v \le 1 - \delta, \quad Yu - Xv - \mathbf{1}\tilde{v}_o \le 0, \quad u, v \ge 0, \qquad (3)$

which gives the same feasible set as in model (1), since the variable $v_o$ is of arbitrary sign. ∎
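The score defined by (1) reduces to a single linear program per DMU. As a rough illustration (not the authors' implementation, and with made-up figures rather than the ČAP data), it could be computed along the following lines with scipy:

```python
import numpy as np
from scipy.optimize import linprog

def robust_dea_score(x0, y0, X, Y):
    """Universal robust score r = 1 + delta* from the LP (1):
    max delta  s.t.  y0'u - v0 >= 1 + delta,  x0'v <= 1 - delta,
    Y u - X v - 1*v0 <= 0,  u, v >= 0  (v0 unrestricted).
    X, Y hold the inputs/outputs of the remaining DMUs row-wise."""
    m = X.shape[0]
    n1, n2 = len(x0), len(y0)
    # variable order: [delta, u (n2 entries), v (n1 entries), v0]
    c = np.zeros(1 + n2 + n1 + 1); c[0] = -1.0           # maximise delta
    A = np.zeros((2 + m, 1 + n2 + n1 + 1)); b = np.zeros(2 + m)
    A[0, 0] = 1.0; A[0, 1:1 + n2] = -y0; A[0, -1] = 1.0; b[0] = -1.0
    A[1, 0] = 1.0; A[1, 1 + n2:1 + n2 + n1] = x0;        b[1] = 1.0
    A[2:, 1:1 + n2] = Y; A[2:, 1 + n2:1 + n2 + n1] = -X; A[2:, -1] = -1.0
    bounds = [(None, None)] + [(0, None)] * (n2 + n1) + [(None, None)]
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return 1.0 - res.fun                                 # r = 1 + delta*

# Illustrative call: hypothetical inputs (employees, costs) and one output
# (earned premiums) for three other DMUs and the evaluated unit.
X = np.array([[120.0, 35.0], [300.0, 80.0], [60.0, 15.0]])
Y = np.array([[400.0], [650.0], [150.0]])
r = robust_dea_score(np.array([90.0, 25.0]), np.array([380.0]), X, Y)
# r > 1 indicates efficiency, r <= 1 inefficiency; r lies in [0, 2].
```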
4 Data
We analyzed the data available from the annual reports of the members of the Czech Insurance Association (ČAP). The years 2004-2018 are captured in our data sample. For each year, we record the overall results for that year, the balance sheet and the profit and loss account, and also the amount of written premiums according to the ČAP methodology. We have information on 43 different insurance companies operating on the Czech market, which is not a complete sample. Despite the possibility of supplementing the missing data from the annual reports of individual insurance companies, we kept only the data published by ČAP. The reason is that the share of ČAP member insurance companies in the total written premium in the Czech Republic is about 97%. We can therefore say that these data sufficiently represent the Czech insurance market. Our dataset is incomplete not only because of the existence of ČAP non-members, but also because some insurance companies left the Czech market or entered it only after 2004. In addition, we had to adjust the sample due to the mergers of some insurance companies. We therefore applied various transformations and selection criteria (especially deflating the financial data and translating negative outputs) to comply with the specifics of the methodology as described in the literature. That allowed us to use the selected methods of estimating efficiency on our modified datasets. In our DEA model, we use an intertemporal production frontier approach that assumes that the technology is the same for all periods; thus, all units are evaluated for all periods under one DEA model. Life products are offered exclusively by 6 specialized life insurance companies and 18 universal insurers in our dataset. The most commonly offered life insurance product is insurance with an investment fund, which accounts for almost half of life insurance premiums written, followed by supplementary insurance (accident and sickness insurance) and insurance for the case of death. The non-life segment consists of a total of 15 non-life and 18 universal insurance companies. The structure of the non-life insurance market covers motor vehicle liability insurance, property insurance, accident insurance, general liability insurance, etc.

5 Results
We evaluated the data by the method described above and present the average efficiencies of the companies in Table 1. As mentioned in the comment on model (1), the efficient units are those with the score r > 1; the companies with mean scores above one are highlighted. After the computation of the scores, we further analyzed the efficiencies of the insurance companies to test the influence of specific factors on the companies' performance. In the survey [11], the authors provide a fairly exhaustive list of factors influencing efficiency; among the frequently analyzed factors are organizational structure, corporate governance, mergers and acquisitions, deregulation and many others. Since the Czech insurance market is relatively homogeneous in terms of the origin of capital, our work did not deal with the analysis of the impact of the organizational structure. We also did not identify a sufficient number of mergers and acquisitions to ascertain the degree of influence of this factor on efficiency. Thus, as a part of the secondary analysis, we address the impact of the specialization of the insurance business on efficiency. For the life and non-life segments, we compared the results of specialized and universal insurance companies. We analyze whether the larger universal insurance companies achieve higher efficiency rates than narrowly specialized smaller entities within the life and non-life segments. We use the nonparametric Mann-Whitney U-test, comparing the medians of the efficiencies in the individual groups. Table 2 shows the values of the test statistic and the p-value for the given comparisons.
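As a small illustration of this comparison, the sketch below runs scipy's implementation of the Mann-Whitney U-test. The two score vectors are hypothetical stand-ins; the actual inputs would be the per-company efficiency scores behind Table 1.

```python
import numpy as np
from scipy.stats import mannwhitneyu

pure_life = np.array([0.96, 1.01, 1.02, 1.07, 1.12, 0.95])       # hypothetical scores
universal = np.array([1.00, 0.93, 0.96, 0.93, 0.99, 1.18, 0.89])  # hypothetical scores

# One-sided alternative: pure life insurers tend to score higher than universal insurers.
u_stat, p_value = mannwhitneyu(pure_life, universal, alternative="greater")
print(f"U = {u_stat:.0f}, p-value = {p_value:.4f}")
```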
For life insurance, we reject the null hypothesis, which implies the conclusion that pure life insurers are more efficient than universal life insurers. For the non-life insurance segment, we do not reject the hypothesis that universal insurers are more efficient than pure non-life insurers, and therefore no clear conclusions can be drawn for the non-life segment. We also test the hypothesis of a higher efficiency rate of the non-life segment relative to the life segment. The test did not reject the null hypothesis, and thus we cannot say that the leading position of the non-life segment of the Czech insurance market is accompanied by higher efficiency values compared to life insurance.

company          type  mean    std         company          type  mean    std
aegon            L     0.9636  0.0569      egap             N-L   1.4104  0.5386
aig              N-L   0.9269  0.0108      euler            N-L   1.3353  0.5756
allianz          U     0.9999  0.0276      gerling          N-L   1.0603  0.1104
amcico metlife   U     0.9270  0.1315      gp               U     0.9558  0.0523
aviva            L     1.0090  0.3723      halali           N-L   1.0480  0.0239
axa              N-L   1.0204  0.3650      hdi              N-L   0.9863  0.0513
axa zp           U     0.9594  0.0788      hvp              U     0.8887  0.0247
basier           L     1.0184  0.0176      ing nn           L     1.1161  0.2314
BNP cardif       U     1.0823  0.2033      koop             U     1.1259  0.2844
ckp              N-L   1.0593  0.1526      kp               U     1.1044  0.2209
colonnade        N-L   0.9858  0.1376      maxima           U     1.0707  0.4398
cp               U     1.1825  0.3426      pes              U     1.0953  0.2437
cp zdravi        N-L   0.9733  0.0411      pvzp             U     0.8838  0.0471
cpp              U     0.9320  0.0480      slavia           U     0.9116  0.0666
csobp            U     0.9946  0.0525      triglav direct   N-L   1.0936  0.3246
d.a.s.           N-L   0.935   0.0378      uniqa            U     0.9081  0.0396
direct           N-L   0.7852  0.1368      victoria ergo    U     0.8206  0.0337
dr leben         L     1.0670  0.0677      wust             N-L   0.9792  0.0530
ecp erv          N-L   0.9975  0.0966      wust pob         U     1.4704  0.7489
                                           wust zp          L     0.9450  0.0297
Table 1 Efficiencies of insurance companies (for their full names see [3])

Comparison               M-W U   p-value
life vs. universal       4871    0.01256
non-life vs. universal   13688   0.2158
life vs. non-life        46395   0.7982
Table 2 Mann-Whitney U-test for the comparisons

6 Conclusion
In our work, we have dealt with the evaluation of the efficiency of decision-making units in the life and non-life segments of the insurance market. The results of the efficiency evaluation showed that a high level of efficiency is achieved both by smaller, specialised insurers and by large market players. Similar conclusions, though applying only to the life insurance segment, were presented by Diacon et al. in 2002 [5]. In the evaluation of the segments of the Czech insurance market by ČAP or the Czech National Bank, the non-life segment has been predominant, especially in recent years. Due to this fact, we assumed a higher level of efficiency of the non-life segment. However, there was no statistically significant difference between the efficiency of the life and non-life segments. The results of the assessment of the individual segments even indicate that in the non-life insurance market, no single insurer performs efficiently through all the years. The practical reason why the outputs of insurance companies do not sufficiently cover their inputs may be the high competition in the non-life insurance market or low returns on investment activity. For the life segment, the results of the analysis suggest that better results are achieved by insurers specialized in the life segment compared to universal life insurers. The opposite results are suggested by the study of Grmanová and Pukala [9], which assesses the life insurance market in Poland and the Czech Republic for the year 2018.
Also their analysis for the year 2015, [8], suggests that universal insurance companies are the ones that are more efficient in the life insurance market. However, their study uses a different approach and model parameters, moreover, it assesses different dataset for only one year, which may explain the differences in the 323 conclusions of the papers. The contribution of our paper may be seen in the covering of a broader time span and the use of a robust universal D E A model for the evaluation of efficiencies. References [1] Czech National Bank database A R A D [online] [cit. 2020-07-18] Available from: https://www.cnb.cz/cs/statistika/menova_bankovni_stat/bankovni-statistika/ [2] Biener, C , Eling, M . & Wirfs, J. H . (2016). The determinants of efficiency and productivity in the swiss insurance industry. European journal of operational research, 248, (2), 703-714 [3] Č A R Česká asociace pojišťoven: individuální výsledky členů 2006-2019 [online] [cit. 2020-07-18]. Available from: https://www.cap.cz/statistiky-prognozy-analyzy/individualni-vysledky-clenu [4] Cummins, J.D. & Weiss, M . A . (2013). Handbook of insurance, Springer [5] Diacon, S.R., Starkey, K . & O'Brien, C , (2002). Size and Efficiency in European Longterm Insurance Companies: A n International Comparison. The Geneva Papers on Risk and Insurance. Issues and Practice 27, (3) [online] [cit. 2020-10-24]. [6] Eling, M . & Luhnen, M . (2010). Efficiency in the international insurance industry: a cross-country comparison. Journal of banking and finance, 34 (7), 1497-1509 [7] Grmanová, E . & Jablonský, J. (2009). Analýza efektívnosti slovenských a českých poisťovní pomocou modelov analýzy obalu dát. Ekonomický časopis, 57 (9), 857-869 [8] Grmanová, E . (2015). Efficiency in two-stage data envelopment analysis: an application to insurance companies. In: Kajurova, V., Krajíček, J. (Eds.), Proceedings of the 12th international scientific conference 158-165. [9] Grmanová, E . & Pukala, R., (2018). Efficiency of insurance companies in the Czech Republic and Poland. Oeconomia copernicana, 9 (1), 71-85 [10] Hladík, M . , (2019). Universal efficiency scores in data envelopment analysis based on a robust approach. Expert systems with applications, 122, 242-252. [11] Kaffash, S., Azizi, R., Huang, Y. & Zhu, J. (2020). A survey of data envelopment analysis applications in the insurance industry 1993-2018. European journal of operational research, 284 (3), 801-813 [12] Lovell, C . A . & Knox, P. T. (1995). Units invariant and translation invariant dea models. Operations research letters, 18(3), 147-151 [13] Masárová, J. & Koišová, E . (2016). Insurance market in Slovak Republic and Czech Republic. International multidisciplinary scientific conference on social sciences, 353-360 [14] Wise, W. (2017). A survey of life insurance efficiency papers: methods, pros & cons, trends. Accounting, 3 (3), 137-170 324 Stochastic reference point in the evaluation of risky decision alternatives Ewa Michalska1 , Renata Dudzinska-Baryla2 Abstract. In the first-generation (PT) and second-generation (CPT) prospect theory introduced by Kahneman and Tversky, a non-stochastic reference point is assumed in the evaluation of risky alternatives. In the third-generation prospect theory, Schmidt, Starmer, and Sugden took into consideration, besides well-known elements of the prospect theory, the uncertainty of the reference point. 
In this paper, we propose a method for the evaluation of risky decision alternatives using a random variable with a discrete distribution (another risky alternative) as a stochastic reference value. We provide the examples on the basis of which we demonstrate that the ordering of decision alternatives can depend on whether full information on the distribution is taken into account or not. The change of ordering (and also preferred alternative) can be observed regardless of considered criteria, subjective or objective, for example, stochastic dominance, Omega ratio, Omega-PT ratio, PT, and C P T . Keywords: stochastic reference point, ordering of alternatives, prospect theory, Omega ratio, stochastic dominance, threshold level J E L Classification: G40 A M S Classification: 91B06 1 Introduction The concept of a benchmark is a well-known concept in decision-making theory and practice. Benchmark (also known as reference value, reference point) means any value against which a decision alternative is evaluated. The reference value expresses the individual preferences of the decision-maker, such as e.g. the preferred portfolio rate of return or the desired outcome of a decision. However, the assessment of a risky decision alternative can depend not only on the adopted reference value but also on the measure used (objective or subjective). In the theories considered today, it is assumed that emotions and subjective perception of information are inherent in decision-making process. Therefore, decision support methods used in practice increasingly often take into account behavioural elements (including reference value), thus becoming part of the stream of modern economics, management and finance. These include the P T [4] or C P T [9] evaluation proposed in the prospect theory as well as performance ratios such as the omega ratio [7] or omega-PT [6]. In the third generation prospect theory proposed in 2008 by Schmidt, Stramer and Sugden, the uncertainty of the reference value is taken into account in addition to the basic assumptions of prospect theory [8]. In decisionmaking practice, however, an approach is used in which the reference point being a random variable is replaced by a fixed value e.g. the expected value of a random variable or its certainty equivalent. This approach leads to a loss of information on the distribution, which may cause a reordering of the preferred decision alternatives. The aim of this paper is to propose a method of taking into account (in the evaluation of risky decision alternatives) a reference value that is a random variable with a discrete distribution, which does not lead to a loss of information about the distribution. The paper also shows (through a case study method) that the order of preferred decision alternatives can depend on whether the reference value is a random variable or a fixed value (the expected value of a random variable). The order is determined based on both objective and subjective evaluation criteria. 2 Reference value in decision making Risky decisions occur when the outcomes of decisions are uncertain and the decision problem is to choose one of many alternatives. In the normative approach, when the probability distribution of decision outcomes is known, a choice rule based on the concept of expected value or expected utility is used. In 1944, von Neumann and Morgenstern provided the axioms of utility theory and showed that the concept is consistent with the preferences of 1 University of Economics in Katowice, ul. 
1 Maja 50, 40-287 Katowice, Poland, ewa.michalska@ue.katowice.pl.
2 University of Economics in Katowice, ul. 1 Maja 50, 40-287 Katowice, Poland, renata.dudzinska-baryla@ue.katowice.pl.

a rational decision-maker [10]. However, further empirical studies on decision-making criteria revealed that decision-makers do not always follow the principles of expected utility theory. The next step in studies on decision making under risk was the paper published in 1979 by Kahneman and Tversky [4]. In their concept, called prospect theory, attention is paid to the context of the decision, and the possible outcomes of the decision are related to an assumed reference point (a fixed value, e.g. a state of wealth) and expressed as relative gains and losses. The decision-maker subjectively evaluates gains and losses (relative outcomes), showing loss aversion, risk aversion in the face of gains and risk seeking in the face of losses. The probabilities of relative outcomes are also subjectively evaluated. In the third-generation prospect theory proposed in 2008 by Schmidt, Starmer and Sugden, in addition to the characteristic elements present in the previous generations of prospect theory, the randomness of the reference value, which can be another risky decision alternative, is assumed [8]. The need to take into account the randomness of the reference value has empirical and theoretical justification. In the evaluation of risky decision alternatives (e.g. lotteries, stocks, investment funds), the adoption of a fixed value of the reference point does not always correspond to reality. In practice, the role of the benchmark is often played by financial instruments that are random in nature. For investment funds, the benchmark portfolio can be, for example, a stock market index [1].

Under risk conditions, the decision alternative X is a random variable of the form

$X = ((x_1; p_1), (x_2; p_2), \ldots, (x_n; p_n)),$

where $x_i$ denotes the absolute outcome and $p_i$ the corresponding probability, for i = 1, 2, ..., n. Similarly, the random reference alternative L is a random variable

$L = ((l_1; h_1), (l_2; h_2), \ldots, (l_m; h_m)),$

where $l_j$ denotes the reference value and $h_j$ the corresponding probability, for j = 1, 2, ..., m. Taking the reference alternative L into account by replacing it with its expected value E(L), as is done in decision-making practice (in assessing the risky decision alternative X against another risky decision alternative L), leads to a decision alternative of the form

$X - E(L) = ((x_1 - E(L); p_1), (x_2 - E(L); p_2), \ldots, (x_n - E(L); p_n)),$

where $(x_i - E(L))$ denotes the relative outcome and $p_i$ the corresponding probability, for i = 1, 2, ..., n. The consequence of such a procedure is the loss of information on the distribution of the random variable L; nevertheless, it is widely used, e.g. in various types of performance ratios [3, 5, 7]. In the approach proposed in this paper, we assume that the random variables X and L are independent. In assessing the risky decision alternative X against another risky decision alternative L, we take into consideration the relative outcomes $z_k = x_i - l_j$ and the corresponding probabilities $q_k = p_i \cdot h_j$ for i = 1, 2, ..., n, j = 1, 2, ..., m, k = 1, 2, ..., n·m. These elements form a new random decision alternative

$Z = X - L = ((z_1; q_1), (z_2; q_2), \ldots, (z_{n \cdot m}; q_{n \cdot m})),$

whose ordered outcomes $z_1 < z_2 < \cdots < z_{n \cdot m}$ represent gains or losses relative to the reference values.
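As a small illustration of this construction (a plain-Python sketch, not code from the paper), the relative alternative Z = X - L and the omega ratio used in criterion 1 below can be computed directly from the outcome-probability pairs; run on the data of Example 1 below, it reproduces the values Ω_X(L) ≈ 6.26 and Ω_Y(L) ≈ 8.73 reported there.

```python
def relative_alternative(X, L):
    """Relative decision alternative Z = X - L: outcomes z = x_i - l_j with
    probabilities q = p_i * h_j (X and L are assumed independent)."""
    return [(x - l, p * h) for x, p in X for l, h in L]

def omega(Z):
    """Omega ratio of Z: probability-weighted gains divided by the absolute
    value of probability-weighted losses (criterion 1)."""
    gains = sum(z * q for z, q in Z if z > 0)
    losses = -sum(z * q for z, q in Z if z < 0)
    return gains / losses

# Data of Example 1 below
X = [(20, 0.1), (40, 0.7), (60, 0.2)]
Y = [(10, 0.2), (50, 0.4), (110, 0.4)]
L = [(30, 0.9), (50, 0.1)]

print(round(omega(relative_alternative(X, L)), 2))   # 6.26
print(round(omega(relative_alternative(Y, L)), 2))   # 8.73
```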
3 Representation of a random reference alternative in evaluation of decision alternatives
Taking into account the randomness of the reference point makes it possible to model decision-making situations in which instability of the decision-maker's revealed preferences is observed. The manner in which the random reference alternative is taken into account in the evaluation of risky decision alternatives (as a random variable or as a fixed value) can influence the order of the preferred decision alternatives. In the examples illustrating this fact, the risky decision alternatives were evaluated on the basis of selected objective and subjective criteria. For the decision alternatives X and Y and a random reference alternative represented by the random variable L, the preference relation "≻_L" was determined on the basis of the following criteria:

Criterion 1. Maximisation of the omega performance ratio: X ≻_L Y ⟺ Ω_X(L) > Ω_Y(L), where

$\Omega(L) = \frac{\sum_{k,\, z_k > 0} z_k \cdot q_k}{\left|\sum_{k,\, z_k < 0} z_k \cdot q_k\right|}.$

Criterion 2. Maximisation of the PT value: X ≻_L Y ⟺ PT_X(L) > PT_Y(L), where

$PT(L) = \sum_{k,\, z_k < 0} v(z_k) \cdot w(q_k) + \sum_{k,\, z_k > 0} v(z_k) \cdot w(q_k).$

Criterion 3. Maximisation of the CPT value: X ≻_L Y ⟺ CPT_X(L) > CPT_Y(L), where

$CPT(L) = \sum_{k,\, z_k < 0} v(z_k) \cdot W_k^{-} + \sum_{k,\, z_k > 0} v(z_k) \cdot W_k^{+}.$

Criterion 4. Maximisation of the omega-PT ratio: X ≻_L Y ⟺ ΩPT_X(L) > ΩPT_Y(L), where

$\Omega PT(L) = \frac{\sum_{k,\, z_k > 0} v(z_k) \cdot w(q_k)}{\left|\sum_{k,\, z_k < 0} v(z_k) \cdot w(q_k)\right|}.$

Here v(·) is the prospect theory value function of gains and losses (defined piecewise for z_k ≥ 0 and z_k < 0), with v(0) = 0, and w(·) is a probability weighting function of the form

$w(q_k) = \frac{q_k^{\gamma}}{\left(q_k^{\gamma} + (1 - q_k)^{\gamma}\right)^{1/\gamma}},$

where γ = 0.61 for gains and γ = 0.69 for losses; moreover, w(0) = 0 and w(1) = 1. The symbols $W_k^{-}$ and $W_k^{+}$ denote weights depending on the evaluation of cumulative probabilities [2, 9]. In an analogous way, based on criteria 1-4, the preference relation "≻_{E(L)}" is defined for the decision alternatives X and Y and a random reference alternative represented by the expected value E(L).

Example 1. Suppose that we have two random decision alternatives
X = ((20; 0.1), (40; 0.7), (60; 0.2)),  Y = ((10; 0.2), (50; 0.4), (110; 0.4)),
and the reference alternative L = ((30; 0.9), (50; 0.1)), whose expected value is E(L) = 32. Moreover, Ω_X(L) = 6.26, Ω_Y(L) = 8.73, Ω_X(E(L)) = 9.33, Ω_Y(E(L)) = 8.73. Based on the calculated values of the omega ratio (criterion 1) for the decision alternatives X and Y, the following relationships were obtained: Ω_X(L) < Ω_Y(L) and Ω_X(E(L)) > Ω_Y(E(L)). Thus, if in the evaluation of the decision alternatives we consider the reference alternative L as a random variable, then Y ≻_L X, and if we replace it by the expected value E(L), then X ≻_{E(L)} Y.

Example 2. Consider the random decision alternatives
X = ((10; 0.2), (70; 0.5), (90; 0.3)),  Y = ((10; 0.1), (60; 0.6), (90; 0.3)),
and the reference alternative L = ((20; 0.5), (50; 0.5)), whose expected value is E(L) = 35. Moreover, PT_X(L) = 15.78, PT_Y(L) = 13.77, PT_X(E(L)) = 10.61, PT_Y(E(L)) = 12.37. Based on the calculated PT values (criterion 2), it was stated that PT_X(L) > PT_Y(L) and PT_X(E(L)) < PT_Y(E(L)). Therefore, if in the evaluation of the decision alternatives X and Y we consider the reference alternative L as a random variable, then X ≻_L Y, but if we replace it by the expected value E(L), then Y ≻_{E(L)} X.

Example 3. Suppose that we have two random decision alternatives
X = ((20; 0.5), (70; 0.2), (110; 0.3)),  Y = ((10; 0.1), (50; 0.5), (70; 0.4)),
and the reference alternative L = ((30; 0.9), (50; 0.1)), whose expected value is E(L) = 32.
Moreover, CPT_X(L) = 6.54, CPT_Y(L) = 6.84, CPT_X(E(L)) = 8.14, CPT_Y(E(L)) = 7.62. On the basis of the CPT values (criterion 3) calculated for the decision alternatives X and Y, it was stated that CPT_X(L) < CPT_Y(L) and CPT_X(E(L)) > CPT_Y(E(L)). Thus, if in the evaluation of the decision alternatives we take into consideration the reference alternative L as a random variable, then Y >_L X, and if we replace it by the expected value E(L), then X >_{E(L)} Y.

Example 4. We have two random decision alternatives:
X = ((20; 0.5), (60; 0.4), (70; 0.1))
Y = ((10; 0.3), (50; 0.2), (70; 0.5))
and the reference alternative L = ((30; 0.9), (60; 0.1)), whose expected value is E(L) = 33. Moreover, ΩPT_X(L) = 0.87, ΩPT_Y(L) = 0.89, ΩPT_X(E(L)) = 1.15, ΩPT_Y(E(L)) = 1.14. The values of the omega-PT ratio (criterion 4) calculated for the decision alternatives X and Y lead to the following inequalities: ΩPT_X(L) < ΩPT_Y(L) and ΩPT_X(E(L)) > ΩPT_Y(E(L)). If in the evaluation of the decision alternatives we take into consideration the reference alternative L as a random variable, then Y >_L X, and if we replace it by the expected value E(L), we have X >_{E(L)} Y.

The PT, CPT, omega and omega-PT performance ratios used in the examples are decreasing functions of the reference value: an increase in the reference value results in a decrease in the value of the ratio. For the decision alternative X and two random reference alternatives L1 and L2 such that E(L1) > E(L2), the following inequalities hold:
Ω_X(E(L1)) < Ω_X(E(L2))
PT_X(E(L1)) < PT_X(E(L2))
CPT_X(E(L1)) < CPT_X(E(L2))
ΩPT_X(E(L1)) < ΩPT_X(E(L2)).
An analogous property is observed for the omega ratio and the CPT value when, in the evaluation of decision alternatives, the reference alternatives are random variables ordered by first-order stochastic dominance (FSD). For the decision alternative X and two random reference alternatives L1 and L2 such that L1 >_FSD L2, the following inequalities hold:
Ω_X(L1) < Ω_X(L2)
CPT_X(L1) < CPT_X(L2).
These properties are illustrated by the following example.

Example 5. Let us consider two random decision alternatives X = ((10; 0.2), (70; 0.5), (90; 0.3)) and Y = ((10; 0.1), (60; 0.6), (90; 0.3)) and three reference alternatives: L1 = ((25; 0.3), (55; 0.7)), L2 = ((20; 0.5), (50; 0.5)) and L3 = ((19; 0.6), (49; 0.4)), whose expected values are E(L1) = 46, E(L2) = 35, E(L3) = 31. The reference alternatives L1, L2 and L3 are ordered according to first-order stochastic dominance, L1 >_FSD L2 >_FSD L3, and their expected values satisfy E(L1) > E(L2) > E(L3).
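As an illustration only (this sketch is not part of the original paper), the omega ratio of Criterion 1 can be evaluated for Example 5 in a few lines of Python; the script reproduces the Omega rows of Tables 1 and 2 below, both for the random reference alternatives L1, L2, L3 and for their expected values.

```python
from itertools import product

def omega(X, L):
    """Omega ratio of X with respect to the reference alternative L (Criterion 1):
    expected relative gains divided by the absolute expected relative losses."""
    Z = [(x - l, p * h) for (x, p), (l, h) in product(X, L)]
    gains = sum(z * q for z, q in Z if z > 0)
    losses = abs(sum(z * q for z, q in Z if z < 0))
    return gains / losses

X = [(10, 0.2), (70, 0.5), (90, 0.3)]
Y = [(10, 0.1), (60, 0.6), (90, 0.3)]
refs = {"L1": [(25, 0.3), (55, 0.7)],
        "L2": [(20, 0.5), (50, 0.5)],
        "L3": [(19, 0.6), (49, 0.4)]}

for name, L in refs.items():
    EL = sum(l * h for l, h in L)
    print(name,
          round(omega(X, L), 2), round(omega(Y, L), 2),                     # Omega row, Table 1
          round(omega(X, [(EL, 1.0)]), 2), round(omega(Y, [(EL, 1.0)]), 2)) # Omega row, Table 2
# L1: 3.5 6.0 3.5 6.0   L2: 6.8 12.6 6.8 12.6   L3: 8.86 16.71 8.86 16.71
```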
Tables 1 and 2 show the values of the omega ratio and the CPT value for the two forms of the reference value, i.e. the random variable and the expected value of the random variable.

Criterion    (X - L)                        (Y - L)
             L = L1    L = L2    L = L3     L = L1    L = L2    L = L3
Omega         3.50      6.80      8.86      6.00     12.60     16.71
CPT           0.33      6.91      9.24      3.95     10.24     12.45

Table 1  The evaluations of decision alternatives X and Y for the random reference alternative L represented by the random variables L1, L2, L3

Criterion    (X - E(L))                     (Y - E(L))
             L = L1    L = L2    L = L3     L = L1    L = L2    L = L3
Omega         3.50      6.80      8.86      6.00     12.60     16.71
CPT           0.09      7.60     10.35      3.94     11.00     13.55

Table 2  The evaluations of decision alternatives X and Y for the random reference alternative L represented by the expected values of the random variables L1, L2, L3

Regardless of how the random reference alternative is represented in the evaluation of the random decision alternatives (as a random variable or as its expected value), all of the considered evaluations decrease as the reference alternative increases in the FSD or expected-value ordering. The regularities presented in the examples were confirmed in simulation studies performed for three-outcome decision and reference alternatives. Random decision alternatives and random reference alternatives satisfying first-order stochastic dominance (FSD) were generated (60,000 triples (L1, L2, X)), and the considered relations were satisfied in all replications.

4 Conclusions

In recent years, there has been growing interest in decision support methods that reflect the behavioural inclinations that characterise the decision-maker. New measures and ratios that fulfil these requirements are constantly being proposed in the literature, and much attention is devoted to the study of their properties. In this paper we consider selected evaluations and ratios (PT, CPT, Ω, ΩPT) that take into account both a reference value and full information about the distribution of the evaluated random decision alternatives. Treating these evaluations as functions of a reference point represented by its expected value, it was found that an increase in the value of the reference point causes a decrease in the value of the evaluations of the random decision alternatives. By analysing the relationship between the evaluations of the random decision alternatives, we can also conclude that, after a change of the reference point, a different decision alternative than before can be preferred. Moreover, the reordering of the preferred decision alternatives can occur repeatedly as the reference point changes. These properties are confirmed in the source literature [2, 6]. Analogous properties were observed in the analysed examples for the omega ratio and the CPT evaluation when the considered evaluations are treated as functions of a reference point represented by a random variable. Their generalisation requires proofs or more extensive simulation studies. Nevertheless, the presented examples are sufficient to conclude that the manner in which the random reference alternative (random variable or its expected value) is taken into account in the evaluation of risky decision alternatives influences their ordering. This should be taken into consideration when selecting a representation of the random reference alternative.

References
[1] Borkowski, K. (2011). Teoria i praktyka benchmarków. Warszawa: Wydawnictwo Diffin (in Polish).
[2] Dudzińska-Baryła, R. (2019). Subiektywna ocena inwestycji giełdowych - ujęcie ilościowe. Katowice: Wydawnictwo Uniwersytetu Ekonomicznego w Katowicach (in Polish).
[3] Elton, E.J. & Gruber, M.J. (2010).
Investments and Portfolio Performance. World Scientific. [4] Kahneman, D. & Tversky, A . (1979). Prospect Theory: A n Analysis of Decision Under Risk. Econometrica, 47(2), 263-291. [5] Michalska, E . & Kopanska-Bródka, D . (2015). The Omega Function for Continuous Distribution. In: D . Martinčik, J. Ircingowá & P. Janeček (Eds.), Conference Proceedings, 33rd International Conference Mathematical Methods in Economics (pp. 543-548). Plzeň: University of West Bohemia. [6] Michalska, E. (2018). Obiektywna a subiektywna ocean efektywnosci ryzykownych wariantów decyzyjnych, Katowice: Wydawnictwo Uniwersytetu Ekonomicznego w Katowicach (in Polish). [7] Shadwick, W . & Keating, C. (2002). A Universal Performance Measure. Journal of Performance Measurement, 6(3), 59-84. [8] Schmidt, U . , Starmer, Ch. & Sugden, R. (2008). Third-generation prospect theory. Journal of Risk and Uncertainty, 36(3), 203-223. [9] Tversky, A . & Kahneman, D . (1992). Advances in Prospect Theory: Cumulative Representation of Uncertainty. Journal of Risk and Uncertainty, 5, 297-323. [10] von Neumann, J. & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton: Princeton University Press. 330 Structure of the threshold digraphs of convex and concave Monge matrices in fuzzy algebra Monika Molnarova1 Abstract. Properties of threshold digraphs of both convex and concave Monge matrices over fuzzy algebra (max-min algebra) are studied. The threshold digraphs of a concave Monge matrix are shown to have a block structure corresponding to the strongly connected components of the digraph with loops on every node and cycles of length two connecting each pair of consecutive nodes in a non-trivial strongly connected component. A n algorithm with polynomial computational complexity to determine the strongly connected components is presented. In contrast to the concave Monge matrices all loops lie in the same strongly connected component in the threshold digraphs of a convex Monge matrix. A cycle with odd length guaranties the existence of a loop in the threshold digraph. The existence of a common cycle of length two for minimal and maximal node in a non-trivial strongly connected component is proved. Keywords: fuzzy algebra, convex Monge matrix, concave Monge matrix, threshold digraph J E L Classification: C02 A M S Classification: 08A72, 90B35, 90C47 1 Introduction To model discrete dynamic systems (DDS) using the concept of extremal algebras is a frequent method for solving optimization problems in many diverse areas ([8], [9]). We have studied Monge matrices, their structural properties and algorithms solving many problems related to Monge matrices in [1]. Results concerning robustness of matrices were presented in [5], [6], [7]. The aim of this paper is to present properties of the threshold digraphs of convex and concave Monge matrices over max-min algebra. Obtained results can be useful for investigating properties of Monge matrices related to problems as periodicity or robustness and consequently for deriving efficient algorithms. We briefly outline the content and main results of the paper. Section 2 provides the necessary preliminaries on max-min algebra and the notion of a convex Monge matrix and a concave Monge matrix is introduced. In Section 3, results concerning properties of threshold digraphs of convex Monge matrices is presented. 
The key result of this section is Theorem 3, which proves the existence of a common cycle of length two for minimal and maximal node in a non-trivial strongly connected component. In Section 4, results concerning properties of threshold digraphs of concave Monge matrices is presented. The key results of this section are Theorem 4, which proves the existence of the loop on every node in a non-trivial strongly connected component, Theorem 6, which proves the existence of cycles of length two connecting each pair of consecutive nodes in a non-trivial strongly connected component, and Theorem 7, which presents the polynomial algorithm for finding the strongly connected components. 2 Background of the problem The fuzzy algebra S is a triple (B,(B, ®), where (B,<) is a bounded linearly ordered set with binary operations maximum and minimum, denoted by ffi, ®. The least element in B will be denoted by O, the greatest one by /. B y N we denote the set of all natural numbers. For a given natural n e N , we use the notation N for the set of all smaller or equal positive natural numbers, i.e., N - \ \,2, . . . , n). For any m, n e N , B(m,n) denotes the set of all matrices of type m x n and B(n) the set of all n-dimensional column vectors over !B. The matrix operations over S are defined formally in the same manner (with respect to ©, ®) as matrix operations over any field. 1 Technical University of Kosice, Department of Mathematics and Theoretical Informatics, B. Nemcovej 32, 04200 Kosice, Slovakia, Monika.Molnarova@tuke.sk 331 A digraph is a pair G = (V,E), where V , the so-called vertex set, is a finite set, and E, the so-called edge set, is a subset of V x V. A digraph G' = (V',E') is a subdigraph of the digraph G (for brevity G' c G), if V" c y and E' Q E. A path in the digraph G = (V,E) is a sequence of vertices p = (i\, . . . , (£+1) such that (ij,ij+i) e £ for j - 1, . . . , k. The number k is the length of the path p and is denoted by ((p). If i\ = ik+i, then p is called a cycle. For a given matrix A e B(n, n) the symbol G(A) = (TV, £ ) stands for the complete, edge-weighted digraph associated with A, i.e., the vertex set of G(A) is N, and the capacity of any edge (i,j) e E is atj. In addition, for given h € B, the threshold digraph G(A,h) is the digraph G = (N,E') with the vertex set N and the edge set E' = {(i,j); i,j e TV, a y > h). B y a strongly connected component of a digraph G(A,h) - (N,E) we mean a subdigraph "7C = (N%,E%) generated by a non-empty subset c N such that any two distinct vertices i,j € N% are contained in a common cycle, E% - EC\(N%Y.N%) and N% is the maximal subset with this property. A strongly connected component *7C of a digraph is called non-trivial, if there is a cycle of positive length in "7C. B y S C C * ( G ) we denote the set of all non-trivial strongly connected components of G. Definition 1. We say, that a matrix A - (a! y ) e B(m, n) is a convex Monge matrix (concave Monge matrix) if and only if atj ® au < an ® akj for all i < k, j < I (atj ® au > an ® a\j for all i < k, j < I). Obviously it is enough to consider thresholds h e H - {a^; i,j e N) to get all threshold digraphs corresponding to the matrix A. 3 Properties of threshold digraphs of convex Monge matrices In this section we present results related to structure of threshold digraphs of convex Monge matrices. Every cycle in a non-trivial strongly connected component is a concatenation of cycles of length one and two. Consequently there is a node with loop in the cycle of odd length. 
Moreover, all loops in a threshold digraph lie in the same non-trivial strongly connected component. In addition, we prove the existence of a common cycle for minimal and maximal node in a non-trivial strongly connected component of a threshold digraph. Theorem 1. [3] Let A e B(n,n) be a convex Monge matrix. LefK e S C C * (G( A, h)) for h e H. Let c be a cycle of length ((c) > 3 in ( K. Then c can split in "7C into finite number of cycles of length one or two. Corollary 1. [3] Let A e B(n,n) be a convex Monge matrix. LefK" e S C C * ( G ( A , h)) for h e H. Let c be a cycle of odd length ((c) > 3 in< K. Then there is a node in c with a loop. Theorem 2. [2] Let A e B(n,n) be a convex Monge matrix. Let for i, k € N be the loops (i,i) and (k,k) in the digraph G(A, h)for h e H. Then the nodes i and k are in the same non-trivial strongly connected component h. Hence I < u. Since u e N%, there exists k e N% \ \t, u) with atu ^ h. Using the Monge property of the matrix A holds h < ati® auu < atu ® auConsequently atu > h, i.e. there is an arc (/, u) in *7C what is a contradiction. Now, let us assume there is no arc (u, t) in < K, i.e. aut < h, and let k e N% be the minimal index for which aut > h. Hence k > t. Since t e N%, there exists / e N% \ {t, u] with au > h. B y the Monge property of the matrix A holds h < an ® auk < aut ® aikConsequently aut > h, i.e. there is an arc (u, t) in "7C what is a contradiction. Hence there is a cycle (t, u, t) in 1 and the assertion is true for all cycles of length less than k. Let ci = (io.ii, ••• Jk-\Jk) with z'o = ik and let is = max{z'o,z'i,... ,ik-\}. If J S - I = is, then the arc is-\ = is is a loop and clipping out this arc from the cycle c we get a cycle c' of length ((c') < k and the assertion follows from the recursion assumption. B y analogy we can deal with the case when is+\ - is. 333 Thus, we may assume that the nodes is-\ and is+\ are different from is. Since is is maximal it follows that is-\ < is and i s + i < is and we can use the Monge property of the matrix A a; ,,• ® a,- ,• ^, < a,- ,,• ^, ® a,- ,• . Consequently for the cycle c in the threshold digraph G(A,h) for h e H holds h < ai o i j ®fljjij... flis_!i5® a i s i s + l ... a!|t_l!(l < (1) ... ais_lis+J ®fljsis... a!(l_l!(l. Then fljsjs represents a loop on the node is in G(A,h). Moreover, by (1) we can clip out the arcs (is-i,is) and (is,is+i) from the cycle c, replacing them by arc (is-\is+\) and we get a cycle c' of length t(c') < k and the assertion follows from the recursion assumption. • Corollary 2. Let A e B(n,n) be a concave Monge matrix. Let 7C e SCC*(G(A,/z)) for h € H. Then there is a loop on every node in < K. Proof. B y definition of a non-trivial strongly connected component there exists a cycle connecting all nodes of the component < K. The assertion follows by Theorem 4. • Corollary 3. Let A e B(n, ri) be a concave Monge matrix. Element akk < h represents a trivial strongly connected component of G(A, h). Proof, atk < h implies that there is no loop on the node k in G(A,h). Hence, by Corollary 2 there is no cycle in G(A,h) containing node k. • Now, we can reformulate Corollary 2. Corollary 4. Let A e B(n,n) be a concave Monge matrix. Let c be a cycle in G(A,h) for h e H. Then c can split in "7C into finite number of cycles of length one. Theorem 5. [4] Let A e B(n,n) be a concave Monge matrix. Let c\ — (it),i\,... ,ik) with io — z& and C2 — C/0>7i> • • • Jl) with y'o = ji be cycles in different non-trivial strongly connected components in G(A, h) for h e H. 
Let is — min{z'o,z'i,... and it - max{z'o,z'i,... ,ik-i }• Then exactly one of the conditions holds (i) jm < is far all m € {0,1,..., /}, (H) jm > h for all m € {0,1,..., /}. Corollary 5. Let A e B(n,n) be a concave Monge matrix. Let "7Ci and 'Ki be two different non-trivial strongly connected components of G(A, h) foxh&H generated by the node set = { i i , . . . , i*} and N%2 = { J i , . . . Ji}, respectively. Let is = min{z'i,z'2, • • -fk) and it = max{z'i,z'2, • • -fk)- Then exactly one of the conditions holds (i) jm < h for all m e {1,2,...,/}, (ii) jm > it for all m e {1,2,...,/}. Proof. B y definition of a non-trivial strongly connected component there exists a cycle connecting all nodes of the component < K. The assertion follows by Theorem 5. • Corollary 6. Let A e B(n,n) be a concave Monge matrix. Than A has a block form in which the diagonal blocks represent the strongly connected components of G(A,h) for h e H. Proof. The assertion follows by Corollary 5. • Theorem 6. Let A e B(n,n) be a concave Monge matrix. Let i and i + 1 be two nodes in a strongly connected component *7C of G(A,h) for h e H. Then *7C contains the cycle (i,i + l,i). Proof. Let *7C be a non-trivial strongly connected component of G( A, h) for h e H generated by the node set N h and ai+u ^ h to determine the non-trivial strongly connected components and to verify the condition an < h to determine a trivial strongly connected component. This part takes O(n) time. Since the number of possible inputs of h, i.e. the number of threshold digraphs G(A,h), is given by the number of matrix inputs, namely n2 , is the total computational complexity 0(ni ). • Example 2. Let us check the structure of the threshold digraphs G(A, h) of the given concave Monge matrix A e £ ( 8 , 8 ) forfi = [0,3] ' 3 3 2 0 0 0 0 0 3 3 3 1 0 0 0 0 1 3 3 3 0 0 0 0 0 2 3 3 0 0 0 0 0 0 0 1 2 2 1 0 0 0 0 0 2 3 1 0 0 0 0 0 1 1 3 3 v 0 0 0 0 0 1 3 3 Due to Corollary 6 the matrix A has a block form in which the diagonal blocks represent the strongly connected components of G(A,h) for h e H. According to the algorithm described in the proof of Theorem 7 it is enough to check for an > h the consecutive cycles of length two till at least one of the inequalities an+i > h and ai+u ^ h 335 does not hold, i.e. the node set of a non-trivial strongly connected component is completed and the node i + 1 belongs to the following component. For an < his the component trivial. Hence there are two non-trivial strongly connected components in G(A, 1) (the corresponding blocks are bounded by single lines in A), there are three nontrivial strongly connected components in G(A,2) (the corresponding blocks are bounded by single and double lines in A) and there are three non-trivial and one trivial strongly connected component (due to Corollary 3 since 055 < 3) in G(A,3) (the corresponding blocks are bounded by single, double and triple lines in A) (see Figure 2). Moreover, due to Corollary 2 there is a loop on every node in all non-trivial strongly connected components of G(A,h) fovh€ H. Figure 2 Strongly connected components in threshold digraphs of a concave Monge matrix 5 Conclusion We have studied structure of threshold digraphs of convex and concave Monge matrices in this paper. Obtained results can be helpful to find efficient algorithms for checking possible and universal robustness of interval Monge matrices over max-min algebra. 
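For illustration, the consecutive-pair scan discussed in Example 2 (based on the proof of Theorem 7) can be sketched in a few lines of Python; the sketch below is not part of the original paper and only reproduces the block structure of the threshold digraphs G(A, 1), G(A, 2) and G(A, 3) for the concave Monge matrix of Example 2.

```python
def concave_monge_components(A, h):
    """Strongly connected components of the threshold digraph G(A, h) of a
    concave Monge matrix, found by the linear scan used in Example 2:
    a node i with a_ii < h forms a trivial component (Corollary 3), and the
    consecutive nodes i, i+1 lie in the same non-trivial component exactly
    when a_{i,i+1} >= h and a_{i+1,i} >= h."""
    n = len(A)
    components, i = [], 0
    while i < n:
        if A[i][i] < h:                       # no loop on node i -> trivial component
            components.append(([i + 1], "trivial"))
            i += 1
            continue
        block = [i + 1]                       # 1-based node labels, as in the paper
        while i + 1 < n and A[i][i + 1] >= h and A[i + 1][i] >= h:
            i += 1
            block.append(i + 1)
        components.append((block, "non-trivial"))
        i += 1
    return components

# The concave Monge matrix from Example 2, with entries in B = [0, 3].
A = [[3, 3, 2, 0, 0, 0, 0, 0],
     [3, 3, 3, 1, 0, 0, 0, 0],
     [1, 3, 3, 3, 0, 0, 0, 0],
     [0, 2, 3, 3, 0, 0, 0, 0],
     [0, 0, 0, 1, 2, 2, 1, 0],
     [0, 0, 0, 0, 2, 3, 1, 0],
     [0, 0, 0, 0, 1, 1, 3, 3],
     [0, 0, 0, 0, 0, 1, 3, 3]]

for h in (1, 2, 3):
    print(h, concave_monge_components(A, h))
# h = 1: [1,2,3,4], [5,6,7,8];  h = 2: [1,2,3,4], [5,6], [7,8];
# h = 3: [1,2,3,4], [5] trivial, [6], [7,8]
```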
The robustness of a matrix A is related to the solution of the eigenproblem A ® x(r) - x(r), where an eigenvector x(r) represents the steady state. This can reflect the following economic application. Several projects characterized by different properties should be evaluated in a company. The level of each property i is described by value xi, influenced by all properties Xj. The influence is represented by a factor aij. References [1] Burkard, R. E., Klinz, B . and Rudolf, R.: Perspectives of Monge properties in optimization, DAM, Volume 70(1996), 95-161. [2] Molnárová, M . : Robustness of Monge matrices in fuzzy algebra, In: Proceedings of 32nd Int. Conference Mathematical Methods in Economics 2014, Olomouc, 2014, 679-684. [3] Molnárová, M . : Periodicity of convex and concave Monge matrices in max-min algebra, In: Proceedings of3%th Int. Conference Mathematical Methods in Economics 2020, Brno, 2020, 377-382. [4] Molnárová, M . : Convex and concave Monge matrices in fuzzy algebra - comparison with respect to robustness, (preprint). [5] Myšková, H . and Plavka, J.: X-robustness of interval circulant matrices in fuzzy algebra, Linear Algebra and its Applications, Volume 438 (2013), 2757-2769. [6] Myšková, H . and Plavka, J.: The robustness of interval matrices in max-plus algebra, Linear Algebra and its Applications, Volume 445 (2013), 85-102. [7] Plavka, J.: The weak robustness of interval matrices in max-plus algebra, DAM, Volume 173 (2014), 92-101. [8] De Schutter, B . , van den Boom, T , X u , J. and Farahani, S.S.: Analysis and control of max-plus linear discrete-event systems: A n introduction. Discrete Event Dyn. Syst., Volume 30 (2020), 25-54. [9] Zimmermann, H.J.: Fuzzy Set Theory A n d Its Applications; Springer Science and Business Media, Berlin, Germany, 2011. 336 Weak Solvability of Max-plus Matrix Equations Helena Myšková1 Abstract. Max-plus algebra is an algebraic structure, in which classical addition and multiplication are replaced by maximum and addition, respectively. Behavior of discrete event systems, in which the individual components move from event to event rather than varying continuously through time, is often described by systems of maxplus linear equations or matrix equations. Discrete dynamic systems can be studied using max-plus matrix operations. It often happens that a max-plus matrix equation with exact data is unsolvable. Therefore, we replace matrix elements with intervals of possible values. In this way, we obtain an interval max-plus matrix equation. Several types of solvability of interval max-plus matrix equations have been studied yet. In this paper, we prove the necessary and sufficient condition for an interval max-plus matrix equation to be strongly tolerance solvable and provide an algorithm for checking whether the given interval matrix equation is strongly tolerance solvable. Keywords: max-plus algebra, interval matrix, matrix equation, weak solvability J E L Classification: C02 A M S Classification: 15A18; 15A80; 65G30 1 Motivation Behaviour of discrete event systems, in which the individual components move from event to event rather than varying continuously through time, is often described by systems of linear equations or by matrix equations. Discrete dynamic systems and related algebraic structures were studied using max-plus matrix operations in [1, 2]. 
In the last decades, significant effort has been developed to study systems of max-plus linear equations in the form A ® x = b, where A is a matrix, b and x are vectors of compatible dimensions. Systems of linear equations over max-plus algebra are used in several branches of applied mathematics. Among interesting real-life applications let us mention e.g. a large scale model of Dutch railway network or synchronizing traffic lights in Delfts [8]. In the last two decades, interval systems of the form A ® x = b have been studied, for details see [1, 3,4]. In this paper, we shall deal with interval max-plus matrix equations of the form A ® X = B, where A, B are given interval matrices of suitable sizes a X is an unknown matrix. Several solvability concepts have been studied in [6], [7]. The following example is a slightly modified version of an example given in [6]. Example 1. Let us consider a situation, in which passengers from places P\, P2, • • •, Pm want to transfer to holiday destinations D\, D2,..., Dr. Suppose that the air transport to the destination Dk is provided by the airport terminal Tk. Different transportation means provide transporting passengers from places P\, P2,..., Pm to airport terminals T\, T2,..., Tr. We assume that the connection between Pi(i e N, N = { 1 , 2 , . . . , « } ) and Tk (k e R, R = {1,2,... ,r}) is possible only via one of the check points Q\, Q2,..., Qn. Denote by a^- the times needed for transportation and carrying out the formalities on the connection from Pi to Qj (j e N, N = { 1 , 2 , . . . , « } ) . If there is no connection from P{ to Qj we put aij = -co. If the time needed for transportation from place Qj to terminal Tk is xjk, then the time needed for transportation from P,- to Tk via Qj is equal to aij + xji + cik. Assume that b[k is the time available to passengers traveling from place P : to destination Dk to move to terminal Tk. Our task is to choose the appropriate times xjk, j e N, k e R so that all passengers get to their airport terminal on time, i.e., max{a,7 +xjk} = bi k . (1) j€N 2 Preliminaries Max-plus algebra is the triple (R, ©, ®), where R = R U {-co}, a®b = max{a, b} and a®b = a + b. 1 Technical University in Košice, Faculty of Electrical Engineering and Informatics, Department of Mathematics and Theoretical Informatics, Němcovej 32, 042 00 Košice, Slovak Republic, helena.myskova@tuke.sk 337 The set of all m x n matrices over R is denoted by R(ra, n) and the set of all column n-vectors over R by R(n). Operations © and ® are extended to matrices and vectors in the same way as in the classical algebra. We will consider the ordering < on the sets R(ra, n) and R ( « ) defined as follows: • for A , C e R ( m , n) : A < C if atj < ctj for each i e M and each j e iV, • for x, y e R(n) : x < y if Xj < yy for each j e Af. We will use the monotonicity of ®, which means that the inequalities A < C, B < D imply A ® B < C ® D. For given matrix A e R(ra, n) and vector b e R(m) a system of linear max-plus equation can be written in the form A®x = b. (2) We define the vector x*(A, b), called a principal solution of (2) as follows: x*AA, b) = mm{bi - a i 7 } , j e N. (3) J i€M The main significance of the principal solution is that (2) is solvable if and only if x*(A, b) is its solution. Le us replace a vector b in (2) by a matrix B e R ( m , r) and an unknown vectorx by an unknown matrix X e R(n, r). We get the matrix equation of the form A ® X = B. 
(4) It is easy to see that (4) is equivalent to the following r equations: A ® Xk = Bk for each k e R, where Xk (Bk) is the k-th column of X (B). Consequently the principal solution of (4) is the matrix X*(A, B) e R(n, r) with elements x*jk(A,B)=x*(A,Bk). (5) Theorem 1. [6] Let A e R ( m , n) and B e R ( m , r) be given. (i) IfA®X = BforX€ R ( « , r), then X < X*(A, B). in) A®X*(A,B) < B. i\\\)The matrix equation A ® X = B is solvable if and only if the matrix X*(A, B) is its solution. L e m m a 2. [6] Let A , A ( 1 ) , A ( 2 ) e R ( m , n) and B, B ( l \ B{2) e R ( m , r). The following assertions hold: (i) If A{ V < A(l ~> thenX*(A(l \B) < X*(A(2 \B). (ii) I f 5 W < B( V t h e n X * ( A , B W ) < X * ( A , f i ( 2 ) ) . L e m m a 3. [6] Let A^f A^f B^f B^ be matrices of compatible sizes. The system of matrix inequalities of the form A(l ~> ® X < B ( l \ A(2 ~> ®X> B^ is solvable if and only if A ( 2 ) ®X*(A(r KB(l) ) > B(2) . (6) 3 Weak solvability of interval matrix equations A certain disadvantage of the necessary and sufficient condition for the solvability of (4) given in Theorem 1 (iii) stems from the fact that it only indicates the existence or non-existence of the solution but does not indicate any action to be taken to increase the degree of solvability. However, it happens quite often in modelling real situations that the obtained system turns out to be unsolvable. One of possible methods of restoring the solvability is to replace the exact input values by intervals of possible values. In practice, the travelling times aij may depend on external conditions, so they are from intervals of possible values. Due to this fact, we will require the transportation times from P : to Dk to be from a given intervals of possible values. Similarly to [3, 4, 9] we define A = [A, A] = | A e R ( m , n); A < A < A j and B = [B, B] = jB e R ( m , r); B < B < B j . Denote by A ® X - B (7) the set of all matrix equations of the form (4) such that A € A and B e B. We call equation (7) an interval max-plus matrix equation. 338 Definition 1. Interval matrix equation (7) is weakly solvable i f there exist B e B and A € A such that the equation A <_ X = B is solvable. Theorem 4. If an interval system (2) j's weakly solvable then A®X*(A,B)>B. (8) Proof. Suppose that inequality (8) is not satisfied. Then, according to Lemma 3, for each X e R ( « , r) at least one of the inequalities A® X < B, A ®X > B_'\s not satisfied. Let X e R ( n , r) be arbitrary but fixed. In the first case, there exists i e M, k e R such that [A ® > bik- Then for each A e A and for each B € B we have [A ® X ] > so A &> X £ B. In the second case, there exists i e M , e >? such that [A ® X ] ^ < . Then for each A e A and for each B € B we have [A X] i k < bik, so A ® X £ B. Hence there is no X e R(ra, r) such that A ® X e Z? for some A e A , so A ® X = Z? is not weakly solvable. • Theorem 4 gives a necessary, but not sufficient condition for the weak solvability. The opposite implication does not hold because of the multiplication of two matrices with all sizes greater than one is not continuous. This is why to check for weak solvability we shall use the procedure which finds the matrices A and B such that the matrix equation A ® X - B is solvable. In this paper we will deal with a special case of interval matrix equations with constant right-hand side B - B - B. If any of the equations A ® X*(A, B) — B or A ® X*(A, B) — B apply, (7) is weakly solvable. 
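The check just described can be sketched in Python as follows; this is an added illustration, not part of the original paper. It computes the principal solution X*(A, B) by formulas (3) and (5) and applies Theorem 1 (iii) to test whether A ⊗ X*(A, B) = B, using the lower-bound matrix and the constant right-hand side that appear in Example 2 below.

```python
def principal_solution(A, B):
    """Principal solution X*(A, B): x*_jk = min_i (b_ik - a_ij), formulas (3) and (5)."""
    m, n, r = len(A), len(A[0]), len(B[0])
    return [[min(B[i][k] - A[i][j] for i in range(m)) for k in range(r)]
            for j in range(n)]

def maxplus_product(A, X):
    """Max-plus matrix product: (A (x) X)_ik = max_j (a_ij + x_jk)."""
    m, n, r = len(A), len(A[0]), len(X[0])
    return [[max(A[i][j] + X[j][k] for j in range(n)) for k in range(r)]
            for i in range(m)]

def is_solvable(A, B):
    """Theorem 1 (iii): A (x) X = B is solvable iff X*(A, B) is a solution."""
    return maxplus_product(A, principal_solution(A, B)) == B

# Lower-bound matrix and right-hand side from Example 2 below.
A_low = [[9, 4, 6], [8, 2, 8], [6, 2, 6]]
B = [[10, 7, 13], [12, 9, 12], [13, 6, 10]]

print(principal_solution(A_low, B))   # [[1, -2, 4], [6, 3, 8], [4, 0, 4]]
print(is_solvable(A_low, B))          # False, so the iterative procedure below is needed
```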
So it only makes sense to deal with the case when A ® X*(A, B) + B and A X* (A, B) +B. We shall create the sequence of matrices {A^}J=I such that A ^ is such that the matrix equation A ^ ® X — B is solvable in the case that (7) is weakly solvable. In which follows we show the procedure of creating the sequence M a n d a n auxiliary sequence {C^}^. Let us denote = A. Let I > 1. If C ^ - 1 ^ is known, define the matrices A^) (Bk) for each k e R as follows: a g - t o ) = m i n f e - J C ^ C ' ' - 1 ' , ( 9 ) L e m m a 5. Let A ® X - B be an interval matrix equation with a constant right-hand side. (i) x*{A{l \Bk),Bk) -x*(C(l -l \Bk)fovanyk € R. (ii) Let A € A be such that A > C ( / _ 1 ) . Thenx*(A,Z?f c ) = x * ( C ( / _ 1 ) , Z?fc) i f and only i f A < A ^ ( B f c ) . Proq/: (i): Let j € N, k € Rbe arbitrary, but fixed. Let Mi - {i e M : bik - x * . ( C ( / _ 1 ) , Bk) < a y } . We obtain x ! - ( A ( / ) , B f c ) = m i n f e - a ^ ( B f c ) } = m i n { m i n { ^ - bik + x*AC{l ~X) ,Bk)}, mm{bik -atJ}} = bik because of Mi + 0 and bik - a u > x * . ( C ( / _ 1 ) , Bk) for any i i M\. J J (ii): Suppose that A $ A^>(Bk). Then there exist i e M, j e such that a y > - X y ( C ( / _ 1 ) , Z?fc) which implies x 1 ( C ^ / _ 1 \ B y t ) > bik - aij > x*.(A, Bk). For the converse implication suppose that x*(A, Bk) < x*.(C^l ~v> ,Bk) for some j e N. Then there exists i e M such that bik - aij < x*.(C^l ~l '> , Bk). Then aij > bik -x*AC^l ~v> , Bk)) = a^iBk). • For a given k e R, A^ (Bk) represents the greatest matrix with the principal solution corresponding to Bk equal to X*(C(l ~l \ Bk). In general, for k + t we have A(Bk) + A(Bt). To find the greatest matrix A e A such that X*(A, B) = X*(C(l \B), we define the matrix A ( / ) as follows: A ( / ) = mm A(l) (Bk) (10) k€R L e m m a 6. Let A e A be such that A > C ( / _ 1 ) . T h e n X * ( A , B ) = X*(C{l) ,B) if and only if A < A{M f Proof. The equality X*(A, B) = X*(C(l) ,B) is equivalent to x*(A, Bk) = x*(C(l \ Bk) for each k e R. According to Lemma 5 it is equivalent to A < A(l+l ^(Bk) for each k e R or equivalently A < min A(l+l ^(Bk) = A(l+l K • k€R 339 Suppose that for a fixed I e N the equality ® X*(A< -1 \ B) — B is not satisfied. Denote U{1) = {(i,k) zMxR: [ A « ® X * ( A « \ B ) ] a < bik). Let (r,t) € U(l \ We shall give the procedure of creating the matrix Cil) such that [Cil) ® X*(C(l \ B)]rt = br t and [A{1) ® X*(A{1 \ B)]ik = bi k implies [C(l) ® X*(C(l \ B)]ik = bi k i f such matrix exists. Let u € Nbe arbitrary but fixed. Define the matrix as follows: l J I af) otherwise. IJ min {bu - x*j{A{l) , B,), aij} for i = r, j = u We can see that x'AA®,Bk) = x*.(C(l) ,Bk) for each k e R, j e N - {u}, but x* ( A ^ , fifc) * x*u{C{l) ,Bk) in general. Denote R® = {k € R:x*u{A{l \Bk)=x*u{C{l \Bk)} L e m m a 7. t e R^' i f and only i f br,-x*u{A{l \Bt) < ar2(Bt). Proof. Since x*(A(l \Bt) = x * ( C ( / _ 1 ) , B,), theequalityx* (A(l \ Bt) = x* ( C ( / ) , Bt) is equivalent to x * ( C ( / _ 1 ) , Bt) = x*(C(l \Bt). According to Lemma 5, this is equivalent to < A^(Bt). For + (r,u) the inequality cf} = af} < af}(Bt) trivially holds and therefore it is sufficient for the inequality br, - x*(A^l \ Bt) < ar l l{Bt) IJ IJ IJ to apply. Denote Af/° (Bk) = {j e N : x*-(A^,Bk) = bik-af)}. It is easy to see that [ A ^ ®X*{A{l \B)}ik = bi k if and only if Mf] (Bk) * 0. L e m m a 8. Suppose that (r, t) e £ / ^ a n d there exists u e N which satisfies the following conditions: • t € R( J\ i . 
e., brt-x*u{A^,Bt) < ar l l{Bt) • iffe e flissuchthatfc i R( u l) thenforeachi e M,i + r the inequality M{ p (Bk) + 0 implies M . W (Bk)-{u) * 0 Then the following is true: (i) [ C « ®X*(C«\B)]rt = brt; (ii) [ A « ® X*(A^,B)]ik = ftiJt implies [ C « ® X * ( C « , = ft,*; (iii) [ C ( / ) ®X*(C{l \B)]rk = br k for each k e /?, Jt g R®; Proof, (i) [ C ( , ) ® X * ( C < z \ f l ) ] r , =max{c ' +x* ( C « fl)} > c # + J £ ( C « fl,) brt -x*u(A{l \Bt) +x*u(C{l \Bt) = ftrt -x'(A®,Bt) + x*u(A«\ Bt) = ftrt Together with the inequality [ C ( / ) ® X*(C(l) , B)]r, < br t we obtain [C(l) ® X*(C(l) , B)]rt = br t . (ii) If [ A « ® X * ( A « , B ) ] ! f c = i i J t a n d X * ( A « , B f c ) = X * ( C « , B k ) , then cj[» + x*.(C« B*) = af] + x * ( A « , B k ) holds for each j € N and the assertion trivially follows. If x*u{C(l) ,Bk) < x*u(A(/) , then the assumption A f . ( / ) ( B k ) - { M } # 0 implies that there exists j € N, j ± u such that X y ( A ^ \ B^) = fejfc - which implies cg> + x } ( C ( / ) , B f c ) = flW + x } ( A « , B ( t ) = i i t because of x*.(C( / ) , Bf c ) = x } ( A W , (iii) For k i RiP we obtain x*(C(l \ Bk) = br k - cru which implies [ C ( / ) ® X*(C{1 \ B)]rk = max{c^ + x } ( C ( / ) , B , ) } > + x : ( C ( / ) , Bk) = cS + ferfc - 42 - £r f c . Similarly as in (i) we obtain [ C ( / ) ® X*(C(l \ B)]rk = br k . • 340 Lemma 8 provides an algorithm for finding a matrix A € A such that A ® X = B is solvable, as you can see in the following example. Example 2. Let us have ( [9,10] [4,10] [6,7] ^ 1 10 7 13 \ A = [8,15] [2,2] [8,10] B = 12 9 12 { [6,8] [2,12] [6,7] J v 13 6 10 ) We use the procedure based on Lemma 8 to decide about the weak solvability of an interval matrix equation A ®X = B. We can easily show that A ® X*(A, B) + B and A ® X*(A, B) + B. We are looking for the matrix A € A such that A ® X*(A, B) = B. For C ( 0 ) = A we have / 9 4 / 1 - 2 4 ^ 110 7 13 \ C ( 0 ) ® X * ( C ( 0 ) , B ) = 8 2 8 ® 6 3 8 = 12 8 12 v 6 2 6} \ 4 0 4 ) { 10 6 1 0 / We compute the matrix A ^ \ the greatest matrix with the principal matrix solution equal to X*(C^°\ B). B y formula (9) and (10) we obtain / 9 4 6 ^ 19 4 7 ^ (9 5 7 ^ f9 4 6 \ A ( 1 ) ( Z ? i ) = 11 2 8 , A(1 HB2) = 11 2 9 , A ( 1 ) ( f i 3 ) = 8 2 8 , A W = 8 2 8 i 8 7 7) { 8 3 6} I 6 2 6 j I 6 2 6 / We check whether the equality A ^ X*(A(L \ B) - B is satisfied. We obtain A ( L ) ® X * { A ( 1 \ B ) = ( 1 0 7 13 \ 12 8 12 \ 10 6 10 , We have f / ( 1 ) ) = {(2,2), (3,1)}. Let us take r = 2, t = 2. For u = 3, we have • t € 7V3 ( 1 ) = {2}, i.e., b22 - x*3(A(l \B2) = 9 - 0 = 9 = a^(B3) • Fork = 1 w e h a v e M j ( 1 ) ( 5 i ) = {1,2,3}, so M j ( 1 ) ( f l , ) - {3} # 0. ( M 3 ( 1 ) ( f i i ) = 0) For k = 3 w e h a v e M j ( 1 ) ( 5 3 ) = { l } , s o Mj( 1 ) (fl3 ) - {3} # 0 , M 3 ( 1 ) ( f i 3 ) = {1,2,3}, s o M 3 ( 1 ) ( B 3 ) - {3} * 0. Since the assumptions of Lemma 8 are satisfied, we can compute the matrix and verify that the assertion of this Lemma is true. We obtain 1 9 4 6 ^ / 1 - 2 4 ^ ( 1 0 7 13 \ C ( 1 ) ®X*(C([) ,B) = 8 2 9 ® 6 3 8 = 12 9 12 v 6 2 6) 13 0 3 ) { 9 6 1 0 / We compute the matrices A^ (Bk) for k = 1, 2, 3 by formula (9) and the matrix A^2 ) by (10). We obtain / 9 4 7 ^ f 9 4 7 ^ f9 5 7 ^ f9 4 7 \ A ( 2 ) ( B 0 = 11 2 9 , A ( 2 ) ( B 2 ) = 11 2 9 , A<2 >(fl3) = 8 2 9 , A<2 > = 8 2 9 i 8 7 7 / { 8 3 6 / I 6 2 7 / I 6 2 6 / We check whether the equality A(2 ^ ® X*(A(2 \ B) - B is satisfied. We obtain / 10 7 13 \ A ( 2 ) ® X * ( A ( 2 ) , B ) = 12 9 12 . K 9 6 1 0 , Since U{2) = {(3,1)} we have r = 3, f = 1. 
For M = 2 we the following assertions hold: 341 • = {1}, i.e., fe3i - x * ( A ( 2 ) , B i ) = 1 3 - 6 = 7 = a £ ( B 0 • For k = 2 we have M(2) (B2) = {1,2}, so M{2) (B2) - {2} * 0, A f f ) (fl2) = {3}, so M 2 ( 2 ) (fl2 ) - {2} # 0. For k = 3 we have Mj( 1 ) (fl3 ) = {1}, so M j ( 2 ) ( f l 3 ) - {2} * 0, M{2) (B3) = {1,3}, so M{2) (B^) - {2} * 0. Since the assumptions of Lemma 8 are satisfied, we can compute the matrix and verify that the assertion of this Lemma is true. We obtain / 9 4 M / 1 - 2 4 ^ f1 0 7 13 \ C ( 2 ) ®X*{C{2) ,B) = 8 2 9 ® 6 - 1 3 = 12 9 12 v 6 7 6 1 v 3 0 3 ) i 13 6 1 0 / Since <8> X* (C^2 \ B) = B , it is not necessary to compute the matrix A^K There exists a matrix A, namely A = such that A ® X*(A, B) = B, so the given interval matrix equation is weakly solvable. Remark 1. The problem is that the choice of the order of the elements in which we achieve equality is currently chaotic. It is possible that for a given set the above procedure does not lead to a solution even though a solution exists. Then it is necessary to go back one step to the set U^l ~^ and select another element (r,t) in which we achieve equality. Thus, the algorithm is generally exponential. 3.1 Conclusion Max-plus algebra is a useful tool for describing real situation in the traffic, economy and industry. In this paper, we dealt with the solvability of interval matrix equations in max-plus algebra. Returning to Example 1, the weak solvability of interval matrix equation with constant right-hand side means that for given total transportation times bit there exist possible times a; ,• from the given intervals for which the total transportation time bit can be achieved. In Example 1, the values aij,Xji represent the transportation times. Another possibility is the use of interval matrix equations in economics as follows. Suppose the given selling price is bit of the product P ; in the store 7*. We are looking for a price aij from the given interval, at which the manufacturer will offer the product Pi to the intermediary Qj, so that it is possible to reach the required selling price after adding the margin of the broker xjk. In practice, the selling prices of b^ are also from certain intervals rather than fixed numbers. Therefore, it would be appropriate if the matrix B were also interval. The study of weak solvability for the case that the right side of the equation is an interval matrix is our main goal for the future. Another challenge is to modify the algorithm so that it is polynomial. References [1] Cuninghame-Green, R. A . (1979). Minimax Algebra. Lecture notes in Economics and Mathematical systems. Berlin: Springer. [2] Gavalec, M . & Plavka, J. (2010). Monotone interval eigenproblem in max-plus algebra. Kybernetika, 46, 387-396. [3] Myšková, H . (2005). Interval systems of max-separable linear equations. Lin. Algebra Appi, 403, 263-272. [4] Myšková, H . (2006). Control solvability of interval systems of max-separable linear equations. Lin. Algebra Appi, 416, 215-223. [5] Myšková, H . (2012). On an algorithm for testing T4 solvability of fuzzy interval systems. Kybernetika, 48, 924-938. [6] Myšková, H . (2016). Interval max-plus matrix equations. Lin. Algebra Appi, 492, 111-127. [7] Myšková, H . (2018). Universal solvability of interval max-plus matrix equations. Discrete Appi. Math., 239, 165-173. [8] Olsder, G . J. et al. (1998). Course Notes: Max-algebra Aproach to Discrete Event Systems. Algebres Max-Plus et Applications an Informatique et Automatique, INRIA, 1998, 147-196. [9] Plavka, J. 
(2012): On the <9(«3 ) algorithm for checking the strong robustness of interval fuzzy matrices. Discrete Appi. Math. 160,640-647. [10] Zimmermann, K . (1976). Extremální algebra. Praha: Ekonomicko-matematická laboratoř Ekonomického ústavu ČSAV. 342 The Impact of Technical Analysis and Stochastic Dominance Rules in Portfolio Process David Neděla1 Abstract. During the last decades, modern portfolio theory has become one of the most applied portfolio approaches by investors. However, this theory can be regarded as a pillar from which in recent years has been derived and adapted a large number of portfolio approaches. The possible approach is to combine a general portfolio model with the discipline of the financial area to find a more suitable strategy for the investment process. This paper is focused on the impact analysis of several technical analysis indicators and stochastic dominance approach in the portfolio formation process in various markets during different time horizons capturing different market conditions. Two strategies of implementation, technical analysis rules and stochastic dominance rule, in the portfolio creation process are considered. Strategy 1 aims at eliminating the whole market systemic risk with the alternative of investing in a risk-free asset. In contrast, the second strategy focuses on the use of assets meeting particular rules. It was evident from the results that using strategy 1 to eliminate systemic risk during the crisis reduced the risk of the portfolio with similar profitability. Oppositely, strategy 2 is more effective in a period with a growing global economy. Keywords: technical analysis, portfolio model, moving average, stochastic dominance J E L Classification: G i l , G15, G20 A M S Classification: 90C05, 90C39 1 Introduction Modern portfolio theory has become an essential approach in portfolio decision process, see [8]. However, this theory can be regarded as a pillar from which in recent years has been derived and adapted a large number of similar portfolio models. Investors are not guided in the investment strategy only by the general portfolio model, see [7]. Feasible approach can be a combination of a particular portfolio model with other disciplines of the financial area to find a more suitable strategy for investment making. In the literature can be found the possibility of including technical analysis rules or stochastic dominance (SD) rules in the portfolio theory, see [5], [9], or [12]. Some researchers dealt with the question of the advantage of the technical analysis rules for predicting the future asset price development. The moving average and indicators derived therefrom (alarms) are typical and most used rules, see [6]. However, Taylor [17] does not consider it appropriate to apply technical analysis rules to predict future price evolution. In contrast, some authors take the opposite view, see [6] or [7]. In portfolio optimization, two methods are frequently used for modeling the choice among uncertain prospects: SD and mean-risk approaches, see [12]. The advantage of the SD approach is mentioned as well in [12]. In the portfolio process, the SD of the first three orders play a crucial role, due to the relation of different investor utility functions. In [9], the SD approach is applied in the asset selection process. 
The paper aims at analyzing the impact of technical indicator rules and the stochastic dominance rule in the portfolio decision process in different markets during two investment horizons capturing different market conditions. The motivation for this analysis is to examine the impact of rules from another financial area contained in the portfolio creation strategy. The whole paper is divided into 5 sections. The introduction and the structure of the paper are described in Section 1. In Section 2, a description of technical analysis indicators and SD method is provided. Characterization of performance measures and portfolio models are introduced in Section 3. To verify the efficiency of the applied approaches, the empirical analysis is provided in Section 4, and in Section 5, the whole paper is concluded. 2 Technical Indicators and Stochastic Dominance Trading Rules We can distinguish a large number of technical analysis indicators. Therefore, only 4 frequently applied indicators by practitioners are selected, see [1], [6], or [10]. The first one is the moving average ( M A ) , which is a general and 1 VŠB - Technical University of Ostrava, Department of Finance, Sokolská tř. 33, Ostrava 70200, Czech Republic, david.nedela@vsb.cz 343 often used indicator in technical analysis, see [6] or [10]. The equation of MAj n(x) calculation is following: MAT,n(x) = lM*±t (1) n where xj represents the asset price at time T and n is the selected length of the analyzed period. The second selected indicator is the exponential moving average ( E M A ) . This indicator is based on a weighted moving average where the highest importance is placed on actual prices. When the E M A is calculated for the first time, the initial value of E M A is the simple M A calculated by formula (1). The formula for calculating E M AT ,n(x) for the weighted factor k = ^ is as follows: ™ T , „ ( i ) = EMAT-Un(x) + k - [ x T - EMAT-Un(x)]. (2) One of the simplest momentum indicators is the moving average convergence divergence ( M A C D ) . The M A C D equation is defined by subtracting the long-term E M A (Af periods) from the short-term E M A (« periods). Mathematical expression is MACDT,N,N(x) = EMAT ,n(x)-EMAT ,N(X), where EMAT ,n(x) is calculated by equation (2). In the decision process, a signal curve (usually EMAT^(x)) or the zero horizontal line are needed. The relative strength index (RSI) is the last selected technical analysis indicator. Firstly, an up change U C and a down change D C are calculated for each day, see [1]. We can define a relative strength (RS) as the ratio between the n-day E M A of the U C time series and the n-day E M A of the D C time series. Usually, 14 or 9 days E M A are considered in practise. The RS is calculated as RS = §^x7?ic7' w n e r e EMAn(x) is calculated by equation (2). Therefore, the RSI is calculated as RSI = 100 1 0 0 (l+RS) • Consequently, the signals for investing (buy/hold and sell) are defined by simple rules. If M A is considered as a technical analysis indicator, the signals are defined as following: , ( MAT,„(x) > MAT,N(X) for buy signal Signal < (3) I MAT,n(x) < MAT,N(X) for sell signal, where n and N are selected lengths of particular time periods. Instead of M A , the E M A , defined by equation (2), can be substituted. When M A C D is considered, the signal rules are slightly different. 
If the signal curve is represented by EMAT<)(X), the formulation is as follows: f MACDT N N(X) > EMAT <){X) A MACDT-i n N(X) < EMAT-\ $(x) for buy signal Signal \ I MACDT,„,N(X) < EMATfi(x) A MACDT-I,„,N(X) > EMAT-it9(x) for sell signal. (4) Finally, the RSI rules are defined as: RSI(x) e (0,65> for buy signal Signal < (5) I otherwise for sell signal. SD allows to compare different random variables such as asset returns or different risk indicators by their distribution function, see [ 12]. S D rules of the order of one to four are particularly interesting because they cumulatively impose standard assumptions of risk aversion, prudence, and restraint, which are necessary conditions for standard risk aversion, see [4]. Application of S D in portfolio process is provided in [2], The F S D can be defined as followed: a random variable A first order stochastically dominates a random variable B, written A >FSD B,if for any z applies Pr(A > z) > Pr(B > z), where Vz e R and there is at least one z for which a strong inequality applies. Let define the cumulative distribution function as FA(X) = J* f{z)dz. Given the previous expression, a random variable A second order stochastically dominates a random variable B, written A >SSD B, i f for any z applies FA{z)dz< / FB(z)dz, (6) J — CO where there is at least one z for which a strong inequality applies. Asset selection rule based on S D approach is defined as: assets that dominate at least one asset by formula (6) and concurrently are not dominated by other assets, are selected. Mathematically, the task can be expressed as: Iif Xj >SSD x; A xt is S S D non-dominated, where i + j for buy signal (7) otherwise for sell signal. 344 3 Portfolio Performance and Models For measuring portfolio performance, several approaches can be used.The compiled portfolio can be assessed not only on the return basis but also the risk associated with the investment must be considered. Let x = [x\, x2, • • •, xz] is a vector of asset weights and r = [r\, r 2 , . . . , rz] is a vector of gross returns, the expected return of the portfolio is an equal weighted average of the asset's expected return formulated as E(x'r). Variance of a portfolio cr2 is defined Tas x'Qx, where Q is a covariance matrix of assets. The portfolio standard deviation is determined as a}, where X is a random return variable and Fx is its distribution function, then Fx (±i) — P r ( X < /j),see[14]. The equation for CVaRa(X) calculation is CVaRa(X) = ± VaRa(X)da. The performance ratios are more appropriate for examination, due to combining the asset's excess expected return and its risk. Sharpe ratio (SR) is one of the most used ratios, see [7] or [15]. The calculation is SR = E( ~xr T {\ (x'Qx) 2 where rf is riskless return (or benchmark return). Subsequent ratio used to measure portfolio performance is Rachev Ratio (RR), see [13]. The calculation is RR - c v a ^ ( x ^ r - " ) • Sortino Ratio (SoR) is a modified variation of SR, where the standard deviation as a measure of risk is substituted with the downward or negative deviation, see [16]. It is formally defined as follows: E(x'r - rf) SoR = (8) E((Tf-*r)iy where the function ( T ) 2 = (raax(r,0))2 . Another variation is to measure the portfolio excess return with the maximum drawdown, see [18]. This ratio is called Calmar Ratio (CalR) and the calculation is as follows: CalR = i / ; ; (9) m a x , = i ( . . . ( T ddt (x'r) where ddt(x'r) - m a x s = i > . . . 
> , ws (x'r) - wt(x'r) provided that for the calculation of ws (x'r) = _^=i x ' r s ~r ftIn the portfolio process, several approaches can be distinguished and applied. Risk minimization model is one of many. A portfolio with a minimum of risk can be defined as the following optimization problem: minl F(x'r) x'e = 1 (10) Xi > 0; i — 1 , z , where ^(x'r) is a risk indicator (such as cr2 , VaRp, or CVaRp) and e is an z-column unit vector with all values being equal to one. Due to many performance indicators of the portfolio are identified and compared, it is as well possible to optimize the portfolio based on these indicators, see [13]. In general, this problem can be written by the following equation: max r(x'r) x ' e = l (11) Xi > 0; i — 1 , z , where T(x'r) indicates one of the performance measures (such as SR, RR, STARR, SorR, or CR). 4 Data Description and Empirical Application of Selected Approaches For the empirical analysis, the data set, containing daily adjusted close prices of stocks included in the major world indices, is used. Specifically, the set of indices is contained by F T S E 100 and N A S D A Q - 1 0 0 indices traded on the U K and U S stock markets. Since the comparison is performed at different time periods, two periods are selected, containing both the crisis period (2005-2010) and the "boom" period (2014-2019).The investment itself starts in the second year due to decision-making on a one-year time series. The risk-free rate, needed for the calculation of performance measures or as an alternative investment instrument, 10 year government bond returns is used. In all indices, several stock time series are not included in this analysis due to the incomplete data in the analysis period. 345 The applicability of alarm portfolio approaches is motivated by the work of [3], [5], or [6], in which the authors mention several ways how to use the alarm in the decision-making and investment process. Therefore, the whole empirical process can be divided into several steps. In the first step, the supplementary matrix Mij during the investment period T is defined. The number of columns in the matrix corresponds to the number of particular assets X j , where i e (1, z). The matrix data are found throughout the investment horizon according to the alarm rules of the technical indicators and S S D by equations (3), (4), (5), and (7). Due to the requirement of two length periods that are necessary in M A and E M A calculation, several combinations of (n, N) = (5,100),(5,150),(10,100) are considered. The recommended values (12,26,9) are used when the M A C D indicator is applied and only nondominated assets are selected by SSD approach. The values obtained in the matrix are calculated as follows: II if buy alarm in time t applies3 V V (12) 0 if sell alarm in time t applies . After assembling the matrix, we can proceed to the second step, where two basic strategies for using the alarm are distinguished. If applying strategy 1 (SI), the alarm rules are used to predict the interval of systemic risk during the investment horizon, see [3]. If the proportion of assets satisfying rule Mj - — = — > to, where to is the threshold parameter of systemic risk, it is assumed that the probability of systemic risk loss in the market is low. Otherwise, the investment in a risk-free asset is preferred. In [5] is mentioned, the amount for the threshold value to depends on the decision-maker, but in this project is set to a value of 25% for technical analysis rules and 11% for SSD rule. 
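A simplified Python sketch of this first step and of strategy S1 is given below. It is an illustration under stated assumptions rather than the authors' implementation: a hypothetical price table is generated, one row of the alarm matrix M from equation (12) is filled using only the moving-average rule (3) with the parameters (n, N) = (5, 100), and the share of assets with a buy signal is compared with the threshold t0 = 25 %.

```python
import numpy as np

def ma(prices, window):
    """Simple moving average MA_{T,n} of equation (1) for each asset (column)."""
    return np.array([np.convolve(p, np.ones(window) / window, mode="valid")[-1]
                     for p in prices.T])

def alarm_row(prices, n=5, N=100):
    """One row of the alarm matrix M (equation (12)) for the current date:
    1 = buy signal when MA_{T,n} > MA_{T,N} (rule (3)), 0 = sell signal."""
    return (ma(prices, n) > ma(prices, N)).astype(int)

def strategy1_invests_in_market(prices, t0=0.25, n=5, N=100):
    """Strategy S1: stay in the risky portfolio only if the proportion of assets
    with a buy signal exceeds the systemic-risk threshold t0; otherwise the
    risk-free asset is preferred."""
    return alarm_row(prices, n, N).mean() > t0

# Hypothetical data: 250 trading days of prices for 4 assets (geometric random walk).
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, size=(250, 4)), axis=0))

print(alarm_row(prices))                     # 0/1 buy-sell signals per asset
print(strategy1_invests_in_market(prices))   # True if more than 25 % of assets signal "buy"
```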
Strategy 2 (S2) differs from the previous one in that the investor should only invest in specific assets that meet the rules in a given interval. The rules that must be met are the same as for S I . Furthermore, the asset weights in a portfolio are determined according to the selected portfolio models, where the weights are calculated based on one-year rolling window. Finally, the portfolio value and performance indicators are calculated. For the purposes of the analysis, a 20 days re-balancing interval is considered. The initial value of all investments in the portfolio Wo is set as 1 currency unit. It is assumed that the value of the particular weight x : > 0 and xt e (0,1), therefore a short sale of shares and weight limitation are not assumed. Two approaches were used to calculate the portfolio weights: the minimum risk approach defined by equation (10) and the maximum performance ratio approach defined by equation (11). Due to the suitability of particular models in the individual markets analyzed in previous research [11], specific portfolio models vary depending on the market. The particular results are depicted in Tables 1 and 2. From the results in Table 1, it is evident that using S I , which reduces the systemic risk, has a positive impact mainly in the period with crisis. While using technical analysis rules, the level of risk expressed by cr or VaR in the minimization risk model is lower compared to the general portfolio model in both markets. With the right choice of technical indicator, the profitability of the portfolio could have been higher. In the situation where the SSD alarm is considered in min risk model, the level of risk is essentially the same, but higher profitability is generated with a higher value of W, SR, or RR. The same conclusion is evident for the max performance ratio portfolio models. However, only a few technical indicators are suitable for applying, but it is not possible generally determining individual ones. When analyzing the min risk model during the period 2015-2019, the W of portfolios and overall portfolio performance (SR, RR) are slightly higher, however, this conclusion does not apply to SSD. Due to the properties of the model, a rapidly higher profitability cannot be expected. When examining the effect on cr or VaR, there is no significant decrease compared to the previous period. Regarding the combination of the S1 with the maximizing performance ratio model, this combination does not generate sufficient results compared to the general model, both in terms of return and risk. Generally, it can be concluded that the application of S1 is more suitable mainly during an economic crisis due to the risk-free investment at the time of the market downturn. When we focus on the results of S2 in Table 2, the conclusions are different. Oppositely to the previous strategy, results do not show the advantage of using S2 in a situation, where there was a crisis in part of the investment period, suggesting that the alarm strategy implementation is not always beneficial but could be inefficient likewise. The previous ones apply mainly for max performance ratio models. However, when the min risk model is selected especially on U K market, the slight advantage of trading rules can be achieved when comparing SR or RR. It can be determined that this mainly meets the S S D rule or E M A indicator. Looking at the second investment period, there is already reflected positive impact of portfolio. 
In general, the SSD rule is not appropriate for the min risk model, but according to the second portfolio approach, the results of W, E(R), or SR are the preferable. Otherwise, if the investor in U S market decides to apply the rules of technical analysis, an improvement in return and performance is visible for most portfolios. Although the same is not true for the U K market, where the ratio of more favourable portfolios to less favourable ones is lower, even more profitable portfolios can be seen than using the general model. Generally, more profitable portfolios in periods with economic growth are achieved by applying this strategy. Therefore, it can be summarized that the suitability of strategies for particular periods is different. 346 2006-2010 2015-2019 US Market Alarm \ W | E(R) \ IT \ VaR \ SR \ RR \ Alarm \ W \ E(R) \ IT \ VaR \ SR \ RR minCVaR Model MA(1,50) 1.6459 0.0004 0.0085 0.0133 0.0276 0.9328 M A ( 10,200) 1.4333 0.0003 0.0066 0.0115 0.0280 0.8419 M A ( 10,50) 1.3492 0.0002 0.0079 0.0128 0.0124 0.8943 M A ( 15,200) 1.4866 0.0003 0.0068 0.0117 0.0310 0.8564 MA(15,200) 1.7829 0.0004 0.0080 0.0125 0.0354 0.9297 EMA(5,50) 1.5965 0.0003 0.0061 0.0105 0.0415 0.8909 E MA( 1,50) 2.0860 0.0005 0.0078 0.0125 0.0497 1.0324 EMA(15,200) 1.5529 0.0003 0.0067 0.0115 0.0358 0.8425 M A C D 2.0918 0.0005 0.0085 0.0126 0.0465 1.0442 M A C D 1.575 : 0.0003 0.0053 0.0089 0.0456 0.8635 RSI 1.9823 0.0005 0.0108 0.0154 0.0351 0.9933 RSI 1.5947 0.0003 0.0068 0.0117 0.0377 0.8597 SSD 1.9059 0.0005 0.0107 0.0151 0.0330 0.9989 SSD 1.4207 0.0003 0.0063 0.0107 0.0282 0.8481 General 1.7704 0.0004 0.0108 0.0153 0.0196 0.9966 General 1.5493 0.0003 0.0068 0.0116 0.0351 0.8477 maxCalR Model MA(1,50) 1.8956 0.0006 0.0196 0.0299 0.0248 1.0206 M A ( 10,200) 3.2600 0.0009 0.0181 0.0286 0.0483 1.0183 M A ( 10,50) 1.8729 0.0006 0.0170 0.0268 0.0252 1.0469 MAC 15,200) 3.7446 0.0010 0.0183 0.0286 0.0528 1.0400 MA(15,200) 2.2133 0.0007 0.0196 0.0289 0.0300 1.0455 EMA(5,50) 4.3406 0.0011 0.0173 0.0262 0.0607 1.0458 E MA( 1,50) 2.7120 0.0008 0.0188 0.0282 0.0376 1.0800 EMA(15,200) 3.7598 0.0010 0.0182 0.0286 0.0532 1.0279 M A C D 2.2404 0.0007 0.0193 0.0278 0.0306 1.1012 M A C 1.1 3.8886 0.0010 0.0159 0.0205 0.0597 1.0644 RSI 2.2246 0.0008 0.0220 0.0320 0.0291 1.0710 RSI 4.0560 0.0011 0.0183 0.0286 0.0557 1.0400 SSD 2.1959 0.0008 0.0217 0.0319 0.0288 1.0776 SSD 3.2338 0.0009 0.0174 0.0278 0.0492 1.0266 General 2.2345 0.0008 0.0220 0.0324 0.0293 1.0705 General 3.9146 0.0011 0.0183 0.0286 0.0544 1.0401 UK Market Alarm \ W \ E(R) \ cr \ VaR S R R R Alarm \ W \ E(R) \ cr \ VaR | SR | RR minCVaR Model M A ( 1,50) 1.7799 0.0004 0.0088 0.0119 0.0319 1.1637 M A ( 10,200) 2.6677 0.0007 0.0070 0.0101 0.0905 1.2321 M A ( 10,50) 1.6083 0.0003 0.0084 0.0117 0.0251 1.1249 MAC 15,200) 2.6677 0.0007 0.0070 0.0101 0.0905 1.2320 MA(15,200) 1.6554 0.0004 0.0081 0.0114 0.0280 1.0467 EMA(5,50) 2.6553 0.0007 0.0072 0.0103 0.0879 1.2159 E MA( 1,50) 1.5019 0.0003 0.0092 0.0127 0.0188 1.0839 EMA(15,200) 2.7842 0.0007 0.0071 0.0102 0.0932 1.2420 M A C D 1.5921 0.0003 0.0091 0.0124 0.0231 1.1159 M A C D 2.0471 0.0005 0.0066 0.0098 0.0693 1.2149 RSI 1.7066 0.0004 0.01 12 0.015 : 0.0248 1.0736 RSI 2.9291 0.0007 0.0074 0.0105 0.0947 1.2218 SSD 1.9188 0.0005 0.0108 0.0143 0.0324 1.0920 SSD 1.7110 0.0004 0.0063 0.0093 0.0535 1.1513 General 1.6810 0.0004 0.0111 0.0154 0.0240 1.0758 General 2.7720 0.0007 0.0077 0.0105 0.0919 1.1564 maxSoR Model M A ( 1,50) 10.3996 0.0018 0.0246 0.0191 0.0669 1.7206 M A ( 10,200) 3.8595 0.0009 0.0107 0.0147 0.0853 
1.2042 M A ( 10,50) 8.6427 0.0016 0.0231 0.0190 0.0648 1.6401 MAC 15,200) 3.8595 0.0009 0.0107 0.0147 0.0853 1.2042 MA(15,200) 5.3451 0.0013 0.0223 0.0176 0.0521 1.5045 EMA(5,50) 4.0015 0.0010 0.0109 0.0151 0.0863 1.1914 E MA( 1,50) 6.8374 0.0015 0.0248 0.0204 0.0555 1.6148 EMA(I5,200) 3.9617 0.0010 0.0109 0.0154 0.0853 1.1862 M A C D 7.7545 0.0016 0.0247 0.0193 0.0590 1.6269 M A C D 2.8419 0.0007 0.0103 0.0142 0.0686 1.1713 RSI 8.6987 0.0017 0.0257 0.0216 0.0606 1.57 11 RSI 4.1371 0.0010 0.0113 0.0158 0.0855 1.1502 SSD 9.8185 0.0018 0.0255 0.0213 0.0640 1.6119 SSD 2.6883 0.0007 0.0097 0.0127 0.0682 1.1421 General 8.6986 0.0017 0.0257 0.0216 0.0606 1.57 11 General 4.1371 0.0010 0.0113 0.0158 0.0855 1.1502 Table 1 Results of S1 with particular alarm on different markets 2006-2010 | 2015-2019 US Market Alarm \ W \ E(R) \ cr \ VaR | SR \ RR | Alarm \ W \ E(R) \ cr \ VaR | SR | RR minCVaR Model MAC 1,50) 1.5310 0.0004 0.0125 0.0175 0.0183 1.0230 MAC 1,50) 2.0081 0.0005 0.0079 0.0128 0.0529 0.9534 MA(10,150) 1.6377 0.0004 0.0116 0.0169 0.0226 1.0249 MA(5,50) 1.9600 0.0005 0.0074 0.0126 0.0536 0.9489 EMA(1,50) 1.6520 0.0004 0.0104 0.0157 0.0246 0.9739 MAC 15,50) 1.9624 0.0005 0.0072 0.0117 0.0551 0.9325 E M A ( 1,100) 1.7618 0.0005 0.0124 0.0163 0.0258 1.0392 EMA(1,50) 2.1464 0.0005 0.0077 0.0116 0.0598 0.9463 M A C D 1.7388 0.0004 0.0127 0.0175 0.0248 0.9930 M A C D 1.5157 0.0003 0.0076 0.0129 0.0302 0.8434 RSI 1.4515 0.0003 0.0110 0.0157 0.0160 0.9841 RSI 1.7359 0.0004 0.0070 0.0114 0.0448 0.8747 SSD 1.4425 0.0003 0.0108 0.0150 0.0157 0.9361 SSD 1.2513 0.0002 0.0073 0.0116 0.0137 0.8279 General 1.7704 0.0004 0.0108 0.0153 0.0196 0.9966 General 1.5493 0.0003 0.0068 0.0116 0.0351 0.8477 maxCalR Model MAC 1,50) 1.7927 0.0006 0.0211 0.0323 0.0226 1.0325 MAC 1,50) 4.6298 0.0012 0.0167 0.0255 0.0647 1.0810 MA(10,150) 2.3024 0.0008 0.0212 0.0324 0.0305 1.0445 MA(5,50) 4.1012 0.0011 0.0153 0.0240 0.0639 1.0048 EMA(1,50) 1.7935 0.0006 0.0194 0.0305 0.0229 1.0051 MAC 15,50) 4.0073 0.0010 0.0161 0.0244 0.0604 1.0662 E M A ( 1,100) 1.9102 0.0007 0.0217 0.0325 0.0246 1.0245 EMA(1,50) 4.6451 0.0012 0.0173 0.0241 0.0633 1.0383 M A C D 2.8280 0.0009 0.0217 0.0309 0.0365 1.1344 M A C D 3.5350 0.0009 0.0145 0.0235 0.0599 0.9419 RSI 1.1545 0.0003 0.0222 0.0340 0.0095 1.0224 RSI 2.9108 0.0009 0.0184 0.0262 0.0438 0.9379 SSD 1.6047 0.0005 0.0212 0.0344 0.0192 1.0608 SSD 6.8823 0.0015 0.0187 0.0271 0.0738 1.0467 General 2.2345 0.0008 0.0220 0.0324 0.0293 1.0705 General 3.9146 0.0011 0.0183 0.0286 0.0544 1.0401 UK Market Alarm \ W \ E(R) \ cr \ VaR \ SR \ RR \ Alarm \ W \ E(R) \ cr \ VaR \ SR \ RR minCVaR Model MAC 1,50) 1.9604 0.0005 0.0121 0.0155 0.0314 1.0313 MAC 1,50) 3.1694 0.0008 0.0077 0.0108 0.0983 1.1521 MA(10,150) 1.6040 0.0004 0.0127 0.0161 0.0201 1.0042 MA(5,50) 2.6098 0.0007 0.0075 0.0110 0.0839 1.1323 EMA(1,50) 2.1751 0.0006 0.0119 0.0155 0.0374 1.0444 MAC 15,50) 2.3810 0.0006 0.0079 0.0109 0.0718 1.1162 EMA(1,100) 1.6624 0.0004 0.0122 0.0160 0.0223 1.0112 EMA(1,50) 3.4655 0.0008 0.0077 0.0105 0.1064 1.1946 M A C D 1.9751 0.0005 0.0114 0.0172 0.0330 1.0189 M A C D 2.2345 0.0006 0.0072 0.0110 0.0727 1.1458 RSI 1.7127 0.0004 0.0109 0.0154 0.0254 1.0918 RSI 2.7779 0.0007 0.0084 0.0112 0.0802 1.2453 SSD 2.2809 0.0006 0.01 12 0.0142 0.0:17 1.0522 SSD 2.7868 0.0007 0.0086 0.0124 0.0789 1.0786 General 1.6810 0.0004 0.0111 0.0154 0.0240 1.0758 General 2.7720 0.0007 0.0072 0.0105 0.0919 1.1564 maxSoR Model MAC 1,50) 6.3091 0.0015 0.0256 0.0223 0.0527 1.4111 MAC 1,50) 3.7382 0.0009 0.0118 0.0158 0.0763 1.1918 
MA(10,150) 6.9036 0.0015 0.0257 0.0231 0.0548 1.3988 MA(5,50) 3.2804 0.0009 0.0117 0.0155 0.0697 1.1668 EMA(1,50) 6.5512 0.0015 0.0254 0.0224 0.0538 1.4406 MAC 15,50) 3.3731 0.0009 0.0119 0.0157 0.0703 1.1565 E M A ( 1,100) 7.1727 0.0016 0.0261 0.0240 0.0553 1.4181 EMA(1,50) 4.0549 0.0010 0.0118 0.0155 0.0813 1.1962 M A C D 3.3 1 1 1 0.0009 0.0180 0.0238 0.0451 1.0928 M A C D 2.1340 0.0006 0.0107 0.0157 0.0490 0.9866 RSI 6.2773 0.0015 0.0245 0.0226 0.0538 1.3953 RSI 3.6436 0.0009 0.0137 0.0161 0.0661 1.1928 SSD 8.9122 0.0017 0.0260 0.0226 0.0608 1.5592 SSD 4.8422 0.0011 0.0125 0.0171 0.0864 1.2018 General 8.6987 0.0017 0.0257 0.0216 0.0606 1.5711 General 4.1371 0.0010 0.0113 0.0158 0.0855 1.1502 Table 2 Results of S2 with particular alarm on different markets 347 5 Conclusion The objection of the project was to analyze the impact of several technical analysis rules and the SD rule included in the portfolio strategy within different world markets during two investment horizons capturing different market conditions. For implementation in the portfolio creation process, two general strategies (SI and S2) were considered, where SI aimed to reduce systemic risk with alternative risk-free investment. In contrast, S2 selected assets that complied with the rules, without the possibility of a risk-free investment. From the results, it was evident that using Strategy 1 with selected technical analysis indicators to find systemic risk during the crisis period helped to reduce the risk of the portfolio with similar level of the profitability. In general, the conclusion of Kouaissah [5], that an alarm rule is a suitable tool for predicting future market risk, can be confirmed for this period. In contrast, Strategy 2 had the opposite effect, meaning an increase of the profitability with an unchanged level of risk in specific situations. In connection with this strategy, sufficient results were achieved with the SSD rule as well. During a period with a growing global economy, the use of strategy 1 for both models was not profitable, which meant that the strategy implementation was not always beneficial but inefficient. Acknowledgements Author greatly acknowledged support through the Czech Science Foundation ( G A C R ) under project GA20-16764S, SGS research project SP2021/15 of V S B - T U Ostrava, and Moravian-Silesian region by the project RRC/02/2020. References [1] Anderson, B . & L i , S. (2015). A n investigation of the relative strength index. Banks and Bank Systems, 10(1), 92-96. [2] Dupačová, J. & Kopa, M . (2014). Robustness of optimal portfolios under risk and stochastic dominance constraints. European Journal of Operational Research, 234(2), 434—441. [3] Giacometti, R., Ortobelli, S. & Tichý, T. (2015). Portfolio selection with uncertainty measures consistent with additive shifts. Prague Economic Papers, 24(1), 3-16. [4] Kimball, M . S . (1993). Standard Risk Aversion. Econometrica, 61(3), 589-611. [5] Kouaissah, N . & Hocine, A . (2020). Forecasting systemic risk in portfolio selection: The role of technical trading rules. Journal of Forecasting, 1-22. [6] Kouaissah, N . , Orlandini, D., Ortobelli, S. & Tichý, T. (2019). Theoretical and practical motivations for the use of the moving average rule in the stock market. IMA Journal of Management Mathematics, 31(1), 117-138. [7] Kouaissah, N . , Ortobelli, S. & Tichý, T. (2018). Portfolio Theory and Conditional Expectations: Selected Models and Applications. SAEI, vol. 59, VŠB-TU Ostrava. [8] Markowitz, H . M . (1952). Portfolio selection. 
Journal of Finance, 7(1), 77-91. [9] McNamara, J.R. (1998). Portfolio Selection Using Stochastic Dominance Criteria. Decision Sciences, 29(4), 785-801. [10] Mills, T.C. (1997). Technical Analysis and the London Stock Exchange: Testing Trading Rules Using the FT30. International Journal of Finance & Economics, 2(4), 319-331. [11] Neděla, D . (2020). On the Impact of Technical Analysis Rules in Selected Portfolio Approaches Under Different Market Conditions. Subject Project. V S B - T U Ostrava. [12] Ogryczak, W. & Ruszczyňski, A . (2001). On consistency of stochastic dominance and mean-semideviation models. Mathematical Programming, 89(2), 217-232. [13] Rachev, S.T., Stoyanov, S . V & Fabozzi, F.J. (2008). Advanced stochastic models, risk assessment and portfolio optimization: The ideal risk, uncertainty and performance measures. New York: Wiley Finance. [14] Rockafellar, R . T & Uryasev, S.P (2002). Conditional value-at-risk for general loss distributions. Journal of Banking & Finance, 26(7), 1443-1471. [15] Sharpe, W . F (1994). The Sharpe ratio. Journal of Portfolio Management, 21(1), 49-58. [16] Sortino, F A . & Price, L . N . (1994). Performance measurement in a downside risk framework. Journal of Investing, 3(3), 59-65. [17] Taylor, N . (2014). The rise and fall of technical trading rule success. Journal of Banking & Finance, 40(1), 286-302. [18] Young, T.W. (1991). Calmar ratio: A smoother tool. Futures, 20(1), 40. 348 Information Retrieval System for IT Service Desk for Production Line Workers 1 2 Dana Nejedlová , Michal Dostál Abstract. The information retrieval system gets answers to its users' questions. A s a part of an IT service desk available for the workers at the production line it recognizes a spoken request and selects a proper answer or action for this request. Speech recognition done at the beginning of the process and presentation of the selected answer done in the end of the process is solved using freely available modules for speech recognition and text to speech. The subject of the presented research is the process of selecting answers to recognized questions. The software solving this process is done by the authors of this contribution. The main result of the research is the comparison of two approaches of selecting answers to questions, namely decision tree and neural network. The input of both approaches is the set of keywords recognized i n the user's speech. Due to the lack of real data from a real production line, the training set of keywords and correct answers is generated artificially. The result of the research shows that the neural solution is advantageous when the number of possible answers and the number of keywords in questions is high. Keywords: information retrieval, decision tree, neural network J E L Classification: C61, D83 A M S Classification: 68T30 1 Introduction Automatic processing of natural language is becoming more important as more information written or spoken is communicated through the internet and other electronic media. During recent years important improvements in this task have been achieved thanks mainly to the field of deep learning. Some older approaches to solving this task have also proved to meet all requirements imposed by practical usages. The aim of this contribution is to present two types of information retrieval system and compare the traditional approach to making such a system using decision tree and a new solution using neural network. 
Information retrieval (IR), as defined in [8], is a task of finding information that answers a given query (i.e. question). IR system dealt with in this contribution is intended to be used by production line workers who need help from IT service desk. The user interface of this system uses a microphone and a speech recognition module that outputs query in the form of the sequence of words. This query is processed by decision tree or neural network that select some reaction to the query from the set of possible answers or actions. This selection can also be called a classification of query, and because the query has been converted to text, we can call it text classification task. One of the earliest works about solving text classification by decision trees is by Lewis and Ringuette [6] from 1994. Early works describing solving the same task by neural networks are by Wiener et al. [14] and Schütze et al. [12], both from 1995. The other efficient text classification method based on the theory of probability, Naive Bayes, was published in 1960 by Maron and Kuhns [9]. The queries and the documents to be retrieved in early neural network and probabilistic solutions were represented by vectors which had as many elements as was the size of dictionary, and the values of their elements have been derived from the frequency of each word in the represented document. The process of finding the category for the query involved the dimensionality reduction of the vector space by statistical techniques closely related to Principal Component Analysis (PCA). The way of describing the probability of occurrence of one word on condition that two other words precede it in the text, i.e. the so-called trigram language model, was represented as the table of (size of dictionary)3 values and the typical size of dictionary, i.e. the number of different words in the language, is 100 000. This meant that the system that 1 Technical University of Liberec, Faculty of Economics, Department of Informatics, Studentská 2, 461 17 Liberec, Czech Republic, dana.nejedlova@tul.cz. 2 Technical University of Liberec, Faculty of Economics, Department of Informatics, Studentská 2, 461 17 Liberec, Czech Republic, michal.dostall@tul.cz. 349 predicted the most probable next word in e.g. speech recognition task had to work with a table of 10 values. Youshua Bengio et al. [1] showed in 2003 directions to escape this curse of dimensionality by representing each word by a vector typically not longer than 100 elements which describe its relationship with other words using the so-called distributed representation. These vectors can be trained from texts by neural networks and today they are called w o r d embeddings. Recently, the fast development in the field of deep learning with clusters of computers and large corpora of text and multimedia content found on the internet lead to such combinations of trained word embeddings and neural networks for their processing that enables to work with text and some multimedia in such a way that borders on the level of human understanding. Besides typical tasks of natural language processing, like speech recognition, question answering, translation, semantic analysis, text summarization, text generation, and sentiment analysis, some more specialized applications of contemporary speech processing are developed. Examples of economic applications include training of word embeddings from texts specialized on particular problem domain, e.g. 
the oil and gas industry [3], customer segmentation based on their queries and descriptions of items they have positively rated from which word embeddings have been trained [2], description of the evolution of understanding of a particular economic concept from literature about this concept from various time epochs [7], text summarization of articles that aids access to medical evidence [11], identification of similar companies based on a corpus of financial news articles [5], and estimation of uncertainty about the financial health of companies from their annual reports [13]. The aspects of use of virtual assistants in various kinds of businesses are discussed in [10]. 2 Data Preparation Both forms of information retrieval system presented in this contribution have the same input and output. The input is a question in the form of a set of keywords. The output is a single answer. The set of keywords is not ordered, and each instance of its member is contained in it only once. A single keyword in real-life application may be either a single word or a phrase. If phrases are included, then each phrase has its own identification number (ID). If distinct words forming a phrase are the same as keywords found elsewhere in the data, then each of them is given another ID. Keywords characterize the answer and in natural language they are accompanied by function words, defined in [8], that carry information about relationships among keywords and are present repeatedly in texts on many different topics. The rule-based system described in Section 2.1 has manually designed a set of keywords which does not contain function words. Data for the neural network, preparation of which is described in Section 2.2, contain function words, and one of the aims of this research is to test whether neural network can correctly classify questions that contain also function words without informing it what words are keywords and what words are function words. 2.1 Rule-Based Expert System The types of keywords detected by the system are grouped by corresponding areas of ITIL (Information Technology Infrastructure Library), by which the IT services are built and managed. The keywords are in lemmatized form, so that the system is able to detect them in all their grammatical forms. To this day there are over 30 keywords that are detected by the system. The set of possible answers develops with the needs of the IT services department and is based on the experiences with communication with the users, i.e. the workers on the production line. 2.2 Neural Network The words or phrases found in real-life application are represented by single letters of English alphabet in the data for the neural network. Due to the lack of real data from a real production line, which is the supposed environment where our information retrieval system should be used, our data are generated artificially according to model (1). A A ( B V C ) => Answer 1 (1) Model (1) reads as "If set of keywords is composed of word A and word B or C, then the output of the system should be A n s w e r I n logic such "if X then Y" rule can be represented by logical operator of implication "X => Y" which can be read as " X implies Y". In this implication X is called antecedent and Y is called consequent. Model (1) can be split into two implications (2) and (3) meaning that a single answer can be retrieved for more than a single set of keywords. 
350 AAB => Answer_1 A A C => Answer_1 (2) (3) There is a possibility in real-life data that for a single set of keywords more than one answer is correct, which may be caused by some circumstances like word order, intonation, and processes around production line not captured in recognized words shown in implications (4) and (5): AAB => Answer_1 (4) AAB => Answer 2 (5) Such data cannot be processed without error and the solution of this phenomenon would involve inclusion of more information (e.g. word order) into the input of the information retrieval system. Our artificial data are designed in such a way, so that there are no ambiguities of the type (4) and (5). We have manually designed 52 sets of 1 to 4 keywords leading to 27 different answers. The same keywords are present in different sets and some different sets lead to the same answer. There are 18 keywords in our data represented by letters A to R. Keywords in speech processing should be recognized from all other words. We augment our sets of keywords by zero to 4 different function words which are represented by 8 letters S to Z. In rule-based systems the function words should be discarded from the set of keywords by statistical processing of word frequencies in large text corpora or manually when only a small amount of text is available. Neural network should recognize keywords from functional words when it learns large enough amount of data with mixed keywords and functional words, because it will see some repeating pairs of the same set of keywords and the same answer but no or little number of repeating pairs of the set of functional words and the same answer. A l l possible combinations of zero to 4 function words have been added to each of our 52 sets of keywords with answers. In such a way each set of keywords has acquired 163 instances with different set of additional function words. A l l sets of words that form the antecedent of all 8476 (= 52 x 163) implications are connected by logical operator A N D represented as A in our models (1) to (5). A l l data can be after this augmentation divided into training (52 x 81 implications) set and test set (52 x 82 implications) each containing the same 52 sets of keywords but differing in the additional set of function words. 3 Traditional Solution - Rule-Based Expert System The rule-based expert system is written by the second author of this paper in Python programming language and is using open source libraries for artificial intelligence capabilities such as speech recognition and text-to-speech. This system has a three-layer architecture. There is a presentation, application, and data layer in it. The presentation layer (the front-end part of this application) contains a communication interface with a microphone attached to speech recognizer for user input and text to speech module attached to a speaker for the acoustic output. The application layer of the system processes information provided by the speech recognizer. This layer forms the main part of the application. The logic of the decision tree, the heart of this expert system, is programmed here. The data layer (the back-end part of this application) contains the database with all important information needed for resolving the queries placed by the users once they are classified by the decision tree. The nodes of the decision tree represent the attributes (in our case a keyword or more keywords present in the spoken statement or question). 
The branches then represent decision rules, which determine what answer should be replied to the user. A s usual, the leaf nodes represent the outcome (in our case the answer). The decision tree is developed manually based on the needed dialogue flow. For example, one branch of the decision tree is executed i f the user input contains a keyword "password". The user is asked for his or her employee ID and other authentication information in the subsequent nodes on this branch. The branches of the decision tree were developed gradually based on the topics that were selected as desired services to be delivered by the system to its users. The question answering process of this expert system begins with calling a special telephone number allocated for this specific use case. The phone call is directed to the communication interface of the system which invites the user to place a question or a request. The expert system then evaluates the input and tests for specific keywords contained in it. This will initiate the execution of the decision tree. Based on the specific keywords, the 351 system chooses a path (branch of the decision tree) and either provides an answer or asks the user a follow-up question. When the system asks a follow-up question, its purpose is mainly to get more precise information. For example, i f the employee asks for a piece of information or a task that needs authentication, the system asks him or her for his or her credentials to ensure that he or she is authorized to get that information or some process (e.g. password change, contact telephone number change, etc.) could be initiated for him or her. 4 Connectionist Solution - Neural Network Deep feedforward neural network with five layers of neurons, i.e. with three hidden layers, has been programmed by the first author of this paper in C language and is freely available together with the analysis of results on the link https://owncloud.cesnet.ez/index.php/s/HV3mQrhrueV2ccl. Neurons in adjacent layers are fully connected forming a matrix of weights which are its parameters. To all but the first layer of neurons one extra connection with a trainable weight leads from a small nonzero constant input forming the so-called bias term which is added to the rest of each neuron's input, and this input is a dot product of the output of preceding layer and weights on the connections from all neurons on the preceding layer to the neuron on the current layer. The output of each neuron on all but the first layer is the result of the activation function. Neurons on the first layer just pass forward the input data to the network. The activation function on all hidden layers is tanh and on the output layer is sigmoid. The only difference from a standard backpropagation algorithm is a constant called the error propagation amplifier, which is a number larger than one that is multiplied with the error propagated backwards through the network without which this error would be dampened in its path through the network having more than a single hidden layer. Initial weights of the network and initial values of elements of word embeddings are set to random values from the interval of (-0.5, +0.5). The network is trained according to the following algorithm: 1. The input of the network is a single word embedding. 2. The network computes the output of this word embedding on its output layer. 3. 
The network computes squares of differences of its activations on the output layer from the correct values of the output which is one of 27 possible answers, mentioned in Section 2.2, encoded as a vector of 26 zeros and a single value of one on a position characterizing this answer. Such a representation is called one-hot vector in [4]. These squares of differences are errors on neurons on the output layer. 4. The network computes errors on neurons on all its other layers in such a way that each but the output layer computes its error from the error on the next layer. 5. The error on the first (input) layer is used to update the values of the input word embedding. 6. A l l weights of the network are updated according to the errors computed on the layers that they lead to. The process of learning a single implication, defined in Section 2.2, consists of the above mentioned 6 points applied successively to all word embeddings forming the antecedent of the implication. The network repeats learning all implications until it reaches its maximal accuracy of the prediction of all consequents from all antecedents. The implications in the training set are processed one after another. After the processing of this set the neural network outputs the number of correctly classified implications into consequents (i.e. answers) in the training set and test set separately. To get these numbers of correctly classified implications the network at first computes the sum of activations of the output neurons for all 26 kinds of word embeddings which are English letters A to Z , see Section 2.2, and the result is a vector called sum_of_output with as many elements as is the number of neurons in the output layer. Then the network computes the vector of the same number of elements with the sum of the output for all word embeddings present in the set for a given implication called sum_of_output_in_implication according to formula (6). sum_of_output_in_implication = ^ 2 • output_of_word ^ words in implication Then the index of the answer to which the network classifies the set of words in the implication is the index of maximal value in the vector (7). sum_of_output_in_implication - sum_of_output (7) The value of elements in vector (7) is equal to the sum of positive values of the output of words in the implication and the negative values of the output of all other words. In this way the network can discriminate between such sets of words where one set is the subset of the other set. 352 5 Experimental Results This section presents the quality (in terms of algorithmic complexity and accuracy of its response) of information retrieval system in the form of a rule-based expert system, see Section 3, and in the form of a neural network, see Section 4. 5.1 Rule-Based Expert System The algorithmic complexity of the already compiled decision tree is directly proportional to the average length of its branches leading from the initial node that processes the first recognized keyword to the final node with the answer. Because of the fact that the rule-based expert system does not run in real production line, we cannot statistically determine its rate of successful use cases. If all words are correctly recognized by the speech recognition module and the user uses the right words for the answer encoded in the decision tree than the system is always successful. 
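As a small illustration of the classification rule in formulas (6) and (7), the following sketch scores an implication by summing the network's output vectors over the words it contains and subtracting the total output over all words, then taking the index of the maximal element. The function name and the use of a precomputed output matrix are illustrative assumptions about how the described computation could be organized; they are not taken from the authors' C implementation.

```python
import numpy as np

def classify_implication(word_ids, output_of_word):
    """Classify the word set (antecedent of an implication) into one answer,
    following formulas (6) and (7).

    word_ids       -- indices of the words appearing in the implication
    output_of_word -- V x A matrix; row v holds the network's output-layer
                      activations for the embedding of word v
                      (V words in the vocabulary, A possible answers)."""
    # formula (6): sum of outputs over the words present in the implication
    sum_in_implication = output_of_word[word_ids].sum(axis=0)
    # total output summed over all words in the vocabulary
    sum_of_output = output_of_word.sum(axis=0)
    # formula (7): the difference rewards words in the set and penalizes all
    # others, which lets the network separate a set from its supersets
    scores = sum_in_implication - sum_of_output
    return int(np.argmax(scores))

# illustrative use: 26 "words" (letters A..Z), 27 possible answers
rng = np.random.default_rng(1)
outputs = rng.random((26, 27))
antecedent = [0, 1, 19]        # e.g. keywords A, B plus a function word T
print(classify_implication(antecedent, outputs))
```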
5.2 Neural Network The algorithmic complexity of a trained neural network is directly proportional to the number of its weights multiplied with the number of words in the question. The fully connected network with 5 layers with l\ to k number of neurons respectively has number of weights determined by formula (8). number of weights = (lt + 1) • l2 + (l2 + 1) • l3 + (l3 + 1) • Z4 + (l4 + 1) • ls (8) The best results on data described in Section 2.2 have been achieved with the network with 10 neurons on the input layer, which means that the embeddings were represented by vectors of 10 values. Each of all three hidden layers had 20 neurons and the output layer had 27 neurons, so that the network could classify word embeddings into 27 different answers. The learning rate parameter was 0.0001 and the error propagation amplifier was 1.1. The network has been cycling through the training set and the number of correctly classified implications had both in the training set and the test set its highest values after 21000 cycles. These values were 3762 for the training set and 3800 for the test set. The reason why the number of correctly classified implications in the test set was higher than at the training set was the way of computing the class according to formulas (6) and (7) which are different from the result of standard minimization of the sum of square error. Another experiment has been conducted with the network that learned implications defined only by the keywords, as if the function words were filtered from the input using some external knowledge, as is the case in our rule-based expert system. Analysis of erroneously classified implications has shown that misclassified implications in the test set were almost the same as those in the training set as well as in the results of different experiments with neural network being trained from the state of random weights and random word embeddings. These errors are related to the formulas (6) and (7), because for some implications mainly with keywords designed as synonyms of words in other implications with the same answer only the sum_of_output_in_implication would be more appropriate. The accuracy of classification as the percentage of correctly classified implications in the number of all implications is shown in Table 1. Set Number of antecedents Number of correctly classified antecedents Accuracy Training 4212 3762 89.32% Test 4264 3800 89.12% Keywords only 52 47 90.38% Table 1 Accuracy of neural network The analysis of variance of random projections of resulting word embeddings shows that words that had a relatively large number of answers associated with them have lower variance than words in implications having a relatively small number of answers as their consequent. In such a way function words could be statistically discriminated from keywords. 353 6 Conclusion In this work, two approaches to the task of information retrieval have been compared: the decision tree and the neural network. The decision tree has been compiled manually from manually designed set of keywords. The data for the neural network have been compiled manually as well but function words have been added to keywords. Experimental results have shown that the trained word embeddings could be statistically processed to discern keywords from function words. The main advantage of our decision tree is its 100% accuracy when it is properly used. Its main disadvantage is the need of manual work with words and manual addition of branches for new answers. 
The main advantage of neural network is its ability to work with speech that contains also function words when it is trained on enough data. Its main disadvantage is its non-zero error rate even for a relatively small amount of data and the fact that it cannot recognize utterly wrong answers while the decision tree has branches for such combinations of words. In the next phase of our research, we will seek to find proper uses of both systems in real production settings. One of the aims of this paper is to draw attention to the emerging way of economic data analysis in which economic objects would be represented by embeddings. Analysis applied to text, like prediction of next word or finding the relationships between words where words or other units of the text are represented as embeddings, as it is shown in this paper, can be likewise done with various participants and phenomena of economic processes, especially from big data produced by the internet and other information and communications technology. Acknowledgements This work is supported by internal grant of the Faculty of Economics of the Technical University of Liberec. References [I] Bengio, Y . , Ducharme, R., Vincent, P., & Jauvin, Ch. (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research, 3(Feb), 1137-1155. [2] Boratto, L . , Carta, S., Fenu, G . , & Saia, R. (2016). Using neural word embeddings to model user behavior and detect user segments. Knowledge-Based Systems, 108, 5-14. DOI: 10.1016/i.knosvs.2016.05.002 [3] Gomes, D . da S., M . , Cordeiro, F. C , Consoli, B . S., Santos, N . L . , Moreira, V . P., Vieira, R., Moraes, S., & Evsukoff, A . G . (2021). Portuguese word embeddings for the oil and gas industry: Development and evaluation. Computers in Industry, 124, 1-14. DOI: 10.1016/i .compind.2020.103347 [4] Jurafsky, D . & Martin, J. H . (2020). Speech and Language Processing. Third Edition draft. https://web.stanford.edu/~jurafsky/slp3/ [5] Kee, T. (2019). Peer Firm Identification Using Word Embeddings. In 2019 IEEE International Conference on Big Data (Big Data), (pp. 5536- 5543). Los Angeles, C A , U S A . DOI: 10.1109/BigData47090.2019.9006438 [6] Lewis, D . D . & Ringuette, M . (1994). A comparison of two learning algorithms for text categorization. In Proc. SDAIR 94, (pp. 81-93). Las Vegas, N V . [7] Mahanty, S., Boons, F., Handl, J., & Batista-Navarro, R. T. (2019). Understanding the Evolution of Circular Economy through Language Change. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, (pp. 250-253). Florence, Italy. [8] Mantling, Ch. D . & Schütze, H . (1999). Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts, London, England: The M I T Press [9] Maron, M . E . & Kuhns, J. L . (1960). On relevance, probabilistic indexing, and information retrieval. Journal of the ACM. 7(3), 216-244. DOI: 10.1145/321033.321035 [10] Quarteroni, S. (2018). Natural Language Processing for Industrial Applications. Informatik-Spektrum, 41, 105-112. DOI: 10.1007/s00287-018-1094-1 [II] Sarker, A , Yang, Y - C , Al-Garadi, M . A . , & Abbas, A . (2020). A Light-Weight Text Summarization System for Fast Access to Medical Evidence. Frontiers in Digital Health, 2. DOI: 10.3389/fdgth.2020.585559 [12] Schütze, PL, Hull, D. A . , & Pedersen, J. O. (1995). A comparison of classifiers and document representations for the routing problem. In SIGIR '95, (pp. 229-237). DOI: 10.1145/215206.215365 [13] Theil, Ch. K . 
, Stajner, S., & Stuckenschrnidt, H . (2020). Explaining Financial Uncertainty through Specialized Word Embeddings. ACM/IMS Transactions on Data Science. 7(1). DOI: 10.1145/3343039 [14] Wiener, E., Pedersen, J., & Weigend, A . (1995). A neural network approach to topic spotting. In Proc. SDAIR 95, (pp. 317-332). Las Vegas, N V . 354 Permanent Income Hypothesis with the Aspect of Crises. Case of V4 Economies Václava Pánková Abstract. Some households manage their consumption according to the permanent income hypothesis (PIH) and others consume their actual income without any apparent long-run conception. The quantification of a percentual share of both groups bring an important macroeconomic information about a considerable part of G D P . The theory of P I H allowing for computing the share of both consumption sorts is described and applied to V 4 (Czech Republic, Hungary, Poland, Slovakia) economies. The influence of financial crisis in the end of first decade and a starting period of covid crisis is formulated by the help of a multiplicative dummy variable. The share of households consuming according to actual income is from two thirds to three quarters and uses to increase significantly due to the impact of crises. Keywords: PIH, cointegration, seemingly unrelated regression J E L Classification: E2, C82, C51 A M S Classification: 62J05 1 Introduction Consumption is an important entity both for the households and for the policymakers. As a considerable part of G D P (represents 50 - 70 % of spending in most economies) it is a subject of theoretical studies and theories. In 1957, M . Friedman defined in [4] a concept of permanent income hypothesis (PIH) based on an assumption that the economic subject consumes according to its expected rather than actual income. Permanent income and consumption represent a concept which brings a long - run information because of its nature. It is a theory of consumer spending which states that people will spend money at a level consistent with their expected long term average income. Consumers do not care about the past, they only care about the present and future. The consumed proportion of the expected income may be influenced by the shocks which are either transitory (e.g. short-run changes of taxes, super-gross wage cancelled for two years only) or permanent (e.g. changes of the social system, possible retirement reform). Consumers will react differently i f a shock is permanent rather than transitory. In case of PIH, consumption functions should not be formulated in terms of consumption expenditures and disposable income but in terms of permanent and transitory income and consumption which are not observable. The permanency phenomenon was studied by using different complementary theories: Adaptive expectation is applied e.g. in [2], Based on rational expectation hypotheses it is elaborated by Hall [5] and Sargent [8], both approaches harmonized by Flavin [3], The concept of rational inattention was formulated by Sims [9] changing the quality of the topic by an addition of further aspects. Validation of permanent income hypothesis made by the help of some of the mentioned approaches can bring eventual implications for policy makers. 2 Material and Methodology Though the theory never was called into question, empirical data often used not to validate the P I H even in the most developed economies. A convenient analytical apparatus occurred due to articles of Hall [5], Campbell and Mankiw [1] and Flavin [3]. 
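For orientation, the Campbell–Mankiw-type specification that the next section builds up (model (3)) can be sketched as follows; the notation with $\omega$ for the share of income consumed out of current income, $D$ for the crisis dummy, and $\delta$ for its effect follows the description below, and the display is offered only as a reading of that description.

```latex
% Sketch of the specification described in the following section:
% omega ... share of income accruing to consumers who track current income,
% D_t   ... multiplicative crisis dummy, delta ... its effect on omega.
\begin{align*}
  Y_t &= Y_{1t} + Y_{2t} = \omega Y_t + (1-\omega) Y_t, \\
  \Delta C_t &= \alpha + (\omega + \delta D_t)\,\Delta Y_t + \varepsilon_t,
  \qquad
  D_t = \begin{cases} 1 & \text{in crisis periods,} \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
```

Under this reading, consumption reacts to current income with coefficient $\omega$ in normal times and $\omega + \delta$ during crises, while $1 - \omega$ measures the share attributable to PIH-consistent households.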
A realistic assumption is incorporated that only a part of households uses a permanent income and a percentage of it can be estimated. One group of consumers reflects the actual disposable income as C l t = Ylt and the other group consumes according to the permanent income C2t = Y2 p t. Together the income is Yt = l i t + Yft = *>Yt + (1 - (ú)Yt , 0)et . (3) D is a multiplicative dummy variable, D = 1 during the periods of crises. The multiplicative dummy shows a)AY, if D = 0 A Q - O + SD)AYt = + £ lfD=1 5 is anticipated to be positive. B y 8 we do not measure the changes in consumption but the changes in the PIH versus non-PIH distribution. 3 Results Model (3) is applied to V 4 economies (Czech Republic, Hungary, Poland, Slovakia). Quarterly data (source: Eurostat, seasonally unadjusted) cover the period from 1996Q1 to 2020Q2. Consumption C and G D P Y are distinguished by suffixes -cz for Czechia, -hu Hungary, -pi Poland and -sk Slovakia. G D P is used as a proxy for aggregate income. First, the series were transformed using the H P (Hodrick - Prescott) filter. Further, the A D F (Augmented Dickey Fuller) test was performed to study stationarity. Relevant t-statistics are shown in Table 1. Critical value is -2.894 at 5 % level Evidently, all the series are 1(1). Using the differences in equation (3) we are sure to avoid a spurious regression. t-statistic /series/ t-statistic /first differences/ Ccz -1.732 -19.828 Chu -1.936 -19.801 Cpl -1.853 -19.739 Csk +4.997 -17.814 Y c z -1.725 -19.786 Yhu -1.985 -19.784 Y p l -1.913 -19.750 Ysk +1.272 -18.949 Table 1 t-statistics of A D F test A l l the data is measured in local currencies; as for the Slovakia both variables are in Euro during all the followed period. The transformation from Slovak crown to Euro of the data from the pre-Euro period was done by Eurostat. 356 The non-homogeneity in currencies is the reason why the seemingly unrelated regression S U R was used instead of a panel technique. Such estimates are more efficient (in comparison to O L S applied to single equations) if the error terms are correlated between the equations. It respects single equations (one for each country) but allow for an assumption of common disturbance impacts what in V 4 can very well be reasoned due to the rather long common history and very similar economic development. The substantial part of computation (Eviews) is presented as Table 2 with the parameters re-named for a better orientation. Estimation Method: Seemingly Unrelated Regression Included observations: 392 Coefficient Std. Error t-Statistic Prob. 
Alpha-cz 3.516108 25.83382 0.136105 0..8918 Omega-cz 0.657512 0.004777 137.6365 0..0000 Delta-cz 0.179703 0.020425 8.798326 0..0000 Alpha-hu -320.5660 275.3116 -1.164375 0..2445 Omega-hu 0.709742 0.004892 145.0720 0..0000 Delta-hu -0.003012 0.022498 -0.133898 0..8935 Alpha-pl 12.18713 10.17939 1.197236 0..2314 Omega-pl 0.731144 0.004306 169.7962 0..0000 Delta-pl 0.040384 0.010869 3.715496 0..0002 Alpha-sk 0.275806 0.776889 0.355014 0..7226 Omega-sk 0.765118 0.008276 92.44940 0..0000 Delta-sk 0.125075 0.027235 4.592348 0..0000 Equation: D C C Z _ H P = Alpha-cz + Omega-cz * D Y C Z _ H P + Delta-cz * D 1 Y Observations: 391 R-squared 0.980779 Equation: D C H U _ H P == Alpha-hu + Omega-hu*DYHU_HP + Delta-hu*D2Y Observations: 391 R-squared 0.982175 Equation: D C P L _ H P = Alpha-pl + Omega-pl*DYPL_HP + Delta-pl*D3Y Observations: 391 R-squared 0.988145 Equation: D C S K _ H P = Alpha-sk + Omega-sk*DYSK_HP + Delta-sk*D4Y Observations: 391 R-squared 0.959429 Table 2 The Eviews output, parameters of model (3) The share of those who consume according to their permanent income (given by 1 — oo ) and a change expected under an influence of a crisis (given by 8) is presented in Table 3. Percentage of P I H house- holds Influence of crises Czech Republic 34.25 -17.97 Hungary 19.03 no apparent change Poland 26.86 - 4.03 Slovakia 23.49 -12.50 Table 3 The share of P I H households 357 4 Discussion and Conclusions Consumption following a permanent income is a theoretical concept the confirmation or non-confirmation of which brings a consequence to an eventual forecast of future aggregate consumption. A n econometric approach for testing the validity of permanency is known but empirical data often used not to validate the PIH. A more realistic assumption incorporates an idea that only a part of households uses a permanent income and a percentage of it can be estimated. The other group of households are those whose consumption tracks their actual income closely. In the article 1, the economic idea and basic econometric theory of the relevant approach are recapitulated. Model is formulated and extended by a variable allowing for a quantification of a possible crises impact on the proportion of both groups of consumers. Article 2 delivers an application to V 4 economies. It starts by a routine time - series analysis of the data. Econometric estimation is made by the help of the method S U R which allows for profiting from the fact that economic, social and historical background of countries comprised leads to correlated disturbances and thus to more efficient estimates. A brief economic interpretation of the results follows. We conclude that in all the countries of V 4 a consumption tracks income very closely. The share of households consuming according to actual income is from two thirds to three quarters and uses to increase significantly due to the impact of crises. The largest part of P I H consumers is detected in C R but the group seems not to be very much stable; the impact of crises decreases it approximately by one half. Very similar influence of crises is apparent in case of Slovakia. In both countries, the permanent consumers form a group which is apparently very fragile and threatened by serious economic shocks. The P I H group in Poland is much more stable and the Hungarian PIH consumers form though a smaller however a robust part of the households of the country. In [7] the author tried to answer the same question by using the same data but another computational method. 
The results for Czech Republic, Poland and Slovakia are almost equal, in case of Hungary there is a ten-percent fluctuation in favour of the ad hoc consumption. The detailed knowledge of a consumption pattern can bring important implications for policy makers as well as for the producers and traders. Acknowledgements The financial support of I G A F4/34/2020 is gratefully acknowledged. References [1] Campbell, J. Y . & Mankiw, N . G . (1990). Permanent Income, Current Income and Consumption, Journal of Business & Economic Statistics, V o l . 8 No. 3 (pp 265 - 279). [2] Dougherty, C. (2016) Introduction to econometrics, Oxford University Press. [3] Flavin, M . A . (1981). The Adjustment of Consumption to Changing Expectations about Future Income, The Journal of Political Economy, V o l . 89 No. 5 (pp 974 - 1009) [4] Friedman, M . A . (1957). A Theory of Consumption Function. Princeton Univ. Press [5] Hall, R. E . (1978). Stochastic Implications of the Life-Cycle-Permanent Income Hypothesis: Theory and Evidence. Journal of Political economy, V o l . 86 No. 6 (pp 971 - 987). [6] Kuan-Min, W . (2011). Does the Permanent Income Hypothesis Exist in 10 Asian Countries? E+M Ekonomie a Management, V o l . 14 No 4 (pp 1212 - 3609). [7] Pánková, V . (2021). Permanentní spotřeba v zemích V4. Competition 2021, Proceedings of conference V S P (will be issued December 2021), accepted [8] Sargent, T. J. (1978) Rational Expectations, Econometric Exogeneity, and Consumption. Journal of Political Economy, V o l . 86, No. 4 (pp 673-700). [9] Sims, C. A . (2002). Implications of rational inattention, http://pages.stern.nyu.edu/~dbackus/Exotic/lRo- bustness/Sims%20inattention%20JME%2003.pdf 358 Multi Vehicle Routing Problem Depending on Vehicle Load Juraj Pekar1 , Zuzana Cickova2 , Ivan Brezina3 Abstract: Nowadays, the emphasis in the transport planning is on its efficiency in connection with the ever-increasing environmental aspects. The related problems are solved in the field of logistics. In general, various optimization models aimed at minimizing the total distance traveled, minimizing the driving time of the vehicle, fuel consumption, C 0 2 emissions or to meet other objectives have been developed. The known models of multiple vehicles enable to coordinate the deployment of several types of vehicles. This paper focuses on the problem of tracing several types of vehicles, which minimizes the fuel consumption of used vehicles depending on the length of the route, but also on their load. This idea is presented through a constructed multi vehicle routing problem depending on vehicle load ( M V R P V L ) . The difference in the distribution compared to the multi vehicle routing problem illustrates the solving of the given problem in Slovakia. Keywords: Multi Vehicle Routing Problem, Vehicle Load, Mathematical Model J E L Classification: C02, C61 AMS Classification: 90C11, 90B06 1 Introduction A variety of vehicle routing problems optimization models aimed at minimizing the total traveled distance (or the other related costs) are commonly known. Most models are based on capacitated vehicle routing problem (CVRP), which designs optimal set of routes of vehicles from depot (suppose unlimited number of the same vehicles in the depot) aimed to serve a set of customers with a known demand, where each vehicle travels exactly one route so that the demand of customers must to be met in full by exactly one vehicle and vehicle's capacity must not be exceeded. 
It is assuming the known lowest cost (usually distance) between depot and all of the customers, as well as between each pairs of customers ([1], [3], [4], [5], [7], [9]) and its application can led to significant cost savings. However, when analyzing the transport costs fuel consumption, it is clearly observed that they are related to fuel consuption. That means that the traveled distance is not the only relevant factor and undoubtedly also vehicle load has significant impact on the consumption. This idea was introduced in [3], where the authors presented capacited vehicle routing problem depending on vehicle route. The paper was aimed on a model that minimizes the fuel consumption, depending on the length of the traveled route and also on the vehicle load. Let's extend this idea and present multi vehicle routing problem depending on vehicle load ( M V R P V L ) . This paper is divided into following interrelated parts: In the first part of second section we present presuppositions and a mathematical model of M V R P V L . At first, we present mixed integer non-linear formulation of the problem and then its modification to mixed integer programming model that contains linear equations and linear objective. The third part is devoted to an illustrative example, where we calculate optimum route based on C V R P as well as on a C V R P V L . The results illustrate the difference using both of approaches, while we also report an achieved decrease in the fuel consumption. The presented approach is an extension of classical routing problems with the aim of reducing C 0 2 emissions while minimizing fuel consumption. The result of M V R P V L is the delivery of a certain commodity from one service center to individual customers, which require the delivery of known quantities of commodity, provided that a transport with different vehicles with limited capacity. However, the modified objective function reflects not only the shortest distances, which represent the evaluation between nodes in the transport network, but also the set of considered vehicles with known and different fuel consumption per unit od distance and also additional vehicle fuel consumption per tonne of payload. This is related to payload weight of the vehicle to the customer served. 1 Department of Operations Research and Econometrics, Faculty of Economic Informatics, University of Economics i n Bratislava, Dolnozemská cesta 1, 852 35 Bratislava, e-mail:, juraj.pekar@euba.sk. 2 Department of Operations Research and Econometrics, Faculty of Economic Informatics, University of Economics i n Bratislava, Dolnozemská cesta 1, 852 35 Bratislava, e-mail: zuzana.cickova@euba.sk 3 Department of Operations Research and Econometrics, Faculty of Economic Informatics, University of Economics i n Bratislava, Dolnozemská cesta 1, 852 35 Bratislava, e-mail: ivan.brezina@euba.sk. 359 2 Multi Vehicle Routing Problem Depending on Vehicle Load The mathematical formulation of multi vehicle routing problem depending on vehicle load ( M V R P V L ) is based on Miller-Tucker-Zemlin's formulation of the traveling salesman problem ([8]). Let's use the following notation: Let N = {1,2,...«} be the set of served nodes (customers) and let N0 = Afu{0}be a set of nodes that represents the customers as well as the origin (depot). Certain demand q,, i G N is associated with each customer. Further on there exists a matrix D(n+i)X(n+i) associated with pairs i, j e No, i that represents the minimum distances between all the pairs of nodes (customers and the depot). 
There are different types of vehicles in unlimited numbers in the depot, which are represented by the set H = {1,2,... ft}. Each vehicle has a certain capacity, designated as gk, k E H. A l l the customer's demands have to be met from the depot in full and in such a way that the distribution is performed by exactly one of the vehicles. W e implicitly assume that q-,< m i n q k , i <=N, k E k H, i.e. the demand of each customer does not exceed the capacity at least one of the vehicles. N o w consider parameters associated with the fuel consumption, that can differ according to vehicle type. Consider parameter at and bk, which are related to consumption of the k-th vehicle, k E H, when the parameter at represents the consumption per unit distance and the parameter bk be parameter represents the increase in consumption for one unit per unit distance. We will use binary variables xyk (z, j e N0 j , k E H) that determine i f the node i precedes node j in the route of k-th vehicle (if yes: xyk = 1 and i f not: x-,jk = 0) in the case of distribution. In the case of collection those variables determine i f node j precedes node i in the route of k-th vehicle. Further on, we will use variables M,, i G N. When suppose that the goods are collected, those variables represent cumulative load of vehicle on its one particular route (to i-th customer). O n the other hand, they represent cumulative load of vehicle (from i-th customer) on its one particular route (note the meaning of binary variables xv- differ depending on problem type (collection or distribution)). Unlike the classical C V R P [11], the goal is to find out such distribution, which minimizes whole vehicle's fuel consumption (not minimizing the total distance). Let us recap the variables and parameters once again: Sets and parameters: n - number of customers (served nodes), N = {l,2,...n} - set of customers (served nodes), N0 = N u {0} - set of customers and the depot, dij, i, j e No, i # j - shortest distance between nodes i to node j, q„ i e N (qo = 0) - demand of z'-fh customer, zero demand in the depot gk - capacity of k-th vehicle, k EH, ak- consumption of the k-th vehicle per unit distance, k E H, bk - increase in consumption for one unit of k-th vehicle vehicle load per unit distance, k E H. Variables: x-,jkE {0,1}, i, j e No, i # j, k E H representing i f the node i precedes node j on the route of k-th vehicle, u!k>0,ieN,kEH, uo k = 0, representing vehicle load to z'-fh node (including) in the case of goods collection. Mathematical model: m i n / ( X ) = __Z__K( 0, i, j e NQ,k e H,i ^ j that represent the current load of the k-th vehicle on the edge (i, j). Now we can rewrite the objective (1) in the form: m i n / ( X ) = X Z Z K ( ^ V + ^ ) ) 0. K-th moment of F M K L G L D exists only if m i n ( / l 3 , 2 4 ) > — A : - 1 . We use the method of maximum likelihood estimation to the estimation of parameters. 3 Portfolio selection model in space of expected return and absolute de- viation It is an optimization model of portfolio selection based on the mean absolute deviation risk measure, also called the mean absolute deviation model ( M A D ) [5]. It is a model of linear programming, which in the case, when the returns are normally distributed, gives the same results as the Markowitz model. The advantage of M A D is that it does not impose a condition on data distribution. 
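A possible reading of the fuel-consumption objective described above, after introducing the edge-load variables $f_{ijk}$, is sketched below; it is an interpretation based on the stated meaning of the parameters ($d_{ij}$ distances, $a_k$ consumption per unit distance, $b_k$ additional consumption per unit of load per unit distance) and is not necessarily the authors' exact formulation (1).

```latex
% Sketch: fuel consumption over all routes. Each traversed edge (i, j)
% contributes its distance times the base consumption a_k of vehicle k,
% plus b_k times the load f_ijk carried over that edge.
\[
  \min f(X) \;=\; \sum_{k \in H} \sum_{i \in N_0}
      \sum_{\substack{j \in N_0 \\ j \neq i}}
      d_{ij}\bigl(a_k\, x_{ijk} + b_k\, f_{ijk}\bigr)
\]
```

Here $f_{ijk} \ge 0$ stands in for the product $u_{ik} x_{ijk}$, i.e. it equals the cumulative load $u_{ik}$ whenever vehicle $k$ travels edge $(i, j)$ and is forced to zero otherwise, which is what makes the rewritten objective linear.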
The formulation of the M A D model in space of expected return and absolute deviation with the objective function of minimizing the mean absolute deviation has the form: •* teT yr-T^j(f,-rj)>0, teT jeN jeN H r i W i ^ E P jeN 2>,=IjeN 0