Data analytics approach to predict the hardness of copper matrix composites

Copper matrix composite materials have shown high potential in applications that require excellent conductivity and mechanical properties. In this study, machine learning models have been applied to predict the hardness of copper matrix composite materials produced by the powder metallurgy technique. Two composites were considered in this work. From the experiments, we extracted the independent variables (features), namely milling time (MT, h), dislocation density (DD, m⁻²), average particle size (PS, nm), density (g/cm³) and yield stress (MPa), while the Vickers hardness (MPa) was used as the dependent variable. Feature selection was performed by calculating the Pearson correlation coefficient (PCC) between the independent and dependent variables. We employed six different machine learning regression models to predict the hardness of the two composites.


Introduction
Copper matrix composites (CuMCs) and copper alloys possess excellent mechanical properties and electrical conductivity [1,2,3], making them desirable materials in several industries, viz. automotive, aerospace, military, nuclear and electronics. The main potential of these materials lies in achieving a favorable balance between improved mechanical properties and preserved high conductivity. It is well known that a lower content of alloying elements in the copper matrix supports higher thermal and electrical conductivity. The most commonly used reinforcements [4,5] for the copper matrix are metals (Ti, Mg, Co, Ni, etc.) or ceramic particles such as SiC and Al2O3, while in recent years particles such as ZrB2, TiO2, TiB2, TiC and B4C have also been used. Since the properties of CuMCs and Cu alloys depend strongly on the nature, amount and distribution of the reinforcements, great attention is given to the selection of the manufacturing technique used to produce them. Both ingot and powder metallurgy are used to produce Cu-based materials, and powder metallurgy is more suitable when in situ formation of the reinforcing particles is needed [6,7,8,9,10]. However, the most recent study [11] of ZrB2 particulate-reinforced copper matrix composites produced by ingot metallurgy shows that as-cast Cu-ZrB2 composites can reach a hardness of up to 140 HV (Vickers hardness), similar to the results obtained by powder metallurgy [12].
The investigation of copper-based materials attracts researchers and engineers from different fields due to their wide range of applications and the fast growth of the industry.
In the powder metallurgy technique, the properties of alloys and metal matrix composites (MMCs) depend largely on the milling time. Thus, a rapid and accurate prediction of hardness via the structure-property correlation of these MMCs is highly desirable. While physics-based models (e.g. density functional theory and phase field simulations) can promote understanding at a given length scale, they are often limited to low-order model systems because of their computational complexity and the lack of input parameters needed to represent realistic higher-order systems. An efficient alternative is a data-driven methodology that applies statistical learning tools to analyze the correlations between hardness and the features of the MMCs. Machine learning (ML) approaches can reduce experimental cost and time while predicting the target properties of materials [13,14,15,16].
In the present study, we apply an ML approach to predict the hardness of CuMCs, employing six different regression models (random forest, gradient boosting, nearest neighbor, support vector, kernel ridge and linear). The remainder of the paper is organized as follows. Sections 2.1 and 2.2 briefly describe the experimental work and the machine learning models, respectively. We discuss our results in Section 3, followed by the conclusion.

Experimental work
The Cu-ZrB2 alloy was produced by the powder metallurgy technique, using Cu, Zr and B as starting powders. Mechanical alloying was performed in an attritor mill. The in situ formation of ZrB2 particles inside the Cu matrix was achieved during hot pressing at 950 °C. The morphology of the mechanically alloyed (MA) powder mixtures was analyzed with a particle size analyzer and scanning electron microscopy (SEM). The microstructure of the MA powder mixtures and the hot-pressed samples was characterized by X-ray powder diffraction (XRD) and SEM. The detailed production procedure for the Cu-ZrB2 composites and the characterization methods applied have been described in previous studies [12,17,18].

Machine learning models
The primary requirement for building a statistical learning model for any material is a dataset containing the material descriptors, or features, $X$. These descriptors represent fundamental material properties. The basic task of a machine learning (ML) model is to map these features to a specific target property, $Y$ (hardness in this case), that is, $Y = f(X)$. Thus, the two important elements of the machine learning approach are the empirical model, $f$, and the features, $X$. The ML model must be trained and cross-validated on a training dataset that includes the measured target property. The trained model is then applied to an unseen dataset to predict the target property. From the experiments, we obtain the milling time (MT, h), dislocation density (DD, m⁻²), average particle size (PS, nm), density (g/cm³) and yield stress (MPa) as our descriptors, $X$. In this study, we used two datasets for two different composites: (i) Cu-7 vol.% ZrB2 and (ii) Cu-2 vol.% ZrB2. As powder metallurgy is a time-consuming process, the datasets are small in both cases. As will be explained later, these small datasets are enough to understand the trend for the mechanically alloyed powders considered in the present study.
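A minimal sketch of the $Y = f(X)$ setup described above, assuming each composite's measurements are stored in a pandas DataFrame with one row per sample. The column names (`MT_h`, `DD_per_m2`, `PS_nm`, `rho_gcm3`, `YS_MPa`, `HV_MPa`) and the helper `split_xy` are hypothetical, the values are randomly generated placeholders rather than the experimental data, and linear regression merely stands in for any of the six regressors introduced in this section.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical column names for the five descriptors X and the target Y.
FEATURES = ["MT_h", "DD_per_m2", "PS_nm", "rho_gcm3", "YS_MPa"]
TARGET = "HV_MPa"


def split_xy(df: pd.DataFrame):
    """Return the feature matrix X and the target vector y for one dataset."""
    X = df[FEATURES].to_numpy(dtype=float)
    y = df[TARGET].to_numpy(dtype=float)
    return X, y


# Synthetic placeholder data (NOT the measured values): 10 samples.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, len(FEATURES) + 1)), columns=FEATURES + [TARGET])

# Rows used for training vs. rows the model has never seen.
train, unseen = data.iloc[:8], data.iloc[8:]
X_train, y_train = split_xy(train)
X_unseen, _ = split_xy(unseen)

# Learn the empirical model f on the training rows, then apply it to unseen rows.
f = LinearRegression().fit(X_train, y_train)
y_pred = f.predict(X_unseen)
print(y_pred.shape)
```

Since each composite gets its own model, the same split would be applied separately to the Cu-7 vol.% and Cu-2 vol.% ZrB2 datasets.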
To predict the hardness of both composites, Cu-7 vol.% ZrB2 and Cu-2 vol.% ZrB2, we used six ML models: random forest (RF) regression [19], gradient boosting (GB) regression [20], support vector (SV) regression [21], k-nearest neighbors (KNN) regression [22], linear regression (LR) [23] and kernel ridge (KR) regression [24], as implemented in the Python-based open-source data analytics toolkit scikit-learn [25]. RF and GB regression are ensemble learning methods in which multiple decision trees are constructed. SV regression is considered a nonparametric technique because it relies on kernel functions. Linear regression models the relationship between the input and output variables with a linear predictor function, fitted to minimize the residual sum of squares between the observed and predicted data. Kernel ridge regression estimates the conditional expectation of the target given the inputs and can capture non-linear relationships: it combines ridge (L2-regularized least-squares) regression with the kernel trick, replacing explicit inner products in a high-dimensional space with kernel evaluations, and thereby learns a linear model in the implicit feature space induced by the kernel and the dataset. k-nearest neighbors regression is a nonparametric method that predicts the average of the target values of the k training points nearest to the query point.
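The six regressors named above map directly onto scikit-learn estimators. The sketch below instantiates them; the helper `make_models` is hypothetical and the hyperparameter values are illustrative defaults, not the values used in this study. Wrapping the distance- and kernel-based models in a `StandardScaler` pipeline is also an assumption: the text does not say whether the features were standardized, but descriptors with very different magnitudes (dislocation density vs. milling time) would otherwise dominate the distances.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_models(random_state: int = 0) -> dict:
    """Return the six regressors named in the text, with illustrative settings."""
    return {
        # Tree ensembles are insensitive to feature scale.
        "RF": RandomForestRegressor(n_estimators=200, random_state=random_state),
        "GB": GradientBoostingRegressor(random_state=random_state),
        # Kernel- and distance-based models: standardize features first (assumption).
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3)),
        "KR": make_pipeline(StandardScaler(), KernelRidge(kernel="rbf")),
        # Ordinary least squares.
        "LR": LinearRegression(),
    }


models = make_models()
print(sorted(models))
```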
Because the datasets are small, we performed leave-one-out (LOO) cross-validation (CV) [26]. Training the ML models with CV helps to control the errors due to bias and variance and gives an estimate of how well the models generalize to unseen data. The hyperparameters of the ML models were optimized during the training process. To assess model performance, we calculated the coefficient of determination, $R^2$ [27]. Note that the ML models were trained separately for each of the two CuMCs, using the respective dataset.
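A sketch of LOO-CV combined with hyperparameter optimization, under stated assumptions: the text does not specify the search procedure, so this uses a nested scheme in which `GridSearchCV` tunes hyperparameters on the inner LOO folds and `cross_val_predict` produces one out-of-sample prediction per sample on the outer LOO folds. Because $R^2$ cannot be computed on a single held-out sample, the inner search is scored by mean squared error and $R^2$ is computed once on the pooled LOO predictions. The helper `loo_r2`, the kernel ridge choice and the grid values are hypothetical, and the data are synthetic placeholders.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loo_r2(estimator, param_grid: dict, X: np.ndarray, y: np.ndarray) -> float:
    """Nested LOO: tune hyperparameters on inner folds, score on outer folds."""
    # Inner loop: one held-out sample per fold, so score with MSE, not R^2.
    inner = GridSearchCV(estimator, param_grid, cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
    # Outer loop: each sample is predicted by a model tuned without it.
    y_pred = cross_val_predict(inner, X, y, cv=LeaveOneOut())
    # Pool all out-of-sample predictions into a single R^2.
    return r2_score(y, y_pred)


# Hypothetical model and grid; step name "kernelridge" is set by make_pipeline.
model = make_pipeline(StandardScaler(), KernelRidge(kernel="rbf"))
grid = {"kernelridge__alpha": [1e-2, 1e-1, 1.0],
        "kernelridge__gamma": [0.1, 1.0]}

# Synthetic placeholder data (NOT the measured values).
rng = np.random.default_rng(0)
X = rng.random((12, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.05 * rng.standard_normal(12)

score = loo_r2(model, grid, X, y)
print(f"LOO R^2 = {score:.2f}")
```

The nested design keeps the reported $R^2$ from being inflated by hyperparameters chosen on the same samples they are scored on; with only a handful of samples the extra fits are cheap.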

Results and discussion
A strong influence of the milling parameters on the morphological and mechanical properties of alloys and MMCs has been reported in many studies. The duration of the milling process is essential for achieving a uniform distribution of the reinforcing particles in the metal matrix. During milling in the attritor mill, the powder mixture is exposed to high-energy collisions, such as ball-particle-ball collisions. These collisions induce changes in the lattice parameters, shape and size, as well as the hardness, of the particles. Finding suitable milling parameters for each alloy or composite material is a time-consuming process.
First, all features were passed through a correlation filter, based on the Pearson correlation coefficient, to remove those uncorrelated with the target. The Pearson correlation coefficient measures the linear correlation between a predictor in $X$ and the target, $Y$. The Pearson correlation maps for both CuMCs are shown in Figure 1. For both composites, yield stress and density have the strongest correlation with hardness, followed by dislocation density. For the Cu-7 vol.% ZrB2 composite, particle size has the lowest correlation coefficient, and, importantly, all features are positively correlated with hardness. For the Cu-2 vol.% ZrB2 composite, milling time and particle size have negative correlation coefficients: the coefficient for milling time is close to zero, while particle size shows a weak negative correlation. We therefore used all features of the Cu-7 vol.% ZrB2 composite to fit the ML models, whereas for the Cu-2 vol.% ZrB2 composite we dropped the milling time feature.

Next, we trained the ML models using LOO-CV and evaluated their performance with the coefficient of determination,

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the true values and $n$ is the number of samples. $R^2$ equals 1 for a perfect fit; values near 0 indicate that the model does no better than predicting the mean, and out-of-sample predictions can even yield negative values.
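The correlation filter can be sketched with pandas, which computes the Pearson coefficient $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$ for each feature against the target. The helper `pearson_filter` and the cutoff of 0.1 on $|r|$ are hypothetical (the text reports no numeric threshold), and the toy table below is constructed only to show one strongly positive, one uncorrelated and one strongly negative feature.

```python
import pandas as pd


def pearson_filter(df: pd.DataFrame, target: str, threshold: float = 0.1):
    """Return each feature's Pearson r with the target and the features kept.

    Features with |r| < threshold are treated as uncorrelated and dropped;
    strong negative correlations are kept because the test uses |r|.
    """
    r = df.drop(columns=target).corrwith(df[target])  # Pearson is the default
    kept = r.index[r.abs() >= threshold].tolist()
    return r, kept


# Toy illustration (NOT the measured data).
toy = pd.DataFrame({
    "a": [2.0, 4.0, 6.0, 8.0],     # r = +1 with the target
    "c": [1.0, -1.0, -1.0, 1.0],   # r = 0 with the target
    "d": [-1.0, -2.0, -3.0, -4.0], # r = -1 with the target
    "HV": [1.0, 2.0, 3.0, 4.0],
})
r, kept = pearson_filter(toy, target="HV")
print(r.round(3).to_dict(), kept)
```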
Table 1 summarizes the $R^2$ values for the Cu-7 vol.% ZrB2 composite. In this case, the random forest and kernel ridge regressors exhibited the highest $R^2$ (0.92), followed by the gradient boosting regressor (0.88), while the nearest neighbor regressor had the lowest $R^2$ (0.79). Thus, all models reached an $R^2$ of approximately 0.80 or higher. For the two best-performing models, random forest and kernel ridge, the true and predicted hardness values of the Cu-7 vol.% ZrB2 composite are plotted in Figure 2.
Table 2 lists the $R^2$ values obtained with the different ML models for the Cu-2 vol.% ZrB2 composite. Gradient boosting reached an $R^2$ of 0.79, while the support vector and kernel ridge regressors each reached 0.74. Overall, all ML models perform worse for the Cu-2 vol.% ZrB2 composite than for the Cu-7 vol.% ZrB2 composite. We believe that more data are needed to build a better predictive model for the hardness of the Cu-2 vol.% ZrB2 composite. Figure 3 shows the true and predicted hardness values of the Cu-2 vol.% ZrB2 composite for the gradient boosting and random forest models.
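True-versus-predicted plots such as Figures 2 and 3 are usually drawn as parity plots, where points on the dashed $y = x$ line are perfect predictions. The sketch below is a minimal matplotlib version; the helper `parity_plot`, the styling and the three toy points are hypothetical and are not the data behind the figures.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np


def parity_plot(y_true, y_pred, label: str, ax=None):
    """Scatter predicted vs. true hardness with the y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(4, 4))
    # Reference line spanning the full range of both axes.
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], "k--", linewidth=1, label="y = x")
    ax.scatter(y_true, y_pred, label=label)
    ax.set_xlabel("True hardness (MPa)")
    ax.set_ylabel("Predicted hardness (MPa)")
    ax.legend()
    return ax


# Toy values only (NOT the study's predictions).
ax = parity_plot([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], label="toy model")
```

Passing the same `ax` for two models (for example the pooled LOO predictions of gradient boosting and random forest) overlays them on one panel, as in the figures described above.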

Conclusion
In summary, we have built regression models to predict the hardness of CuMCs prepared by powder metallurgy. For the Cu-7 vol.% ZrB2 composite, the models reached an $R^2$ of approximately 0.80 or higher. In contrast, the ML models for the Cu-2 vol.% ZrB2 composite have lower predictive accuracy. To improve the accuracy of the ML models, we believe more data points must be included in the training dataset. The same strategy can be extended to other metal matrix composites prepared by mechanical alloying.