In 2003 and 2004 I spent a great deal of time trying to understand the relationship of steel prices to each other, to commodities, and to the stock market. I felt it would be important to the risk management component of our effort to build a great steel company. I read everything I could find in the steel literature, and then I read everything on Wall Street. There was woefully little information of any value. As a result, I took the project on myself, and the steel correlation matrix herein is the result of that effort.
The ingredients were the following: I examined 20 years of steel data from CRU, World Steel Dynamics and Purchasing Magazine and chose the data of the highest quality. This was 20 years of monthly price data on the various steel products listed. I then acquired, for the exact same dates, closing commodity prices, the closing Nasdaq and S&P indices, and closing prices for roughly 500 stocks that survived the full 20-year period. I then ran a large calculation that produced the matrix herein. The data is now a bit old, but because the exercise used 20 years of data, the results still have reasonable validity. I decided to publish these results rather than sit on them. There is a great deal of intuition about the steel market to be gained by studying this information. I hope you find it interesting.
Use the Correlation transformer to determine the extent to which changes in the value of an attribute (such as the price of hot rolled steel) are associated with changes in another attribute (such as the equity price of US Steel, or the commodity price of copper). The data for a correlation analysis consists of two input columns. Each column contains values for one of the attributes of interest. The Correlation transformer can calculate various measures of association between the two input columns. You can select more than one statistic to calculate for a given pair of input columns.
The data in the input columns also can be treated as a sample obtained from a larger population, and the Correlation transformer can be used to test whether the attributes are correlated in the population. In this context, the null hypothesis asserts that the two attributes are not correlated, and the alternative hypothesis asserts that the attributes are correlated.
The Correlation transformer calculates any of the following correlation-related statistics on one or more pairs of columns: the correlation coefficient r, the covariance, the T-value, and the P-value.
The correlation coefficient r is a measure of the linear relationship between two attributes or columns of data. The correlation coefficient is also known as the Pearson product-moment correlation coefficient. The value of r can range from -1 to +1 and is independent of the units of measurement. A value of r near 0 indicates little correlation between attributes; a value near +1 or -1 indicates a high level of correlation.
When two attributes have a positive correlation coefficient, an increase in the value of one attribute indicates a likely increase in the value of the second attribute. A correlation coefficient of less than 0 indicates a negative correlation. That is, when one attribute shows an increase in value, the other attribute tends to show a decrease.
Consider two variables x and y:
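For two such columns x and y with n value pairs, the sample correlation coefficient is commonly written as the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The sketch below follows that textbook form. It is not taken from the transformer itself; the function name pearson_r and the price figures are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly prices, invented purely for illustration.
hot_rolled = [300.0, 320.0, 310.0, 350.0, 370.0]
copper = [1.10, 1.18, 1.15, 1.30, 1.42]
r = pearson_r(hot_rolled, copper)
```

Because the deviations appear in both the numerator and the denominator, rescaling either column (for example, quoting prices in cents rather than dollars) leaves r unchanged.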
Covariance is a measure of the linear relationship between two attributes or columns of data. The value of the covariance can range from -infinity to +infinity. However, if the value of the covariance is too small or too large to be represented by a number, the value is represented by NULL.
Unlike the correlation coefficient, the covariance is dependent on the units of measurement. For example, measuring values of two attributes in inches rather than feet increases the covariance by a factor of 144.
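The factor of 144 comes from multiplying each attribute by 12, so every product of deviations grows by 12 x 12. A minimal sketch of that arithmetic follows. It assumes the sample covariance with an n - 1 divisor, which the text does not specify; the factor is the same with an n divisor. The function name covariance and the measurements are hypothetical.

```python
import math


def covariance(xs, ys):
    """Sample covariance (n - 1 divisor, an assumption) of two columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Made-up measurements in feet, then the same values converted to inches.
a_feet = [1.0, 2.0, 4.0]
b_feet = [2.0, 3.0, 7.0]
a_inches = [12 * v for v in a_feet]
b_inches = [12 * v for v in b_feet]

cov_feet = covariance(a_feet, b_feet)
cov_inches = covariance(a_inches, b_inches)
ratio = cov_inches / cov_feet  # 12 * 12, up to floating-point rounding
```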
T-value is the observed value of the T-statistic that is used to test the hypothesis that two attributes are correlated. The T-value can range between -infinity and +infinity. A T-value near 0 is evidence for the null hypothesis that there is no correlation between the attributes. A T-value far from 0 (either positive or negative) is evidence for the alternative hypothesis that there is correlation between the attributes.
The definition of the T-statistic is:
T = r * SQRT((n - 2) / (1 - r * r))
where r is the correlation coefficient, n is the number of input value pairs, and SQRT is the square root function. If the correlation coefficient r is either -1 or +1, the T-value is represented by NULL. If the T-value is too small or too large to be represented by a number, the value is represented by NULL.
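The rule above, including the NULL case for r equal to -1 or +1, can be sketched in Python. The formula used here, T = r * SQRT((n - 2) / (1 - r * r)), is the standard t-statistic for a Pearson correlation and is assumed to match the transformer's definition. The function name t_value is hypothetical, and Python's None stands in for NULL.

```python
import math


def t_value(r, n):
    """T-statistic for a correlation coefficient r from n value pairs.

    Returns None (standing in for NULL) when r is -1 or +1, because the
    denominator 1 - r*r is then zero.
    """
    if n < 3:
        raise ValueError("need at least 3 value pairs so that n - 2 > 0")
    if abs(r) >= 1.0:
        return None
    return r * math.sqrt((n - 2) / (1.0 - r * r))


weak = t_value(0.0, 10)     # no correlation gives T = 0
example = t_value(0.5, 5)   # 0.5 * sqrt(3 / 0.75)
perfect = t_value(1.0, 10)  # undefined, reported as None
```

For a fixed r, the T-value grows with n, which is why even a modest correlation can be strong evidence against the null hypothesis when it is measured over many value pairs.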
P-value is the probability, when the null hypothesis is true, that the absolute value of the T-statistic would equal or exceed the observed value (T-value). A small P-value is evidence that the null hypothesis is false and the attributes are, in fact, correlated.
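As a rough illustration of how such a probability can be obtained, the sketch below assumes that under the null hypothesis the T-statistic follows a Student t distribution with n - 2 degrees of freedom (the usual result for a Pearson correlation, not something stated in the text). It integrates the density numerically with Simpson's rule so that it needs only the standard library. The function names are hypothetical, and a statistics package would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) by Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # probability that |T| <= |t|
    # Subtraction limits accuracy for extremely small P-values.
    return max(0.0, 1.0 - central)


# Example: a T-value of 2.0 from n = 4 value pairs, so df = n - 2 = 2.
p = two_sided_p(2.0, 2)
```

With 2 degrees of freedom the distribution has a closed form, so this example can be checked by hand: the result should be close to 1 - 2 / SQRT(6), or about 0.18.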
Your source table and target table must exist in the warehouse database. If you want, this transformer can create the target table in the same warehouse database that contains the source. You can change the step only when the step is in development mode.