# The Gold Standard

In data science, a gold standard is a set of labels or answers that we trust to be correct. Skills4Industry is a set of tools that represents the gold standard for translating academic subject matter into work skills. Using Skills4Industry, we have embedded all academic subjects into work tools. Predictive models take continuous data, for example from GPS-based and natural language processing (NLP)-based apps, and use it to give a driver feedback on their level of Skills4Industry and the other skills they apply in their daily work.

At Level 1 there are two sets of Skills4Industry:

- In grades 5–8 (levels 1.1–1.5), they are mathematics and statistics, communication, technology, and data competencies.
- In grades 9–12 (levels 1.6–1.10), they are mathematics and statistics, communication, technology, data, critical thinking, teamwork, learning how to learn, entrepreneurship, and self-management.
- Academic = STEM, arts and humanities, social and behavioral sciences

At Level 2, which covers two- and four-year colleges and universities:

- Skills4Industry include mathematics and statistics, communication, technology, data, critical thinking, teamwork, learning how to learn, planning, entrepreneurship, and self-management.
- Academic = physical and biological sciences, mathematics and technology, arts and humanities, social and behavioral sciences

The presence of a gold standard provides a rigorous way to develop a good scoring system. Linear regression is a curve-fitting technique that weighs input features so that their weighted sum approximates the correct answers for the gold-standard instances as closely as possible.
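
To make this concrete, here is a minimal Python sketch, assuming NumPy, of fitting a linear scoring function to gold-standard labels by ordinary least squares. The two features, the data, and the helper names `fit_linear_scorer` and `score` are hypothetical illustrations, not the actual Skills4Industry models. The gold labels are generated from known weights so the fit can be checked against them.

```python
import numpy as np

def fit_linear_scorer(features: np.ndarray, gold: np.ndarray) -> np.ndarray:
    """Return weights (last entry = intercept) that minimize the squared error
    between the weighted feature sum and the gold-standard labels."""
    # Append a column of ones so the model also learns an intercept term.
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, gold, rcond=None)
    return weights

def score(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply learned weights to feature rows to produce scores."""
    return features @ weights[:-1] + weights[-1]

# Hypothetical data: each row is one work task described by two skill features.
# The gold labels are built as 2*f1 + 0.5*f2 + 1, so the fit should recover them.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y_gold = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0

w = fit_linear_scorer(X, y_gold)
print(np.round(w, 6))  # approximately [2.0, 0.5, 1.0]
```

With real gold-standard data the labels will not be an exact linear function of the features, and the fitted weights are simply the least-squares compromise.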

The Skills4Industry gold standard serves as a curriculum proxy because it correlates well with the core competency requirements for academic and work skills. It is a proxy because it relates occupational and relational skills to academic subject matter at all grade levels, acting as a translation device for these important work skills. The statistical expectation that a good employee's value should correlate with their importance underpins our treatment of Skills4Industry as the gold standard of work competencies. It should, therefore, follow the saying: "the higher the level of Skills4Industry an employee possesses, the better their work product, and hence their value to an organization."

**Note:** Our book (Global Competence Standard), planned for release in fall 2019, contains the complete mathematical, statistical, and probabilistic calculations behind all decisions, along with free code for our predictive models. Here is a snapshot of how we arrived at the Gold Standard. In the near future, this site will link to a developer page with a snapshot of the code.

**Correlation**

Suppose we are given two variables, $x$ (academic knowledge) and $y$ (Skills4Industry), represented by a sample of $n$ (work task) points of the form $(x_i, y_i)$, for $1 \le i \le n$. We say that $x$ and $y$ are correlated when the value of $x$ has some predictive power on the value of $y$.

This example also reflects the relationship between establishing occupations by ISTD ($X$) and competencies ($Y$). The degree to which good occupational skills are a function of competencies (academic + Skills4Industry) is measured by the correlation coefficient $r(X, Y)$, which captures the extent to which $Y$ is a function of $X$ and vice versa. A coefficient of zero means there is no linear relationship; a negative coefficient means that as $Y$ goes up, $X$ tends to come down. For elemental, primary, or repetitive (machine) skills, this example describes the correlation between academic knowledge and Skills4Industry.

Breaking work down further by correlation coefficient, we found that the more relational skills (communication, teamwork, and problem solving) a task requires, the higher its hierarchical level, while entry-level tasks show zero correlation with relational skills. [$r = 0$: no relation; $r = 1$: fully correlated; $r = -1$: anti-correlated]

This follows the assertion that "you are less likely to be unemployed the more education plus Skills4Industry you have," which is a good example of negative correlation: as education plus Skills4Industry goes up, the likelihood of unemployment goes down. This means the level of Skills4Industry can help predict employment status.

A correlation around zero is useless for forecasting. We examine correlation to understand when the connection between variables is real.

The Pearson correlation coefficient measures the degree to which a linear predictor of the form $y = mx + b$ can fit the observed data.
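
As an illustration, the Pearson coefficient can be computed directly from its definition: the covariance of $x$ and $y$ divided by the product of their standard deviations. This is a minimal, dependency-free Python sketch; the function name `pearson_r` and the sample values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of
    their standard deviations (the 1/n factors cancel, so they are omitted)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0: points lie on y = 2x
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1.0: a line with negative slope
```

Points that lie exactly on a line give $|r| = 1$ (up to floating-point rounding), regardless of the slope's magnitude; only its sign matters.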

The Spearman rank correlation coefficient works on the ranks of the values rather than the values themselves; intuitively, it reflects how many pairs of input points are in or out of order. Suppose that our dataset contains points $(x_1, y_1)$ and $(x_2, y_2)$ where $x_1 < x_2$ and $y_1 < y_2$. This pair is a vote that the values are positively correlated, whereas it would be a vote for negative correlation if $y_2 < y_1$.
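
The sketch below shows both ideas in Python under simplifying assumptions. `spearman_rho` uses the standard no-ties formula $1 - 6\sum d_i^2 / (n(n^2 - 1))$, where $d_i$ is the difference between a point's $x$-rank and $y$-rank. `pair_votes` counts the in-order (concordant) and out-of-order (discordant) pairs described above; normalizing those counts gives Kendall's tau, a closely related rank statistic. All names and data are hypothetical, and ties are not handled.

```python
from itertools import combinations

def ranks(values):
    """Rank positions 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def pair_votes(xs, ys):
    """Count pairs voting for positive (concordant) or negative (discordant)
    correlation; pairs tied in x or y cast no vote."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x2 - x1) * (y2 - y1)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return concordant, discordant

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]       # monotonic but not linear
print(spearman_rho(xs, ys))  # 1.0: the ranks agree perfectly
print(pair_votes(xs, ys))    # (10, 0): every pair votes for positive correlation
```

Because only ranks matter, a monotonic but curved relationship such as $y = x^2$ still scores a perfect 1.0, whereas the Pearson coefficient for the same points would be slightly below 1.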

X = competencies (determined by what individuals must be able to do to be fit for a task)

Y = Levels (the level of performance required to stay on top of your skill game or make progress to other tasks)

The correlation coefficient $r$ reflects the degree to which $x$ can be used to predict $y$ in a given sample of points. As $|r| \to 1$, these predictions get better and better.

Consider predicting $y$ (academic) from $x$ (Skills4Industry) with a linear function $f(x) = mx + c$, where the parameters $m$ and $c$ correspond to the best possible fit, and define the residuals $v_i = y_i - f(x_i)$. The variance of the full data set (occupational standard), $V(y)$, should be much larger than the residual variance $V(v)$ if there is a good linear fit $f(x)$. If $x$ and $y$ are perfectly correlated, there should be no residual error, and $V(v) = 0$. If $x$ and $y$ are totally uncorrelated, the fit should contribute nothing, and $V(y) \approx V(v)$. Generally speaking, $1 - r^2 = V(v) / V(y)$.
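
The relationship $1 - r^2 = V(v)/V(y)$ holds exactly for a least-squares line with an intercept, and it is easy to check numerically. This Python sketch assumes NumPy and uses synthetic, hypothetical data; `residual_check` is an illustrative name.

```python
import numpy as np

def residual_check(x, y):
    """Fit f(x) = m*x + c by least squares and return (r, V(v) / V(y)),
    where v_i = y_i - f(x_i) are the residuals."""
    m, c = np.polyfit(x, y, 1)      # slope and intercept of the best fit
    v = y - (m * x + c)             # residuals
    r = np.corrcoef(x, y)[0, 1]     # Pearson correlation of x and y
    return r, np.var(v) / np.var(y)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 0.8 * x + rng.normal(0, 0.1, 200)  # hypothetical noisy linear data
r, ratio = residual_check(x, y)
print(round(1 - r**2, 6), round(ratio, 6))  # the two numbers agree
```

Both variances are computed with the same normalization, so the choice of $n$ versus $n - 1$ cancels in the ratio.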

Plotting the data points that represent different definitions of a job's purpose (left) next to their residuals $v_i = y_i - f(x_i)$ (right) shows that the residuals have mean zero and lower variance. The closer definitions of a job's purpose appear as data points on the left, with the corresponding residuals on the right. The set of points on the left admits a good linear fit, with correlation $r = 0.94$. As a result, the variance of the $y$ values on the left, $V(y) = 0.056$, is substantially greater than the variance of the residuals on the right, $V(v) = 0.0065$.
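
As a quick consistency check, these reported values agree with the general relationship $1 - r^2 = V(v)/V(y)$ for a least-squares linear fit:

$$
1 - r^2 = 1 - 0.94^2 = 1 - 0.8836 = 0.1164,
\qquad
\frac{V(v)}{V(y)} = \frac{0.0065}{0.056} \approx 0.116
$$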

Because the best-fit method depends on correlation, we must take into consideration the saying that "correlation does not imply causation."

Two quantities can correlate strongly without one causing the other. For example, the number of applicants for a given pathway credential correlates strongly with labor-market demand for the competencies it covers, but the credentialing institutions did not cause the spike in demand.

Concluding that the credentialing institutions drive demand is a common example of the error of thinking that correlation implies causation.

This is why Skills4Industry supports a strong transition from high school to employment: the courses already have very good, globally competitive rigor, and what is missing is a strong work-based learning culture.

**Artificial Intelligence Predictive Models**

Modeling is the process of encapsulating information into a tool that can forecast and make predictions. Predictive models are structured around what causes events to happen in the future. What really causes events to happen? There are two world views: 1) the future will be like the past, a notion that extrapolates from recent trends and observations; and 2) principled notions of causation, which explain why things happen. These notions require a detailed understanding of the possible choices. A model should have a good sense of its limitations and boundaries.

Please visit this page to understand the Skills4Industry philosophy, which holds that learning is about using the past, together with a set of design-thinking compositional tools, to invent the future. Philosophy is about thinking in a fundamental way about what we are trying to do and why.

Linear machine learning models are governed by equations that weigh each feature variable by a coefficient reflecting its importance and sum up the weighted values to produce a score. Fitting such equations to training data can yield very effective models. However, the world is not linear: higher-order polynomials, logarithms, and exponentials can fit training data more tightly than linear functions, although it is hard to find the best possible coefficients for non-linear models. Despite inherent difficulties in optimization, deep learning methods based on neural networks offer the best solution. Linear models, for their part, are readily understandable, generally defensible, easy to build, and avoid over-fitting on modest-size data.
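
The following Python sketch, assuming NumPy, illustrates both halves of that paragraph. `linear_score` is the weighted-sum score a linear model produces, and `training_sse` shows that higher-degree polynomial fits never increase training error on the same data. The feature values, weights, and target function are hypothetical. Note that a lower training error is not the same as better predictions on new data; that gap is exactly the over-fitting risk mentioned above.

```python
import numpy as np

def linear_score(features, coefficients, bias=0.0):
    """Weigh each feature by its coefficient and sum into a single score."""
    return float(np.dot(features, coefficients) + bias)

def training_sse(x, y, degree):
    """Sum of squared training errors of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

# Hypothetical feature vector and importance weights: 3*0.5 + 1*2 + 2*1.
print(linear_score([3.0, 1.0, 2.0], [0.5, 2.0, 1.0]))  # 5.5

x = np.linspace(0, 1, 12)
y = np.exp(2 * x) + np.sin(9 * x)  # a deliberately non-linear target
for d in (1, 3, 6):
    print(d, training_sse(x, y, d))  # training error shrinks as the degree grows
```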

When we define the purpose of a specific job title down to the nth point, correlation alone does not, in logical reasoning, imply causation, so we need tools to tease out whether A really causes B. These tools help us apply probabilistic methods to predict which competencies will be in demand in the future, on a timeline that gives individuals enough time to learn them.

Many human activities proceed with a seven-day cycle associated with the workweek.

Every year, large populations of students provide work services to employers in ISTD, or do so on a two- to three-year cycle. To quantify this economic contribution, we look for such periodicity in the data: for a time series $S$ of length $n$, how can we correlate the values of $S_i$ with $S_{i+p}$, for all $1 \le i \le n - p$? If the values are in sync for a particular period length $p$, then this correlation of the series with itself will be unusually high relative to other possible lag values.

Comparing a sequence to itself is called autocorrelation, and the series of correlations for all lags $1 \le k \le n - 1$ is called the autocorrelation function. In the autocorrelation function of a time series of daily lecture data, a peak at a shift of seven days (and at every multiple of seven days) establishes a weekly periodicity in teaching, which would support the notion that more material gets taught on weekends than on weekdays. Autocorrelation is an important concept in predicting future events because it means we can use previous observations as features in a model. The heuristic that tomorrow's weather will be similar to today's is based on autocorrelation, with a lag of $p = 1$ day.
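
Here is a minimal Python sketch, assuming NumPy, of the autocorrelation function: for each lag $p$, it correlates $S_i$ with $S_{i+p}$. The daily series is synthetic and hypothetical. It repeats a fixed seven-day pattern with a little noise, so the function should peak at lag 7.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Pearson correlation of S_i with S_{i+p} for each lag p = 1..max_lag."""
    s = np.asarray(series, dtype=float)
    return {p: float(np.corrcoef(s[:-p], s[p:])[0, 1]) for p in range(1, max_lag + 1)}

# Hypothetical daily counts: a repeating 7-day pattern plus small random noise.
rng = np.random.default_rng(1)
weekly = np.array([5, 9, 9, 9, 9, 9, 4], dtype=float)
daily = np.tile(weekly, 20) + rng.normal(0, 0.5, 140)

acf = autocorrelation(daily, 10)
best = max(acf, key=acf.get)
print(best, round(acf[best], 3))  # the peak lands on lag 7
```

Extending `max_lag` to 14 or 21 would show further peaks at those multiples of seven, matching the periodic pattern described above.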

A logarithm is the inverse of an exponential function: the equation $y = b^x$ can be rewritten as $x = \log_b y$.

Exponential functions grow at a very fast rate. Consider the series $\{2^1, 2^2, 2^3, 2^4, \ldots\} = \{2, 4, 8, 16, \ldots\}$.

In contrast, logarithms grow at a very slow rate: they are just the exponents of the previous series, $\{1, 2, 3, 4, \ldots\}$. Logarithms are associated with any process where we repeatedly multiply by some value $b$, or repeatedly divide by $b$.

Definition: $y = \log_b x \iff b^y = x$
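
A short Python sketch of these ideas, using only the standard library: `log_base` recovers the exponent $y$ for which $b^y = x$, and `halvings` counts how often a positive integer can be divided by 2 (integer division) before reaching 1, which equals $\lfloor \log_2 n \rfloor$. Both helper names are hypothetical.

```python
import math

def log_base(x, b):
    """y = log_b(x): the exponent y for which b**y == x."""
    return math.log(x, b)

def halvings(n: int) -> int:
    """How many times a positive integer n can be halved (integer division)
    before reaching 1; this equals floor(log2(n))."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

powers = [2 ** k for k in range(1, 5)]                # exponential growth
exponents = [round(log_base(p, 2)) for p in powers]   # logarithmic growth
print(powers, exponents)     # [2, 4, 8, 16] [1, 2, 3, 4]
print(halvings(1_000_000))   # 19, since 2**19 <= 10**6 < 2**20
```

Rounding is used on the exponents because floating-point logarithms can come out a hair below the exact integer.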