Heterogeneous Data

The Skills4Industry taxonomy dataset offers a discrete set of possibilities represented by a data cluster. Occupations are classified in columns by Industry, Sector, Trade, and Domain (ISTD), and competencies (academic, relational, work, and domain contextual culture) are classified in rows across all occupations. A functional analysis was conducted on more than 200,000 job positions crawled from different sites on the web, including the Canadian National Occupational Classification (NOC), UK occupational standards, and job placement companies. To analyze the function of a job position, we defined the purpose of the job title in relation to its industry, sector, trade, and cultural domain (community). A statistical best-fit method was used to order the task definitions by importance, from the most important at the top down to the least important. The most important tasks associated with a trade (the lowest business grouping) served as the benchmark of performance; where organized trade groups do not exist, we used the industry (the largest grouping) instead. The tasks were then statistically fitted to requirements for knowledge, work skills, work tools, and relational and domain cultural context.
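The ordering and benchmark rule described above can be sketched in a few lines. This is a minimal illustration, assuming each task already carries a numeric importance score produced by the best-fit step (the source does not specify that method); the `Task` schema, its field names, and the sample records are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """A task definition under a job title (hypothetical schema)."""
    description: str
    industry: str
    sector: str
    trade: Optional[str]  # None where no organized trade group exists
    importance: float     # assumed output of the best-fit step; higher = more important


def order_by_importance(tasks: list[Task]) -> list[Task]:
    """Most important task first, least important last."""
    return sorted(tasks, key=lambda t: t.importance, reverse=True)


def benchmark_group(task: Task) -> str:
    """Benchmark against the trade (lowest grouping) when one exists;
    otherwise fall back to the industry (largest grouping)."""
    return task.trade if task.trade is not None else task.industry


# Illustrative records only, not taken from the dataset.
tasks = [
    Task("Inspect welded joints", "Construction", "Metal fabrication", "Welding", 0.72),
    Task("Schedule site crews", "Construction", "Site management", None, 0.91),
]
ranked = order_by_importance(tasks)
print([(t.description, benchmark_group(t)) for t in ranked])
```

Keeping the fallback in a single function makes the benchmark choice explicit per task, so a reviewer can see which tasks were measured against a trade and which against an industry.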

A weighted sum, computed with Excel's power function, together with several correlation tools, was used to determine the competency requirements for achieving each task, based on more than 228 dependent variables within each competency (the independent variables). Notably, as we move from level one to level ten, the requirement for relational skills increases, following the growing number of Skills4Industry business network nodes.
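The exact weighting scheme is not published here, so the following is only a sketch under stated assumptions. Each competency (independent variable) is scored from its dependent variables, each assumed to be scaled to 0..1, by a weighted sum of values raised to an exponent p, the calculation Excel's POWER(number, power) performs. A Pearson coefficient then checks whether a requirement rises with level. The function names, the choice p = 2, and the toy numbers are hypothetical.

```python
import math


def weighted_power_sum(values, weights, p=2.0):
    """Weighted sum of dependent-variable scores, each raised to the power p
    (what Excel's POWER(value, p) computes), divided by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v ** p for v, w in zip(values, weights)) / sum(weights)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy numbers for illustration only, not results from the dataset.
relational = weighted_power_sum([0.5, 1.0], [1, 1], p=2)  # (0.25 + 1.0) / 2 = 0.625
levels = [1, 2, 3, 4]
relational_by_level = [0.2, 0.4, 0.6, 0.8]
print(relational, pearson(levels, relational_by_level))  # 0.625, then a value close to 1.0
```

An exponent above 1 gives strong ratings on a dependent variable proportionally more influence than weak ones; p = 1 reduces the sketch to an ordinary weighted average.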

Both the occupational and competency data are classified into a rules-based system designed to feed machine learning algorithms with data structured within the decision boundary.

Skills4Industry hierarchical competency models are explicitly descriptive. Anyone can trace a final decision back to the appropriate top-level sub-problem and report how strongly that sub-problem contributed to the observed results of a student (work-based learning), an employee (lifelong learning), or a work-transition (lifelong) learner.
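The traceability described above can be illustrated with a small tree in which every internal node is a weighted sum of its children, so the root's value decomposes exactly into per-sub-problem shares. This is a sketch under assumptions: the `Node` class, the additive aggregation, the equal weights, and the learner scores are hypothetical, and the real Skills4Industry models may combine sub-problems differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a hypothetical competency hierarchy. Leaves carry an observed
    score; an internal node's value is the weighted sum of its children."""
    name: str
    weight: float = 1.0            # this node's weight within its parent
    score: Optional[float] = None  # observed score (leaves only)
    children: list[Node] = field(default_factory=list)

    def value(self) -> float:
        if not self.children:
            return self.score if self.score is not None else 0.0
        return sum(child.weight * child.value() for child in self.children)


def top_level_contributions(root: Node) -> dict[str, float]:
    """Fraction of the root's final value contributed by each top-level sub-problem."""
    total = root.value()
    if total == 0:
        return {child.name: 0.0 for child in root.children}
    return {child.name: child.weight * child.value() / total for child in root.children}


# Illustrative learner profile with made-up scores.
learner = Node("Learner", children=[
    Node("Academic", 0.25, 0.8),
    Node("Relational", 0.25, 0.4),
    Node("Work", 0.25, 0.9),
    Node("Domain context", 0.25, 0.6),
])
shares = top_level_contributions(learner)
print(max(shares, key=shares.get))  # Work
```

Because aggregation is additive, the shares always sum to one, which is what makes "how strongly it contributed" a well-defined number; a non-additive model would need a separate attribution method.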

Competency: Competence is defined by Skills4Industry as the mastery of academic knowledge, relational skills, work skills, and domain contextual cultural skills by an individual.

Ranking Competencies: Competence provides a means to define movement (vertical or horizontal) within a job. Levels based on the degree of demonstrable competence are implicit in this movement. We analyzed occupational profile and job advertising websites to extract job titles, which were decomposed into more than 3.6 million tasks. Scoring functions were developed to rank these tasks by the amount of competency required to perform each task within the domain described by a job title. However, a job title containing several tasks is only a pointer to the work domain; it does not capture all the distinguishing properties of the domain in which a particular task is performed. We rolled these tasks, under their respective job titles, into high groups (industries), middle groups (sectors), and lower groups (trades). These occupations were then ranked into ten levels, starting at level one, by their degree of competency requirements. As a result, each column contains a set of occupational titles, job tasks, work skills, work tools, and relational and domain context; each of these independent variables contains several dependent variables.
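The scoring, ranking, and roll-up steps above can be sketched as follows, assuming each task is described by how much (0..1) of each competency category it requires. The category names, the additive scoring function, the per-occupation mean, and the equal-size banding into ten levels are all illustrative assumptions; the source does not specify its scoring functions or level cut-offs.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

# Hypothetical competency categories; each task records how much (0..1)
# of each category it requires.
CATEGORIES = ("academic", "relational", "work", "domain_context")


def task_score(requirements: dict[str, float]) -> float:
    """Illustrative scoring function: total competency a task requires."""
    return sum(requirements.get(c, 0.0) for c in CATEGORIES)


def rank_into_levels(occupation_tasks: dict[str, list[dict[str, float]]],
                     n_levels: int = 10) -> dict[str, int]:
    """Score each occupation by the mean score of its tasks, sort ascending, and
    split the ranking into n_levels equal-size bands (level 1 = lowest)."""
    scores = {occ: mean(task_score(t) for t in tasks)
              for occ, tasks in occupation_tasks.items()}
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {occ: 1 + (i * n_levels) // n for i, occ in enumerate(ordered)}


def roll_up(occupations):
    """Nest occupation titles under industry -> sector -> trade."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for title, industry, sector, trade in occupations:
        tree[industry][sector][trade].append(title)
    return tree


# Toy input: 20 occupations with steadily increasing requirements.
toy = {f"occ{i:02d}": [{"academic": i / 20, "relational": i / 40}] for i in range(20)}
levels = rank_into_levels(toy)
print(levels["occ00"], levels["occ19"])  # 1 10
```

Equal-size bands guarantee that every level is populated whenever there are at least ten occupations; fixed cut-offs on the score itself would be an equally plausible alternative if levels are meant to reflect absolute rather than relative requirements.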

A good data scientist is also a good storyteller. To reveal the story behind every task and job profile, we ask important questions that may produce pages of explanation from a single task. The information contained here is taken from our first book, “The Global Competence Standards”, which captures comprehensive details about transitions and the idea of global corporate citizenship. Our Future of Learning blog and podcast stories will connect different functional groups to global domain communities, based on our continuous research into individuals' transitions to work across communities globally. Every one of these stories is true and will continue to represent what we call industry, sector, trade, and domain (ISTD) real-world application examples. The stories will improve in the retelling, and the dialogue will be punched up to make them more interactive for ISTD-specific VR, AR, and game environments. The intent is to honestly trace the process of acquiring, applying, and reinforcing competencies, including the bottlenecks in cross-cultural transitions and acculturation experiences globally. People can watch how their competencies unfold and acquire the proficiency to stay on top of their competency game from cradle through retirement.

Structured Data

Collecting and analyzing data to tell stories about which competencies are needed for which tasks, how many relevant competencies there are, and how much domain-specific competency a given task requires calls for heterogeneous data. Heterogeneous data are difficult to structure into a simple matrix, in which the rows represent distinct items or records and the columns represent distinct properties of those items. For example, a data set about an individual's competencies might contain one row for each domain, with columns representing features such as state, population, natural resources, climate, industry cluster, laws, affluence, and firm. When confronted with unstructured data sources, such as a collection of occupational profiles and job descriptions from websites, our first step is generally to build a matrix that gives them structure. However, these sources tell less than half the story about the different types of skills, knowledge, and understanding employers need today. A competency model therefore constructs a matrix with a row for each task and a column for each frequently used competency, such that the entry M[i,j] denotes the number of times task i contains competency j.
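A minimal sketch of the task-by-competency count matrix M, assuming tasks arrive as plain-text descriptions and competencies as a fixed vocabulary of phrases matched as whole words, case-insensitively. The vocabulary, the sample tasks, and the `top_k` cut-off used to pick "frequently used" competencies are hypothetical.

```python
import re
from collections import Counter


def count_term(text: str, term: str) -> int:
    """Case-insensitive, whole-word count of a competency term in a task."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def competency_matrix(tasks, vocabulary, top_k=None):
    """Return (columns, M) where M[i][j] is the number of times task i mentions
    competency columns[j]. With top_k, keep only the top_k most frequent terms."""
    totals = Counter({term: sum(count_term(t, term) for t in tasks) for term in vocabulary})
    columns = [term for term, _ in totals.most_common(top_k)] if top_k else list(vocabulary)
    matrix = [[count_term(t, term) for term in columns] for t in tasks]
    return columns, matrix


tasks = [
    "Read blueprints and apply safety procedures",
    "Apply safety procedures; communicate with the crew",
    "Communicate schedules to clients and communicate delays",
]
vocabulary = ["safety", "communicate", "blueprints", "budgeting"]
columns, M = competency_matrix(tasks, vocabulary, top_k=2)
print(columns)  # ['communicate', 'safety']
print(M)        # [[0, 1], [1, 1], [2, 0]]
```

Whole-word matching avoids counting "communicate" inside longer words, but real job descriptions would still need synonym handling and phrase normalization that this simple matching does not provide.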

Categorical Data

Categorical data consists of labels describing the properties of the objects under investigation (for example, work tasks versus competencies), such as gender, hair color, and occupation. This descriptive information can be every bit as precise and meaningful as numerical data, but it cannot be worked with using the same techniques. Categorical data can usually be coded numerically; for example, gender might be represented as male = 0 and female = 1. Things get more complicated when a feature has more than two possible values, especially when there is no implicit order between them. We may be able to encode hair colors as numbers by assigning each shade a distinct value, such as gray hair = 0, red hair = 1, and blond hair = 2. However, we cannot really treat these values as numbers for anything other than simple identity testing. Does it make any sense to talk about the maximum or minimum hair color? What is the interpretation of my hair color minus your hair color? Classification and clustering methods can be thought of as generating categorical labels from numerical data, and they represent the approach we have adopted throughout Skills4Industry.
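The hair-color problem can be made concrete by contrasting arbitrary integer codes with one-hot encoding, a common alternative for unordered categories. This sketch uses the example above. The helper names are hypothetical, and one-hot encoding is a standard technique shown for illustration, not necessarily the encoding used in the Skills4Industry pipeline.

```python
def integer_codes(values):
    """Give each distinct category a code 0, 1, 2, ... in order of first appearance.
    Suitable for identity tests only; order and differences are meaningless."""
    return {v: i for i, v in enumerate(dict.fromkeys(values))}


def one_hot(value, categories):
    """One-hot vector: 1 in the position of `value`, 0 everywhere else."""
    return [1 if c == value else 0 for c in categories]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


hair = ["gray", "red", "blond"]
codes = integer_codes(hair)
# Integer codes invent a spurious geometry: blond looks "twice as far" from gray as red does.
print(codes["blond"] - codes["gray"], codes["red"] - codes["gray"])  # 2 1

# One-hot vectors keep every pair of distinct colors equally far apart.
vectors = {c: one_hot(c, hair) for c in hair}
print({(a, b): squared_distance(vectors[a], vectors[b])
       for a in hair for b in hair if a < b})
```

The trade-off is width: one-hot encoding adds one column per category, which is why features with many categories are often grouped or embedded instead.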