Educational resource data are a collection of final documents obtained by users, including full-text journals, books, dissertations, newspapers, conference papers, and other database materials. Beyond supporting information search, educational resource databases also allow these resources to be copied, downloaded, reproduced, and disseminated, which raises the issue of how intellectual property is expressed and protected. Machine learning studies how computers simulate human learning behaviors; it can independently determine learning objects, construct their characteristics, perform additional operations beyond the limitations of preset instructions, and discover value from the expression of related works. On the basis of summarizing and analyzing previous research works, this paper expounded the current research status and significance of intellectual property expression and protection of educational resource data; elaborated the development background, current status, and future challenges of machine learning technology; introduced the methods and principles of data classification algorithm and protection authority identification; performed the technical framework design and expression system establishment of the intellectual property expression of educational resource data based on machine learning; analyzed the mode optimization and rule management of intellectual property protection of educational resource data based on machine learning; and finally conducted a simulation experiment and its result analysis.
The results show that machine learning technology can build a subject-oriented, highly integrated, and time-changing educational resource data storage environment. The comprehensive, analysis-oriented decision support system formed by machine learning can give full play to the potential of data integration and value discovery and is therefore of great significance for the intellectual property expression and protection of integrated and complexly related educational resource data. The study results of this paper provide a reference for further research on the intellectual property expression and protection of educational resource data based on machine learning.
1. Introduction
Educational resource data are a collection of final documents obtained by users, including full-text journals, books, dissertations, newspapers, conference papers, and other database materials. In the data expression and protection of intellectual property educational resources, many documents are collected in commercial databases with the right to use, and the resources in the library are searched, downloaded, and integrated into characteristic databases [1]. These resources can also be copied, downloaded, reproduced, and disseminated, which raises intellectual property issues. According to the type of educational resource, machine learning educational resource data can be divided into government educational resource data, other public institution educational resource data, corporate educational resource data, and personal educational resource data [2]. Mode optimization can generate and process massive amounts of data in real time to ensure that intellectual property education resource data are readily available anytime and anywhere. In the machine learning environment, educational secret data may not be lost, thanks to the protection of a data backup system, but the holders of educational resources may lose control of such data because of data migration obstacles [3]. Machine learning can independently determine learning objects, construct their characteristics, perform additional operations beyond the limitations of preset instructions, and discover value from the expression of works; as a result, it may be suspected of infringing technical measures or violating laws and regulations [4].
The original material for machine learning is data, and how to deal with the rights attached to data, such as the right to privacy, personal information, and trade secrets, is a major legal issue facing the development of artificial intelligence technology [5]. The rule management of intellectual property rights protection can test not only the correlation between the characteristic variables and the data quality of intellectual property education resources but also the importance of the characteristic variables to that data quality. Strict regulation and improvement of the liability clause, dispute settlement clause, and confidentiality clause of the license agreement can ensure that the work is copied and disseminated only for reasonable purposes, thereby restricting the dissemination and scope of the data work and preventing illegal use, so that the legitimate rights and interests of the right owner of the data work are not harmed. Machine learning is a science of artificial intelligence [6]. Its algorithms mainly include decision trees, support vector machines, neural networks, and genetic algorithms, and machine learning technology is a key link in data mining and data protection [7]. The intellectual property expression and protection of educational resource data must be established on the basis of technological innovation in order to implement the concept of public welfare, expression, and protection of knowledge [8].
On the basis of summarizing and analyzing previous research works, this paper expounded the current research status and significance of intellectual property expression and protection of educational resource data, elaborated the development background, current status, and future challenges of machine learning technology, introduced the methods and principles of data classification algorithm and protection authority identification, performed the technical framework design and expression system establishment of the intellectual property expression of educational resource data based on machine learning, analyzed the mode optimization and rule management of intellectual property protection of educational resource data based on machine learning, and finally conducted a simulation experiment and its result analysis. The study results of this paper provide a reference for further research on the intellectual property expression and protection of educational resource data based on machine learning. The remainder of the paper is organized as follows: Section 2 introduces the methods and principles of data classification algorithm and protection authority identification; Section 3 performs the technical framework design and expression system establishment of the intellectual property expression of educational resource data; Section 4 analyzes the mode optimization and rule management of intellectual property protection of educational resource data; Section 5 conducts a simulation experiment and its result analysis; Section 6 is the conclusion.
2. Methods and Principles
2.1. Data Classification Algorithm
Intellectual property education resource data constitute two important data sets in data mining: training data and test data. In terms of content selection, the possibility of intellectual property protection is inversely proportional to the breadth of data and information; that is, the more comprehensive the collection of data and information is, the less selective it is and the less its originality in the selection of content. The comprehensiveness of the data information content is exactly where its educational value lies.
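The division into training data and test data mentioned at the start of this subsection can be sketched as follows. This is a minimal illustration rather than the procedure used in this paper: the function name `split_records`, the parameters `test_fraction` and `seed`, and the integer record identifiers are all assumptions introduced here.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle records reproducibly and split them into (train, test).

    All names and defaults are hypothetical; the source does not state
    how its training and test sets were formed.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded generator for repeatability
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical usage: ten record identifiers, 30% held out for testing.
train, test = split_records(range(10), test_fraction=0.3, seed=42)
```

Holding out the test portion before any model fitting keeps the later accuracy estimate independent of the training samples.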
The consistency of educational resource data refers to the consistency of the distribution of data in different domains. Suppose there are n different fields X = {x1, x2, …, xn} and the number of resources in each field is {y1, y2, …, yn}; then the educational resource data consistency y (xi) of the candidate data xi is defined as

[equation missing in source]

where f (xi) is the number of expressions of candidate data appearing in resource xi. The more evenly the candidate data xi is distributed across the intellectual property resources, the greater the consistency y (xi), indicating that it is likely to be filtered data.
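The idea that consistency grows as a candidate is spread more evenly across fields can be illustrated with a stand-in measure. The sketch below uses normalized Shannon entropy, which is an assumption made here for illustration and not the formula of this paper; the function name `distribution_consistency` and its inputs are hypothetical.

```python
import math


def distribution_consistency(counts, field_sizes):
    """Normalized Shannon entropy of a candidate's per-field occurrence rates.

    counts[i]      -- occurrences of the candidate in field i (hypothetical input)
    field_sizes[i] -- number of resources in field i, assumed positive
    Returns a value in [0, 1]; 1 means a perfectly even distribution.
    This is a stand-in measure, not the source's definition of y(xi).
    """
    if len(counts) != len(field_sizes) or len(counts) < 2:
        raise ValueError("need matching lists covering at least two fields")
    # Use per-field rates so that large fields do not dominate by size alone.
    rates = [c / s for c, s in zip(counts, field_sizes)]
    total = sum(rates)
    if total == 0:
        return 0.0
    probs = [r / total for r in rates]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


even = distribution_consistency([10, 10, 10], [100, 100, 100])
skewed = distribution_consistency([30, 0, 0], [100, 100, 100])
```

Under this stand-in, a candidate that appears at the same rate in every field scores highest, while one confined to a single field scores lowest.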
When the computer uses the training set to train the model, overfitting may occur; that is, the training samples reach very high approximation accuracy, but the approximation error on the test samples first decreases and then rises as the number of training iterations grows. The random forest model uses the exponential gain Q (xi) as the basis for the selection of the decision tree:

[equation missing in source]

where a (xi) is the proportion of the number of samples of category xi in all samples; b (xi) is the decision tree judgment node of the samples of category xi; c (xi) is the number of split points of samples of category xi; and d (xi) is the number of samples of category xi among all samples.
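The overfitting pattern described above, in which the test error falls and then rises with further training, is commonly handled by stopping at the lowest held-out error. The following is a minimal sketch of that general technique, not a step reported in this paper; the function name `early_stopping_epoch`, the `patience` parameter, and the error values are assumptions.

```python
def early_stopping_epoch(test_errors, patience=2):
    """Return the index of the epoch with the lowest held-out error.

    Scanning stops once the error has failed to improve for `patience`
    consecutive epochs. Names and defaults are hypothetical.
    """
    if not test_errors:
        raise ValueError("test_errors must not be empty")
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(test_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # error has stopped improving; assume overfitting has begun
    return best_epoch


# Hypothetical error curve: improves for three epochs, then degrades.
errors = [0.40, 0.31, 0.25, 0.27, 0.30, 0.22]
stop_at = early_stopping_epoch(errors, patience=2)
```

A small `patience` stops early and may miss a later improvement, as the last value in the hypothetical curve shows; a larger one trades extra training for robustness to noise.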
For machine learning, it is easy to circumvent copyright protection by changing the protection structure of data information, which makes the protection of data information meaningless. The value and significance of data information lie in the information material itself rather than in the structural order of the information material. A structure stripped of its information content is just an empty shelf without a soul, with little protection value at all. Therefore, even for original data information, the principle of protecting only expressions and not ideas leaves the protection of data information very weak.
The weighted average of the expression difference ratio of the candidate educational resource data is used in the field of intellectual property, and a comprehensive index to filter the infringement data in the resource data set is defined as follows:

[equation missing in source]

where e (xi) and are respectively the total number of expressions of resource data xi in the intellectual property field; h (xi) and k (xi) are the respective contributions of the intensity and expression difference ratio of data resources xi; and l (xi) and m (xi) indicate the frequency of expression and protection of candidate data resources xi in the field of intellectual property.
As a result, data information that does not have originality is a ubiquitous thing and will increase greatly with the development of the information service industry, but intellectual property cannot provide corresponding protection for data information that does not have originality. According to the principle that machine learning only protects expressions but not ideas, what intellectual property protects of data and information is its original choice or protected expression, not the content it chooses or protects.
2.2. Protection Authority Identification
The intellectual property expression and protection of educational resource data refer to the whole process of resource data collection, input, processing, analysis, regeneration, and output. At present, many intellectual property education resource data management systems reflect the expression and protection of some resource data. Through the realization of data expression and protection, the informatization, intellectualization, and even decision-making management of intellectual property education resource data can be completed.
The average expression probability of educational resource data xi in a single intellectual property indicates the intensity of xi in the resource field, so the intensity E (xi) of resource data xi can be expressed as

[equation missing in source]

where o (xi) is the frequency of expression of educational resource data xi in the entire intellectual property; and p (xi) is the number of expressions of resource data xi in the field of intellectual property.
In addition, how to use data warehouse technology to build a subject-oriented, highly integrated, stable, and time-changing data storage environment to form a comprehensive, analysis-oriented decision support environment has become an urgent issue for data integration of intellectual property education resources. Another important question is related to the integral educational resources, which reflects the complex correlation between intellectual property education resource data, similar to the intertwined network of relationships.
The time domain analysis method expresses the information distribution of the educational resource data xi as a function of time; the frequency domain analysis method obtains the frequency domain and its energy distribution through the transformation of the educational resource field, so the transformation of the educational resource data T (xi) is

[equation missing in source]

where ai is the expression range factor; bi is the protection efficiency factor; and ci is the total number of data to be protected. The intellectual property expression of educational resource data includes two steps: one is to establish a statistical model through the identification and calculation of the classification of the training set flow; the other is to apply the established statistical model to the unknown and new flow classification in the network traffic. So the probability that each educational resource data belongs to a specific application is

[equation missing in source]

where q (xi) is the prior probability of educational resource data xi; r (xi) is the conditional probability of given educational resource data xi; s (xi) is the number of times the educational resource data xi performs information automation; and t (xi) is the total number of times of information processing of educational resource data xi. This method assumes that the characteristics of educational resource data are independent of each other.
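The two-step procedure above (fit a statistical model on labelled training data, then apply it to unseen data using a prior, conditional probabilities, and a feature-independence assumption) matches the structure of a naive Bayes classifier. The sketch below is a generic categorical naive Bayes with Laplace smoothing, offered as an illustration under that reading and not as the exact model of this paper; the function names, the `alpha` smoothing parameter, and the feature values and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Step 1: count class frequencies and per-class feature values."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each feature
    for features, label in zip(samples, labels):
        for j, v in enumerate(features):
            feature_counts[label][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values


def predict_proba(model, features, alpha=1.0):
    """Step 2: posterior class probabilities, assuming independent features."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for j, v in enumerate(features):
            count = feature_counts[label][j][v]
            # Laplace-smoothed conditional probability of value v given label
            score += math.log((count + alpha) / (n_label + alpha * len(values[j])))
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    peak = max(log_scores.values())
    weights = {k: math.exp(s - peak) for k, s in log_scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Hypothetical training data: (file format, document type) -> access class.
samples = [("pdf", "journal"), ("pdf", "journal"), ("html", "news"), ("html", "journal")]
labels = ["licensed", "licensed", "open", "open"]
model = train_naive_bayes(samples, labels)
posterior = predict_proba(model, ("pdf", "journal"))
```

Working in log space avoids underflow when many features are multiplied together, and the smoothing term keeps a feature value unseen for one class from forcing that class's probability to zero.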
From an overall point of view, the competent information department takes charge of the overall situation and integrates all business systems. Based on the classification of intellectual property education resource data, data expression and protection will realize the standardization and distributed management of data and ensure the smooth flow of data. At the local level, the administrative and teaching departments of universities will be responsible for the management of specific educational resource data, and data expression and protection will realize the specific business of each department.
3. Intellectual Property Expression of Educational Resource Data Based on Machine Learning
3.1. Technical Framework Design
Educational resource data include not only structured data but also large amounts of semistructured and unstructured data. Structured data refer to data with a fixed format and limited length, whereas unstructured data refer to data with variable length and no fixed format. Data expressions and products of educational resources based on machine learning were usually protected by copyright in the early days, but copyright protects the form in which ideas are expressed, that is, the data expression and the selection, arrangement, system, and structure of the work, rather than the data itself [9]. If the data are provided through public dissemination, they will inevitably be exposed to and obtained by an unspecified number of people, and the service provider cannot guarantee that every use complies with the rules, which creates the risk of copyright infringement. Therefore, the data work operator should establish a reasonable control method, use one-to-one network transmission, and protect the security of the data work through identity authentication, access permission control, and control of the operating environment, so as to effectively prevent piracy and illegal copying. In this case, the license agreement for the data work becomes a legally binding agreement between the user and the data operator regarding the purchase of the copyright license for the data work. Figure 1 shows the framework for intellectual property expression and protection of educational resource data based on machine learning.