Design and Implementation of an Educational Aid System Based on Data Mining
https://doi.org/10.65281/639844
Peipei Xin1, a
1 School of Information and Business Management, Dalian Neusoft University of Information, Dalian, 116023, China
a15140399221@163.com
Abstract: As higher education develops, colleges and universities increasingly rely on information technology to optimize their management practices. The information needs of their functional departments are becoming more diverse, and campus informatization is moving toward a decision-making data center as its core, so data-mining-based auxiliary systems for colleges and universities are attracting considerable attention. To ensure that basic information on teaching, enrollment, employment and other areas is stored and used, and to provide information support for the management decisions of each functional department, it is important to explore college educational aid systems built on existing information technology. This paper first reviews the genetic BP neural network algorithm, then designs the architecture of a data-mining-based auxiliary decision support system for colleges and universities, and finally tests and analyzes the system's performance to verify the effectiveness and feasibility of the architecture design.
Keywords: BP neural network; genetic algorithm; data mining; higher education; assistive system
- Artificial Neural Networks
Artificial neural networks consist of a large number of simple, adaptive information processing units, known as artificial neurons, which are interconnected in a massively parallel topology [1]. Let x_i denote the input that neuron j receives from neuron i, w_ij the connection weight from neuron i to neuron j, O_j the output of neuron j, and f(·) the neuron's excitation (activation) function. The output O_j is then obtained by applying f to the weighted sum of the inputs x_i.
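The neuron formula itself did not survive extraction. Under the standard weighted-sum model, which is an assumption here but consistent with the symbols defined in the text, the output of neuron j with n inputs would be:

```latex
O_j = f\left( \sum_{i=1}^{n} w_{ij}\, x_i \right)
```

Many formulations also subtract a threshold θ_j inside f(·); it is left out here because the text defines no threshold.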
The choice of excitation function determines the information processing characteristics of a neuron and is one of the main factors affecting the overall performance of a neural network [2]. A common form is the threshold-type excitation function, the unit step function, which outputs 1 when its input is non-negative and 0 otherwise.
Neurons with this function are called threshold-type neurons and are the simplest neuron model. With a piecewise linear excitation function, the neuron's output is a linear function of its input within a certain range and saturates outside it; this kind of function is also called a pseudo-linear function.
Among nonlinear excitation functions, the S-type (sigmoid) function is a continuous, non-decreasing mapping from the real numbers R into the interval (0, 1) and represents a state-continuous neuron model. The unipolar S-type function is f(x) = 1 / (1 + e^(-x)).
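The three excitation functions can be sketched as follows. Their exact formulas did not survive extraction, so the forms used here are common textbook choices and should be read as assumptions: the unit step outputs 1 for x >= 0, the piecewise linear function is taken as min(1, max(0, c·x)) with a hypothetical slope c, and the S-type function is the logistic function. The helper `neuron_output` (a hypothetical name) applies any of them to the weighted sum of a neuron's inputs.

```python
import math
from typing import Callable, Sequence


def step(x: float) -> float:
    """Threshold-type excitation (unit step): 1 if x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0


def piecewise_linear(x: float, c: float = 1.0) -> float:
    """Pseudo-linear excitation: slope c between 0 and 1/c, clipped to [0, 1] outside.

    The slope c is an assumed parameter, not taken from the paper.
    """
    return max(0.0, min(1.0, c * x))


def sigmoid(x: float) -> float:
    """Unipolar S-type excitation 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in (0, 1) and cannot overflow
    return z / (1.0 + z)


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  f: Callable[[float], float]) -> float:
    """O_j = f(sum_i w_ij * x_i) for a single neuron j (no threshold term)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return f(sum(w * x for w, x in zip(weights, inputs)))


if __name__ == "__main__":
    x, w = [1.0, -2.0, 0.5], [0.4, 0.1, 0.6]
    for fn in (step, piecewise_linear, sigmoid):
        print(fn.__name__, neuron_output(x, w, fn))
```

The same weighted sum passes through each function, which makes the different response shapes easy to compare.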
1.1 BP neural network algorithm
The BP network is one of the most widely used and most thoroughly studied neural networks. It is a layered network with three or more layers, in which neurons in adjacent layers are connected to each other and neurons within the same layer are not connected [3]. Let the training sample set of the BP network be (X, Y), where X is an input vector and Y is the desired output vector corresponding to X.
The hidden layer has an input vector NETH and an output vector OH.
The output layer has an input vector NETO and produces the actual output vector OO, which is compared with the desired output vector Y.
The connection weights from the input layer to the hidden layer and from the hidden layer to the output layer form two weight matrices; the hidden-to-output matrix is denoted W.
Here n is the number of units in the input layer, h the number of units in the hidden layer, m the number of units in the output layer, and the subscript p refers to the pth sample.
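The vector and matrix definitions for the BP network did not survive extraction. One consistent way to write them, assuming lowercase component names and calling the input-to-hidden weight matrix V (the text only names the output-layer matrix W), is:

```latex
\begin{aligned}
X_p &= (x_1, x_2, \dots, x_n), & Y_p &= (y_1, y_2, \dots, y_m),\\
NETH &= (neth_1, \dots, neth_h), & OH &= (oh_1, \dots, oh_h),\\
NETO &= (neto_1, \dots, neto_m), & OO_p &= (oo_1, \dots, oo_m),\\
V &= (v_{ij})_{n \times h}, & W &= (w_{jk})_{h \times m}.
\end{aligned}
```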
In the forward propagation stage, a sample is taken from the sample set and fed into the network to compute its actual output OO_p. The input NETH of the hidden layer is computed first.
The hidden-layer output OH is then obtained by applying the excitation function to NETH.
Next, the input NETO of the output layer is computed from OH.
Finally, the actual output OO of the network is obtained by applying the excitation function to NETO.
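The four forward-propagation formulas were lost in extraction. Assuming input-to-hidden weights v_ij (an assumed name), hidden-to-output weights w_jk forming the matrix W, and the excitation function f, they take the standard form:

```latex
\begin{aligned}
neth_j &= \sum_{i=1}^{n} v_{ij}\, x_i, & oh_j &= f(neth_j), & j &= 1, \dots, h,\\
neto_k &= \sum_{j=1}^{h} w_{jk}\, oh_j, & oo_k &= f(neto_k), & k &= 1, \dots, m.
\end{aligned}
```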
In the backward propagation stage, the network error is computed first. For the pth sample, the error function is usually taken as half the sum of the squared differences between the desired outputs Y and the actual outputs OO.
The error of the network over the entire sample set is obtained by summing the per-sample errors.
To reduce training bias caused by the order in which samples are presented, the weights are generally adjusted according to this total error E over the whole sample set rather than after each individual sample; the sum runs over recnum, the number of records in the sample set.
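The error formulas were also lost. Assuming the squared-error form that is standard for BP networks, the per-sample error E_p and the total error E over the recnum records of the sample set would read:

```latex
E_p = \frac{1}{2} \sum_{k=1}^{m} \left( y_k - oo_k \right)^2,
\qquad
E = \sum_{p=1}^{recnum} E_p
```

Some implementations divide E by recnum to obtain a mean error; this only rescales the effective learning rate.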
Once the error has been obtained, the weights are corrected by gradient descent on E, starting with the weight matrix W of the output layer and then propagating the correction back to the input-to-hidden weights.
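The correction formulas themselves were lost. Assuming f is the unipolar sigmoid (so f'(z) = f(z)(1 - f(z))), a learning rate η (a symbol not in the text), hidden-to-output weights w_jk, input-to-hidden weights v_ij (V is an assumed name), and E = Σ_p ½ Σ_k (y_k - oo_k)², gradient descent on E gives:

```latex
\begin{aligned}
\delta^{o,p}_{k} &= \left(y^{p}_{k} - oo^{p}_{k}\right) oo^{p}_{k}\left(1 - oo^{p}_{k}\right), &
\Delta w_{jk} &= \eta \sum_{p=1}^{recnum} \delta^{o,p}_{k}\, oh^{p}_{j},\\
\delta^{h,p}_{j} &= oh^{p}_{j}\left(1 - oh^{p}_{j}\right) \sum_{k=1}^{m} \delta^{o,p}_{k}\, w_{jk}, &
\Delta v_{ij} &= \eta \sum_{p=1}^{recnum} \delta^{h,p}_{j}\, x^{p}_{i}.
\end{aligned}
```

The output-layer terms δ^o are computed first and then propagated back through W to give the hidden-layer terms δ^h, matching the order described in the text.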
The flow of the overall algorithm is shown in Figure 1:
Fig. 1 Flowchart of BP neural network algorithm
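Putting the forward pass, the error function and the weight corrections together gives the following end-to-end sketch of batch BP training. It is an illustration under stated assumptions, not the system's actual code: f is the unipolar sigmoid, there are no threshold terms (a constant input of 1.0 can stand in for one), the input-to-hidden matrix is named V (the text names only W), and the learning rate `eta` and a fixed number of epochs as the stopping rule are hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Sample = Tuple[Sequence[float], Sequence[float]]


def f(x: float) -> float:
    """Unipolar S-type excitation function (assumed), overflow-safe form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(x: Sequence[float], V: Matrix, W: Matrix) -> Tuple[List[float], List[float]]:
    """Forward propagation NETH -> OH -> NETO -> OO; returns (OH, OO)."""
    n, h, m = len(V), len(W), len(W[0])
    oh = [f(sum(V[i][j] * x[i] for i in range(n))) for j in range(h)]
    oo = [f(sum(W[j][k] * oh[j] for j in range(h))) for k in range(m)]
    return oh, oo


def total_error(samples: Sequence[Sample], V: Matrix, W: Matrix) -> float:
    """E = sum over all recnum samples of E_p = 1/2 * sum_k (y_k - oo_k)^2."""
    E = 0.0
    for x, y in samples:
        _, oo = forward(x, V, W)
        E += 0.5 * sum((yk - ok) ** 2 for yk, ok in zip(y, oo))
    return E


def train_bp(samples: Sequence[Sample], h: int, eta: float = 0.2,
             epochs: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Batch BP: corrections are summed over the whole sample set, then applied."""
    rng = random.Random(seed)
    n, m = len(samples[0][0]), len(samples[0][1])
    V = [[rng.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(n)]
    W = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(h)]
    for _ in range(epochs):  # assumed stopping rule: fixed number of passes
        dV = [[0.0] * h for _ in range(n)]
        dW = [[0.0] * m for _ in range(h)]
        for x, y in samples:
            oh, oo = forward(x, V, W)
            # Output-layer error signal: (y_k - oo_k) * f'(neto_k)
            d_out = [(y[k] - oo[k]) * oo[k] * (1.0 - oo[k]) for k in range(m)]
            # Hidden-layer error signal, back-propagated through W
            d_hid = [oh[j] * (1.0 - oh[j]) * sum(d_out[k] * W[j][k] for k in range(m))
                     for j in range(h)]
            for j in range(h):
                for k in range(m):
                    dW[j][k] += eta * d_out[k] * oh[j]
            for i in range(n):
                for j in range(h):
                    dV[i][j] += eta * d_hid[j] * x[i]
        # Apply the accumulated corrections: output-layer W first, then V
        for j in range(h):
            for k in range(m):
                W[j][k] += dW[j][k]
        for i in range(n):
            for j in range(h):
                V[i][j] += dV[i][j]
    return V, W


if __name__ == "__main__":
    # Logical OR; the constant third input 1.0 stands in for a threshold term.
    data = [([0.0, 0.0, 1.0], [0.0]), ([0.0, 1.0, 1.0], [1.0]),
            ([1.0, 0.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [1.0])]
    V, W = train_bp(data, h=3)
    print("total error E:", total_error(data, V, W))
```

Accumulating the corrections over all samples before applying them mirrors the text's choice of adjusting weights by the total error E, so the order of the samples does not affect the update (up to floating-point rounding).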
1.2 Genetic Algorithms
The genetic algorithm is an adaptive, probabilistic global optimization search algorithm that simulates the genetic evolution of organisms in nature. It originated in research on natural and artificial adaptive systems in the 1960s and provides an effective way to solve optimization problems [4]. A genetic algorithm can be described by nine components: the coding scheme of the solution, the initial population of members, the number of members (population size), the coding length, the evaluation (fitness) function, the selection operator that picks pairs of parents, the crossover operator, the mutation operator, and the termination criterion. The flow of the genetic algorithm is shown in Fig. 2.
Fig. 2 Flowchart of the genetic algorithm
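The nine GA components can be mapped onto a short program. This is a minimal sketch, not the paper's implementation: binary coding, roulette-wheel selection, single-point crossover, bit-flip mutation, a fixed generation count as the termination rule, and the parameter names `pop_size`, `pc` and `pm` are all assumptions. Writing the components as (C, E, P0, M, L, Φ, Γ, Ψ, T), with symbols chosen here because the originals were lost, the comments mark where each appears.

```python
import random
from typing import Callable, List

Individual = List[int]


def run_ga(fitness: Callable[[Individual], float], length: int, pop_size: int = 20,
           pc: float = 0.8, pm: float = 0.01, generations: int = 50,
           seed: int = 0) -> Individual:
    """Simple binary GA. fitness must be non-negative (roulette selection); length >= 2."""
    rng = random.Random(seed)
    # C, L, M, P0: binary strings of length L, initial population of M members
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # T: stop after a fixed number of generations
        scores = [fitness(ind) for ind in pop]  # E: evaluation function
        total = sum(scores)

        def select() -> Individual:
            # Phi: roulette-wheel (fitness-proportional) parent selection
            if total <= 0:
                return rng.choice(pop)
            r, acc = rng.uniform(0, total), 0.0
            for ind, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return ind
            return pop[-1]

        children: List[Individual] = []
        while len(children) < pop_size:
            a, b = select(), select()
            if rng.random() < pc:  # Gamma: single-point crossover
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):  # Psi: bit-flip mutation with probability pm per gene
                children.append([1 - g if rng.random() < pm else g for g in child])
        pop = children[:pop_size]
        generation_best = max(pop, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best


if __name__ == "__main__":
    # Toy "OneMax" problem: fitness is the number of 1 bits.
    print(run_ga(sum, length=20))
```

Because the best individual seen so far is tracked separately, the returned result never gets worse as more generations run. In a genetic BP setting the fitness would instead score a candidate set of network weights, which this sketch does not attempt.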
- Design of an auxiliary decision-making system for colleges and universities based on data mining
2.1 System architecture
Building on the BP neural network and genetic algorithms described above, and considering the practical purpose of a college auxiliary decision-making system, the system is designed with data mining at its core. It is developed in the widely used B/S (browser/server) mode, uses the mature Greenplum database for data storage, and provides platform access through the Web to reduce additional programming work [5]. The system covers four areas: enrollment management, teaching management, grade (performance) management and employment management.
2.2 Functional Modules
First, enrollment management. This module collects and organizes enrollment years, application preferences, admission information and other data. Using the analysis results provided by the data mining component, it supports day-to-day enrollment work, simplifies the management workflow and improves the efficiency of enrollment [6].
Second, grade management. This module covers student information, course information, grade information and related data. Analyzing the data returned from the database supports day-to-day teaching decisions and helps avoid errors in educational management as far as possible [7].
Third, teaching management. This module covers student information, school resources, teacher information and related data, and is mainly used for daily teaching management. It integrates information on teachers, students, curricula, and teaching and research in an automated way that reduces human error, and it provides statistical analysis functions that help school administrators and subject teachers reach effective management decisions promptly [8].
Fourth, employment management. This module covers student employment, recruiting employers, industries and majors, geographic regions and other information. Using data mining techniques and corresponding algorithms, it matches students of each major with enterprises in relevant industries and provides a platform for communication between them, with the aim of raising the school's employment rate and the quality of graduates' employment. The platform also offers industry data analysis services that compile employment statistics by region and major, supporting university leaders in improving the training system and in setting enrollment and teaching directions [9].
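The statistics service described above can be sketched as simple grouped counts. The record fields (`major`, `employed`, `job_region`) and the function names are hypothetical, and the sketch covers only counting employment by region and major, not the student-enterprise matching algorithm, which the text does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def employment_rate_by(records: Iterable[Mapping], field: str) -> Dict[Hashable, float]:
    """Share of graduates with employed == True, grouped by one field (e.g. 'major')."""
    total: Dict[Hashable, int] = defaultdict(int)
    hired: Dict[Hashable, int] = defaultdict(int)
    for r in records:
        total[r[field]] += 1
        if r["employed"]:
            hired[r[field]] += 1
    return {key: hired[key] / total[key] for key in total}


def placements(records: Iterable[Mapping]) -> Counter:
    """Number of employed graduates per (job_region, major) pair."""
    return Counter((r["job_region"], r["major"]) for r in records if r["employed"])


if __name__ == "__main__":
    graduates = [
        {"major": "Software Engineering", "employed": True, "job_region": "Dalian"},
        {"major": "Software Engineering", "employed": False, "job_region": None},
        {"major": "Accounting", "employed": True, "job_region": "Shenyang"},
        {"major": "Accounting", "employed": True, "job_region": "Dalian"},
    ]
    print(employment_rate_by(graduates, "major"))
    print(placements(graduates))
```

In the real system the same aggregations would more likely run as GROUP BY queries inside the database; the Python form just makes the logic explicit.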
2.3 Database and data integration
With the overall system architecture defined, and because the existing network is multi-level, database security is given high priority. The database system is built on an advanced operating system and a multi-level security database model, takes into account the security of the operating system, the database servers and the application servers, and gives priority to virtual private network (VPN) technology, so as to meet the operational requirements of the system architecture [10].
In general, relationships between data tables are one-to-one, one-to-many or many-to-many, but each data item corresponds to only one entity, and each entity must have a primary view and no external view, while for some data the reverse holds [11]. Basic tables must have the following characteristics: first, their fields are atomic and cannot be decomposed further; second, their records are the original source data; third, other data tables can be derived from the basic tables together with the code tables; and finally, their structure is fixed and their records are kept long-term [12]. The functional requirements of the management system call for a large number of tables; among them, the user login information table is shown in Table 1, the basic student information table in Table 2, and the course information table in Table 3:
Table 1 User login information table
| Column Name | Description | Data Type | Length | Nullable | Primary Key |
| Username | User login | varchar | 30 | No | Yes |
| Password | Login password | varchar | 30 | No | No |
| Permission | Access rights | varchar | 20 | No | No |
Table 2 Basic information of students
| Column Name | Description | Data Type | Length | Nullable | Primary Key |
| _Student number | Student ID | varchar | 30 | No | Yes |
| _Name | Name | varchar | 10 | No | No |
| _Gender | Gender | bit | 1 | No | No |
| _Resumenumber | Resume Number | varchar | 10 | No | No |
| _Nation | Ethnicity | varchar | 30 | No | No |
| _Birth Place | Birth Place | varchar | 30 | No | No |
| _Birth Date | Birth Date | datetime | 8 | No | No |
| _Height | Height | real | 4 | No | No |
| _Weight | Weight | real | 4 | No | No |
| _Political Face | Political status | varchar | 10 | No | No |
| _Major | Major | varchar | 30 | No | No |
| _Major Rank | Major Rank | varchar | 20 | No | No |
| _Certificate | Certificate | varchar | 10 | No | No |
| _Student Work | Student Work | varchar | 50 | No | No |
| _Honor | Honors | varchar | 50 | No | No |
| _Social Practice | Social Practice | varchar | 50 | No | No |
| _Hobby | Hobby | varchar | 30 | No | No |
| _Job Direction | Job Direction | varchar | 30 | No | No |
| _Selfassessment | Self-assessment | varchar | 50 | No | No |
| _Photo | Photo | varchar | max | No | No |
| _Contact | Contact | varchar | 30 | No | No |
Table 3 Course Information Sheet
| Column Name | Description | Data Type | Length | Nullable | Primary Key |
| _ClassID | Class ID | varchar | 10 | No | Yes |
| _ClassName | Class Name | varchar | 30 | No | No |
| _ClassNO | Class Number | varchar | 10 | No | No |
| _ClassYear | Class Year | varchar | 10 | No | No |
| _Major | Major | varchar | 30 | No | No |
| _Members | Number of Members | varchar | 5 | No | No |
| _IsEntity | Is Entity Class | bit | 1 | No | No |
| _Instructor | Instructor | varchar | 30 | No | No |
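A minimal sketch of how Tables 1 to 3 could be declared is shown below. It uses Python's built-in `sqlite3` with an in-memory database only so that it runs without a server; the system itself uses Greenplum, which is PostgreSQL-based and where `varchar(max)` would typically become TEXT and the `bit` flags BOOLEAN. The snake_case column names are hypothetical renderings of the table columns, only a subset of the Table 2 columns is included, and SQLite does not enforce the declared VARCHAR lengths.

```python
import sqlite3

# Hypothetical DDL mirroring Tables 1-3; only some Table 2 columns are included.
SCHEMA = """
CREATE TABLE user_login (
    username   VARCHAR(30) NOT NULL PRIMARY KEY,
    password   VARCHAR(30) NOT NULL,
    permission VARCHAR(20) NOT NULL
);
CREATE TABLE student_info (
    student_number VARCHAR(30) NOT NULL PRIMARY KEY,
    name           VARCHAR(10) NOT NULL,
    gender         BOOLEAN     NOT NULL,  -- bit(1) in Table 2
    birth_date     TIMESTAMP   NOT NULL,
    major          VARCHAR(30) NOT NULL,
    job_direction  VARCHAR(30) NOT NULL,
    photo          TEXT        NOT NULL   -- varchar(max) in Table 2
);
CREATE TABLE course_info (
    class_id   VARCHAR(10) NOT NULL PRIMARY KEY,
    class_name VARCHAR(30) NOT NULL,
    class_no   VARCHAR(10) NOT NULL,
    class_year VARCHAR(10) NOT NULL,
    major      VARCHAR(30) NOT NULL,
    members    VARCHAR(5)  NOT NULL,
    is_entity  BOOLEAN     NOT NULL,      -- bit(1) in Table 3
    instructor VARCHAR(30) NOT NULL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three basic tables in an empty database."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO user_login VALUES (?, ?, ?)", ("t001", "pw", "teacher"))
    try:
        # Same primary key as above, so the insert is rejected.
        conn.execute("INSERT INTO user_login VALUES (?, ?, ?)", ("t001", "x", "admin"))
    except sqlite3.IntegrityError as exc:
        print("duplicate username rejected:", exc)
```

The "Primary Key" and "Nullable" columns of the tables map directly onto PRIMARY KEY and NOT NULL constraints, which the database then enforces on every insert.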
- Analysis of results
3.1 System testing
Software testing uses a variety of methods to exercise the system's functions, to check whether the results match the requirements analysis, and to drive continuous improvement in practice [13]. On the one hand, the basic testing work consists of drawing up the test plan, writing the tests, configuring the test environment, designing and generating test cases, executing the tests and producing reports, so that software errors are located at every stage and the test results are controllable and reproducible; on the other hand, the correctness and completeness of the software are verified against the program specification and requirements. Supporting tools include automatic test-data generation, static analysis and dynamic analysis programs [14].
Taking teaching management as an example, the system provides service functions such as deletion, which removes the corresponding records directly from the database so that the original data no longer appear; the interface is concise and operations are quick. In the student evaluation of teaching, a student must first select a course in the interface and can evaluate the teacher only after selecting it [15]. Once submitted, the evaluation is stored directly in the database and cannot be viewed without the corresponding permissions, which protects the authenticity and validity of the teaching evaluation data and reduces unnecessary security risks. Because the interface operations are closely tied to the database, the most important part of system testing is the database connection, including the basic functions of updating, querying and deleting data [16]. These operations are recorded directly in the log, and the login information of teachers, students and other users can be monitored in real time, which makes the system easier to manage. In the experiments, uploads of 100 KB and 1 MB files were tested in 1M, 2M and 4M environments, together with data queries and other tool functions; the response time stayed within 5 seconds, indicating that the overall performance of the system is good.
Each module also ran normally and smoothly, essentially meeting the performance test requirements.
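The database checks and the 5-second bound described above could be automated along the following lines. This is a hypothetical sketch: the `evaluation` table, its columns, the row count and the function names are invented for illustration, and an in-memory SQLite database stands in for the real Greenplum deployment, so the timings it reports say nothing about the actual system; it only shows how each insert, query, update and delete operation can be timed and checked against the bound.

```python
import sqlite3
import time
from typing import Callable, Dict

LIMIT_SECONDS = 5.0  # response-time bound reported in the text


def timed(action: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by action()."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_db_checks(conn: sqlite3.Connection, n_rows: int = 10_000) -> Dict[str, float]:
    """Time basic operations on a hypothetical teaching-evaluation table, in order."""
    conn.execute("CREATE TABLE evaluation (student TEXT, course TEXT, score INTEGER)")
    rows = [(f"s{i}", f"c{i % 20}", i % 5 + 1) for i in range(n_rows)]
    return {
        "insert": timed(lambda: conn.executemany(
            "INSERT INTO evaluation VALUES (?, ?, ?)", rows)),
        "query": timed(lambda: conn.execute(
            "SELECT course, AVG(score) FROM evaluation GROUP BY course").fetchall()),
        "update": timed(lambda: conn.execute(
            "UPDATE evaluation SET score = 5 WHERE course = 'c0'")),
        "delete": timed(lambda: conn.execute(
            "DELETE FROM evaluation WHERE course = 'c1'")),
    }


if __name__ == "__main__":
    timings = run_db_checks(sqlite3.connect(":memory:"))
    for op, seconds in timings.items():
        status = "OK" if seconds < LIMIT_SECONDS else "TOO SLOW"
        print(f"{op:<6} {seconds:.4f} s {status}")
```

Checking the data after each step, as the usage in a test suite would, confirms not only that operations are fast but that deletion really removes the records, as described for the teaching management module.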
3.2 Decision Realization
As China's higher education undergoes reform and innovation, colleges and universities across the country are expanding in scale, which places higher demands on educational guidance and management. Enrollment numbers have a far-reaching impact on an institution's future development, and the quality of the student intake is related to students' regions, examination types, gender, intended majors and other information. In-depth analysis of these factors can help the relevant departments make sound enrollment decisions, develop reasonable enrollment plans and optimize the structure of majors on campus [18]. A good teaching environment and management model is closely related to the management of student, resource and faculty information; using data mining to uncover patterns among these kinds of information can provide an effective basis for teaching management decisions in the new era [19].
For example, correlation analysis of student information can reveal problems facing college teaching, such as learning content that does not match students' abilities or learning modes that fail to stimulate their interest, and can also show students' interests and professional skills in learning activities, providing recommendation data for future employment and entrepreneurship guidance. Correlation analysis of teaching information can uncover relationships between teaching outcomes and teachers' age, education and other attributes, helping teachers and schools optimize teaching management and foster a good teaching atmosphere. Because student performance is a direct result of students' learning ability, it also reflects the teaching ability of college teachers and the institution's priorities in training students. Using data mining to analyze the correlation between student performance and factors such as teachers, students and curriculum design can help build a high-quality curriculum system, allocate teaching tasks across disciplines rationally, and guide teachers and students in adjusting their teaching and learning methods, thereby improving teaching quality [20].
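The text does not say which correlation measure the system uses; the Pearson coefficient is one common choice for two numeric variables, such as a teacher attribute and students' scores. The sketch below computes it from scratch; the function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical data: teachers' years of experience vs. their classes' mean scores
    years = [2, 5, 8, 12, 20]
    mean_scores = [71.0, 74.5, 78.0, 77.5, 82.0]
    print(round(pearson(years, mean_scores), 3))
```

A value near +1 or -1 indicates a strong linear association; on its own it does not establish that one factor causes the other.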
Conclusion
In summary, this paper draws on data mining technology and artificial neural network algorithms to provide technical support for the design and application of an educational aid system for colleges and universities in the new era. It examines the advantages of these algorithms in practice, discovers potential association rules in large amounts of information, and compares the strengths and weaknesses of existing management decisions with traditional management modes. The resulting system has complete functional modules and runs smoothly, which helps achieve the expected goals of educational management and promotes the high-quality development of colleges and universities.
References
[1] Xiuyu He. Research on Collaborative Filtering Recommendation Algorithm Based on Data Mining and Cluster Analysis [J]. Electronic Design Engineering, 2024, 32(9): 47-50.
[2] Ning Guo, Zhifu Gong. Data Information Analysis Algorithm Based on Data Mining and Feature Recognition [J]. Electronic Design Engineering, 2023, 31(5): 46-50.
[3] Fangjun Fan, Yuanping Ou, Xiaolong Liu, et al. Research on Information Security Algorithm Based on Data Mining [J]. Electronic Design Engineering, 2023, 31(3): 105-108. DOI: 10.14022/j.issn1674-6236.2023.03.021.
[4] Wujun Yan, Zhiqi Sun. Research and Application of Cluster Analysis Algorithm Based on Data Mining [J]. Journal of Taiyuan Normal University: Natural Science Edition, 2023, 22(1): 53-57.
[5] Honglue Zhang, Yi Wan, Jiajun Wang, et al. Method for Extracting Abnormal Data of Power Grid Dispatching Signals Based on Data Mining Algorithm [J]. Journal of Terahertz Science and Electronic Information, 2024, 22(7): 800-806. DOI: 10.11805/TKYDA2023381.
[6] Binbin Wang. Intelligent Instrument Data Recommendation Algorithm Based on Data Mining Technology [J]. Journal of Chengdu University of Technology, 2024, 27(2): 41-46.
[7] Tao Li, Jiang Xu. Multi-source Geographic Spatial Vector Data Mining Method Assisted by GIS [J]. Computer Simulation, 2024, 41(9): 465-469.
[8] Zhengyang Shi. Analysis of Information Security Algorithm Based on Data Mining [J]. Electronic Technology, 2025(1): 282-283.
[9] Huhe Qi. Research on Data Mining Classification Algorithm Based on Neural Network [J]. Digital User, 2023, 29(9): 28-30.
[10] Jiapei Zhou, Xiaochan Wang, Haoran Li, et al. Medicinal Pattern of Traditional Chinese Medicine in the Adjunctive Treatment of Peptic Ulcers in the Elderly Based on Data Mining [J]. World Chinese Medicine, 2023, 18(10): 1442-1446. DOI: 10.3969/j.issn.1673-7202.2023.10.017.
[11] Ruixia Yang, Andong Zhang, Ruiting He. Research on the Acupoint Selection Pattern of Acupuncture and Moxibustion in the Adjunctive Treatment of Coronary Heart Disease Based on Data Mining Theory [J]. Guangxi Medicine, 2023(2): 236-241.
[12] Junqiu Zhang, Jianguang Zhao, Fanming Meng, et al. Review of Related Models and Algorithms Based on Data Mining Technology [J]. China New Communications, 2023, 25(2): 45-48.
[13] Qingyun Tian, Cheng Wen, Liang Xu. Research on Clustering Algorithm of Data Mining Based on Cloud Computing [J]. Yangtze River Information and Communication, 2024, 37(9): 203-205.
[14] Ji Yao Lei. Application Research of Data Mining Algorithm Based on Association Rules in the Field of E-commerce [J]. Information and Computer, 2023, 35(16): 73-75.
[15] Lifen Qiu, Jian He, Huan Xiong, et al. Analysis of Director Chen Jianping’s Experience in Treating Epigastric Pain Based on Traditional Chinese Medicine Auxiliary Inheritance Platform [J]. Chinese Journal of Modern Distance Education of Traditional Chinese Medicine, 2024, 22(19): 104-106.
[16] Zaimei Zhang. Teaching Design of “Financial Data Mining” Course Based on BOPPPS Model – Taking “Apriori Association Analysis Algorithm and Financial Application” as an Example [J]. Science and Technology Education Herald, 2023(5): 112-114.
[17] Limin Liu, Yong Zhang. Data Mining Algorithm under Multi-label Implicit Knowledge Explicitization [J]. Computer Simulation, 2023, 40(4): 504-508.
[18] Xiaofeng Wu. Design of Intelligent Library Information Automation Retrieval System Based on Data Mining [J]. Automation Technology and Application, 2024, 43(4): 155-158.
[19] Jing Liu, Zhixun You. Design of Online Learning Prediction and Evaluation Model Based on Deep Learning and Data Mining [J]. Electronic Design Engineering, 2023, 31(15): 131-134.
[20] Zijian Chen. Fast Data Mining and Intelligent Screening Based on Cluster Analysis Optimization Algorithm [J]. Adhesives, 2024, 51(1): 189-192.