Score of each node in the DOM tree.


Source publication
Web pages usually contain noise in addition to the main content, such as navigation elements, sidebars, and advertisements. This noise is unrelated to the main content and affects data mining and information retrieval tasks, so that the sensor will be damaged by the wrong data and inte...

Context in source publication

Context 1
... extract the results of a webpage based on this rating. Figure 5 shows that the difference between the text densities of the two DOM tree nodes is 42; after filtering by Equation 2, Figure 6 shows that the smallest difference between the DOM tree node scores is 1130. Enlarging the difference in text density makes it easier to find noise links in dynamic template data. ...
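
The excerpt above does not define text density or reproduce Equation 2, so the following is only a minimal sketch of the general idea. It assumes one common definition, text characters per descendant tag, which may differ from the one used in the source publication. The names `Node`, `DensityParser`, `parse` and `text_density` are hypothetical, and no attempt is made to reproduce the scoring step of Equation 2.

```python
from html.parser import HTMLParser


class Node:
    """One DOM tree node with subtree-level counters."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.chars = 0  # text characters inside this subtree
        self.tags = 0   # descendant tags inside this subtree


class DensityParser(HTMLParser):
    """Builds a simplified DOM tree and accumulates counters upward."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        # Count the new tag in every ancestor subtree.
        ancestor = self.current
        while ancestor is not None:
            ancestor.tags += 1
            ancestor = ancestor.parent
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag name, if any.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        length = len(data.strip())
        node = self.current
        while node is not None:
            node.chars += length
            node = node.parent


def parse(html):
    parser = DensityParser()
    parser.feed(html)
    parser.close()
    return parser.root


def text_density(node):
    # Assumed definition: characters per descendant tag (at least 1).
    return node.chars / max(node.tags, 1)


html = (
    "<div><p>Hello world, this is main content.</p></div>"
    "<ul><li><a>Home</a></li><li><a>About</a></li></ul>"
)
content, navigation = parse(html).children
print(text_density(content), text_density(navigation))
```

Under this assumed definition, the content block has a much higher density than the link-heavy navigation list. A scoring step that widens such gaps, as the excerpt describes for Equation 2, would make the noisy nodes easier to separate with a threshold.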