Title: Scalability and Robustness Testing for Open Source Web Crawlers
Authors: Desheng Yang
Pree Thiengburanathum
Keywords: Arts and Humanities; Computer Science; Engineering
Issue Date: 3-Mar-2021
Abstract: This paper implements the proposed framework, which evaluates web crawlers on e-commerce websites in terms of scalability and robustness. Scalability is the ability of a system to adapt as the amount of data keeps increasing without a loss of performance; robustness is the ability to handle exceptions that occur while a web crawler is crawling. Multiple testing environments were set up on e-commerce websites. Scalability testing and robustness testing were used to measure the scalability and robustness of the web crawlers, which were quantified by scalability attributes and the robustness failure rate, respectively. Statistical methods such as the Friedman test and the Nemenyi test were used to analyze significant differences among the crawlers. The experimental results show that Heritrix, Scrapy, and Nutch have the best overall scalability. In the non-interference test, Scrapy has the best robustness, whereas WebMagic, WebCollector, and Gecco have the best robustness in the interference test, based on both the general test and the database test.
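The abstract names the Friedman test and the Nemenyi post-hoc test but does not show how they are computed. Below is a minimal, standard-library-only Python sketch of the usual procedure, assuming a matrix of per-environment scores for each crawler in which higher is better (pass `higher_is_better=False` for a quantity such as a failure rate). The function names, the toy scores for crawlers "A", "B", and "C", and the hard-coded critical values for alpha = 0.05 (covering 2 to 10 compared crawlers) are illustrative assumptions, not the paper's code or data. The sketch reports the Friedman chi-square statistic but not its p-value; `scipy.stats.friedmanchisquare` can supply that.

```python
from math import sqrt

# Commonly tabulated two-tailed Nemenyi critical values q_alpha at alpha = 0.05,
# indexed by the number of compared crawlers k (assumption: k is between 2 and 10).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(values, higher_is_better=True):
    """Rank one test environment's scores (1 = best); tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_nemenyi(scores, higher_is_better=True):
    """scores[n][j] is the score of crawler j in test environment n.

    Returns (average ranks, Friedman chi-square statistic, Nemenyi critical difference).
    """
    n = len(scores)      # number of test environments (blocks)
    k = len(scores[0])   # number of crawlers (treatments)
    rank_rows = [rank_row(row, higher_is_better) for row in scores]
    avg_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    # chi2_F = 12N / (k(k+1)) * [sum_j R_j^2 - k(k+1)^2 / 4]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    # CD = q_alpha * sqrt(k(k+1) / (6N))
    cd = Q_ALPHA_005[k] * sqrt(k * (k + 1) / (6 * n))
    return avg_ranks, chi2, cd


def significant_pairs(names, avg_ranks, cd):
    """Pairs of crawlers whose average ranks differ by more than the critical difference."""
    return [(names[a], names[b])
            for a in range(len(names)) for b in range(a + 1, len(names))
            if abs(avg_ranks[a] - avg_ranks[b]) > cd]


if __name__ == "__main__":
    # Hypothetical toy scores: 6 environments x 3 crawlers, not results from the paper.
    toy = [[0.9, 0.7, 0.5], [0.8, 0.6, 0.4], [0.95, 0.65, 0.55],
           [0.85, 0.75, 0.45], [0.9, 0.6, 0.5], [0.8, 0.7, 0.6]]
    ranks, chi2, cd = friedman_nemenyi(toy)
    print(ranks, chi2, cd)
    print(significant_pairs(["A", "B", "C"], ranks, cd))  # -> [('A', 'C')]
```

In this toy run every environment ranks A above B above C, so the Friedman statistic reaches its maximum N(k - 1) = 12. Only the A vs. C gap of 2 average ranks exceeds the critical difference of about 1.35.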
Appears in Collections: CMUL: Journal Articles

Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.