
Web-bot and Webcrawler Development Services
We develop web-bots or webcrawlers, which fetch data from the web or other sources. The data that they retrieve may or may not be publicly available. We do not develop malicious software intended for spam or for infringing anybody's rights.
A web-bot or webcrawler is a program that crawls through web sites and collects the needed information from them. What information can it collect? In short, anything you want: product descriptions, prices, links, addresses, pictures, etc. The collected information is then stored in the required database or file.
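As a rough illustration of the collect-and-store idea described above, here is a minimal Java sketch that pulls links and prices out of a page with regular expressions and formats them as a CSV row. The class name `PageExtractor`, the two patterns, and the sample HTML are hypothetical and chosen for illustration only; they are not taken from our framework, and real pages usually call for a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any real framework.
class PageExtractor {
    // Simplistic pattern for href="..." inside anchor tags (assumes double quotes).
    private static final Pattern LINK =
        Pattern.compile("<a\\s+[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    // Simplistic pattern for dollar prices such as $5 or $19.99.
    private static final Pattern PRICE = Pattern.compile("\\$(\\d+(?:\\.\\d{2})?)");

    // Returns the first capture group of every match of the pattern in the text.
    static List<String> extract(Pattern pattern, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    static List<String> links(String html) {
        return extract(LINK, html);
    }

    static List<String> prices(String html) {
        return extract(PRICE, html);
    }

    // Builds one CSV row: every field is quoted, embedded quotes are doubled.
    static String toCsvRow(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fields.get(i).replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/item/1\">Widget</a> $19.99</p>";
        System.out.println(toCsvRow(links(html)));
        System.out.println(toCsvRow(prices(html)));
    }
}
```

In a real web-bot the rows would be appended to a file or inserted into a database rather than printed.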
Features of our web-bots or webcrawlers:
- full automation of a web site visitor’s actions (including automatic browsing, signing up for new accounts, logging in with different user accounts, filling in and submitting forms, etc.)
- regular expressions to retrieve the needed data from web pages
- XML parser to extract the needed data from web services
- sophisticated algorithms and methods to filter and search for the information of interest
- multithreading to increase performance
- retrieving web pages in a compressed format, e.g. gzip
- caching downloaded web pages to save time and bandwidth
- using an open-source JavaScript engine, such as V8 from Google or Rhino from Mozilla, to process dynamically constructed web pages
- storing output data in your preferred format: database, CSV file, Excel, XML file or any other you need
- sending email notifications in predetermined cases
- HTTP, HTTPS, FTP and FTPS support
- HTTP or SOCKS proxy support to crawl web sites anonymously
- restoring the previous crawling session if it was broken, so that the web-bot or webcrawler can restart its work from the point where it was interrupted or crashed
- automatic quality assurance to verify the data harvested from the web sites
- web browser interface so that you can watch the work session and intervene in it manually if needed
- web-based graphical user interface to manage and monitor the web-bots or webcrawlers
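To make the session-restoring feature in the list above more concrete, here is a minimal Java sketch of one way a crawl session could be checkpointed to disk and resumed. The class `CrawlSession`, its methods, and the "V "/"P " file format are assumptions made for illustration; they do not describe how our framework actually stores its state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical checkpointing sketch; names and file format are illustrative only.
class CrawlSession {
    final Deque<String> pending = new ArrayDeque<>();   // URLs still to fetch
    final Set<String> visited = new LinkedHashSet<>();  // URLs already handed out

    // Queues a URL unless it is already known.
    void enqueue(String url) {
        if (!visited.contains(url) && !pending.contains(url)) {
            pending.add(url);
        }
    }

    // Returns the next URL to fetch, or null when the queue is empty.
    // Simplification: the URL is marked visited as soon as it is handed out;
    // a sturdier crawler would mark it only after the page was processed.
    String next() {
        String url = pending.poll();
        if (url != null) {
            visited.add(url);
        }
        return url;
    }

    // Writes the state to a temporary file, then moves it over the checkpoint,
    // so an interrupted save is less likely to leave a half-written checkpoint.
    void save(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String u : visited) {
            lines.add("V " + u);
        }
        for (String u : pending) {
            lines.add("P " + u);
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Rebuilds a session from a checkpoint; a missing file yields an empty session.
    static CrawlSession restore(Path file) throws IOException {
        CrawlSession s = new CrawlSession();
        if (!Files.exists(file)) {
            return s;
        }
        for (String line : Files.readAllLines(file)) {
            if (line.startsWith("V ")) {
                s.visited.add(line.substring(2));
            } else if (line.startsWith("P ")) {
                s.pending.add(line.substring(2));
            }
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        Path checkpoint = Files.createTempFile("crawl", ".chk");
        CrawlSession session = new CrawlSession();
        session.enqueue("http://example.com/a");
        session.enqueue("http://example.com/b");
        session.next();            // "fetch" the first URL
        session.save(checkpoint);  // simulate a checkpoint before a crash
        CrawlSession resumed = CrawlSession.restore(checkpoint);
        System.out.println("Resuming with: " + resumed.pending);
    }
}
```

Saving after every page, or every few pages, trades a little disk I/O for less repeated work after an interruption.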
We built a Java-based framework to develop web-bots or webcrawlers much more efficiently, and a network system to host, monitor and manage them efficiently and cheaply. The network system can be scaled up as the number of web-bots or webcrawlers increases. The user can specify schedules, export crawled data, and view the daily reports for each web-bot or webcrawler.
See our demo site. Use "admin" to log in as an administrator, "developer" to log in as a developer, "user" to log in as a user / client, or "guest" to log in as a guest. No password is required.
If you would like further details on this or on having your own web-bots or webcrawlers, please contact us via our contact page. Put our state-of-the-art technology to work for you.









