g5ktestbed

Automated testbed built on top of Grid'5000.

Information

Languages
  • Python >= 3.4
  • Ansible
  • Bash
Lines
~4078
Status
stopped

Links


Description


Artemis

Distributed, fault-tolerant web crawler.

Information

Language
  • Python >= 3.4
Services
  • transmission-daemon
  • Stem (Tor)
Lines
~4800
Status
stopped
Version
3.4.9.dev1

Description

Main characteristics

  1. Protocols: HTTP(S), FTP(S), Tor hidden services, magnet URIs
  2. Automatic authentication handling for FTP, HTTP Basic, HTTP Digest, and HTML forms
  3. Custom handling rules (domain-specific)
  4. Custom harvesting rules
  5. Smart crawling to save bandwidth
  6. Optional secure (SSL) connections between crawling nodes

Architecture overview

  1. Slave: node that harvests remote services (e.g. web pages).
  2. Master: divides the work among slaves (based on URIs) and handles harvesting rules.
  3. Monitor: handles load balancing across managers. Only one monitor acts as the active leader at a time.
  4. Admin: used to monitor the crawling cluster.
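
One plausible way to divide work over slaves based on URIs is highest-random-weight (rendezvous) hashing, the technique described by Thaler and Ravishankar in the bibliography. The sketch below is only an illustration under that assumption and is not taken from the Artemis code base: the function names `assign_slave` and `_weight`, the slave identifiers, and the choice to hash on the host part of the URI are all hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def _weight(node, key):
    """Pseudo-random weight for a (node, key) pair, stable across processes."""
    digest = hashlib.sha256("{}|{}".format(node, key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_slave(uri, slaves):
    """Return the slave responsible for `uri` (hypothetical helper)."""
    if not slaves:
        raise ValueError("no slave available")
    # Assumption: hash on the host so that all URIs of one site reach the
    # same slave; URIs without a host (e.g. magnet links) use the full URI.
    host = urlsplit(uri).netloc.lower() or uri
    # The node with the highest weight for this key owns it.
    return max(slaves, key=lambda node: _weight(node, host))


if __name__ == "__main__":
    slaves = ["slave-1", "slave-2", "slave-3"]
    for uri in ("http://example.org/a", "http://example.org/b",
                "ftp://example.net/file"):
        print(uri, "->", assign_slave(uri, slaves))
```

With this scheme, removing a slave only remaps the URIs that slave owned; every other assignment stays unchanged, which keeps rebalancing cheap when nodes fail.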

Bibliography:

  1. Adrian Kosowski, Time and Space-Efficient Algorithms for Mobile Agents in an Anonymous Network, 2013.
  2. J. Villadangos, Federico Fariña, Manuel Prieto, Alberto Córdoba, Efficient leader election in complete networks, PDP, 2005.
  3. David Thaler, Chinya V. Ravishankar, Using Name-Based Mappings to Increase Hit Rates, IEEE/ACM Trans. Netw. 6(1), 1998.

Fstral

Fork of Flink that adds support for integration with Planner.

Information

Language
  • Java
Lines
~3000
Status
stopped

Links


Description