Datasets: 31
Downloaded: 401

Recent datasets

  • Image data
  • Laurent Besacier
  • SPEECH-COCO is an augmentation of the MS-COCO dataset in which speech is added to images and text. Speech captions were generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (>600 h) paired with images. Disfluencies and speed perturbation were added to the signal so that it sounds more natural. Each speech signal (WAV) is paired with a JSON file containing exact timecodes for each word, syllable, and phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks that take speech instead of text as input or output.
  • Open
  • Web data
  • A sample of the INA RDF data
  • Manuel Atencia
  • This is a sample of the RDF data owned by INA (Institut national de l'audiovisuel), a repository of all French radio and television audiovisual archives, made publicly available for scientific purposes. The whole INA RDF dataset (around 6 million RDF facts) was used in experiments evaluating a novel import-by-query algorithm for data interlinking (see the related publication). These experiments made it possible to discover person homonyms in the INA dataset (see the related dataset "A sample of owl:sameAs links within the INA RDF dataset").
  • Open
  • Web data
  • A sample of owl:sameAs links within the INA RDF dataset
  • Manuel Atencia
  • This is a sample of the owl:sameAs links discovered by the import-by-query algorithm (see the related publication) within the INA RDF dataset. The sample concerns person homonyms. The algorithm used DBpedia as an external source, together with a set of 35 rules encoding semantic constraints associated with the RDF datasets, domain knowledge, vocabulary mappings, and owl:sameAs transitivity. In total, 4,884 owl:sameAs links and 9,764 owl:differentFrom links were discovered. A sample of the corresponding INA RDF data can be found in the related dataset "A sample of the INA RDF data", also available on the Perscido platform.
  • Restricted
  • Video data
  • MobileRGBD
  • Dominique Vaufreydaz
  • MobileRGBD is a corpus dedicated to benchmarking low-level RGB-D algorithms on mobile platforms. We reversed the usual corpus recording paradigm: our goal is to facilitate ground-truth annotation and the reproducibility of recordings across variations in speed, trajectory, and environment. To get rid of unpredictable human movement, we used dummies to play static users in the environment. The advantage of dummies is that they do not move between two recordings, so the same robot motion can be recorded repeatedly at different speeds to evaluate the performance of detection algorithms. This benchmark corpus is intended for the "low-level" RGB-D algorithm family, such as 3D-SLAM, body/skeleton tracking, or face tracking using a mobile robot.
  • Open
  • Trace data
  • LTTng Execution Traces of 10 Phoronix Benchmarks
  • Vania Marangozova-Martin
  • This dataset contains the execution traces of 10 Phoronix benchmarks (compress-gzip, ffmpeg, iozone, network-loopback, phpbench, pybench, ramspeed, scimark2, stream, and unpack-linux). The traces cover three tracing configurations: kernel, memory, and performance counters. They were obtained on a standard Linux machine and on the Juno platform. Each configuration was run 32 times on the Linux machine and once on the Juno board.
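The owl:sameAs transitivity mentioned in the INA entries can be illustrated with a small sketch. It is not the import-by-query algorithm or its 35 rules; it only shows, assuming links are given as pairs of URI strings, how a union-find structure derives the equivalence classes implied by owl:sameAs (reflexive, symmetric, transitive) and flags owl:differentFrom links that contradict them. The `ex:` identifiers and function names are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen resources lazily as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def same_as_classes(same_as):
    """Group resources into the equivalence classes implied by owl:sameAs."""
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    groups = defaultdict(set)
    for node in list(uf.parent):
        groups[uf.find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


def conflicts(same_as, different_from):
    """Return owl:differentFrom pairs contradicted by the sameAs closure."""
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    return [(a, b) for a, b in different_from if uf.find(a) == uf.find(b)]


if __name__ == "__main__":
    # Hypothetical links: a=b and b=c imply a=c by transitivity.
    links = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
    diff = [("ex:a", "ex:c"), ("ex:a", "ex:d")]
    print(same_as_classes(links))
    print(conflicts(links, diff))
```

Union-find keeps the closure near-linear in the number of links, which matters when the input is on the scale of millions of RDF facts; the inferred pair (ex:a, ex:c) is never stated directly but follows from transitivity.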