Data source: CROSBI

A preliminary study on similarity-preserving digital book identifiers (CROSBI ID 626013)

Conference paper in proceedings | original scientific paper | international peer review

Vladimir, Klemo ; Šilić, Marin ; Romić, Nenad ; Delač, Goran ; Srbljić, Siniša. A preliminary study on similarity-preserving digital book identifiers // Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. 2015.

Responsibility details

Vladimir, Klemo ; Šilić, Marin ; Romić, Nenad ; Delač, Goran ; Srbljić, Siniša

English

A preliminary study on similarity-preserving digital book identifiers

Due to the proliferation of digital publishing, e-book catalogs are abundant but noisy and unstructured. Tools for the digital librarian rely on ISBN, metadata embedded in digital files (without an accepted standard), and cryptographic hash functions to identify coderivative or near-duplicate content. However, the unreliability of metadata and the sensitivity of hashing to even the smallest changes prevent efficient detection of coderivative or similar digital books. The study focuses on books with many versions that differ in a certain number of OCR errors and contain a number of sentence-length variations. Similar books are identified using small fingerprints that can be easily shared and compared. We created synthetic datasets to evaluate fingerprinting accuracy and report standard precision and recall measurements.

locality-sensitive hashing; simhash; digital book; clustering
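
For readers unfamiliar with the simhash keyword, the following is a minimal illustrative sketch of a generic simhash fingerprint in Python. It is not the authors' implementation: the function names, the 3-word shingle size, the 64-bit width, and the use of MD5 as the per-shingle hash are all assumptions chosen for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of the text (k is an assumed parameter)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def simhash(text, bits=64, k=3):
    """Compute a generic simhash fingerprint (illustrative, not the paper's method)."""
    counts = [0] * bits
    for sh in shingles(text, k):
        # Hash each shingle to a `bits`-wide integer (MD5 is an arbitrary choice).
        digest = hashlib.md5(sh.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each shingle votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bits with a positive vote total are set in the fingerprint.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Under these assumptions, two texts would be compared with `hamming(simhash(a), simhash(b))`; a small distance suggests similar content, whereas a cryptographic hash would change completely after a single-character edit.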

Contribution details

2015

published

Host publication details

Proceedings of the 9th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Conference details

ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

lecture

30 July 2015

Beijing, China

Related fields

Computing