Difference between revisions of "Yaminabe2"

From eLinux.org

Revision as of 17:51, 3 April 2016

Introduction

Installation

TLSH install

yaminabe2 execution

Generating the databases for Yaminabe2

Two databases are used for Yaminabe2:

  • A database with metadata about packages: versions, checksums, download locations, origin, and so on. It is created with the database creation script of the Binary Analysis Tool (http://www.binaryanalysis.org/).
  • A database with exploded Git information, created with the gittlsh.py program. Before running this script, make sure the settings in sourceverify.config are correct, especially the locations of the databases, the Git URLs and repository locations, and the priorities/importance values, which can differ per user.

The gittlsh.py script is then invoked as follows:

$ python gittlsh.py -c /path/to/configuration/file

To update the database, make sure that the Git repositories are up to date (git pull) and rerun the same command.
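
The update step above can be sketched in Python as follows. This is only an illustration, not part of the Yaminabe2 scripts: the function names and the example paths are hypothetical, and it assumes that git and python are on the PATH and that gittlsh.py is in the current directory.

```python
import subprocess


def build_update_commands(repo_dirs, config_path):
    """Return the commands for one update cycle, in the order they should run.

    repo_dirs and config_path are placeholders supplied by the caller;
    nothing here is read from sourceverify.config.
    """
    # One "git pull" per local clone; "git -C <dir>" runs git inside <dir>.
    commands = [["git", "-C", repo, "pull"] for repo in repo_dirs]
    # Then the same gittlsh.py invocation that was used for the first run.
    commands.append(["python", "gittlsh.py", "-c", config_path])
    return commands


def run_update(repo_dirs, config_path):
    """Run every command and stop at the first one that fails."""
    for argv in build_update_commands(repo_dirs, config_path):
        subprocess.run(argv, check=True)
```

Calling run_update(["/path/to/linux"], "/path/to/configuration/file") would pull the one repository and then rerun gittlsh.py. Building the command list separately makes it possible to inspect what will be executed before anything is run.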

For the Linux kernel the first run can take a long time (5 or 6 hours). Because the script is very I/O intensive, storing the Git repositories on a ramdisk is strongly recommended.
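
Before starting a run of several hours it can be worth checking that the repositories really are on a ramdisk. The sketch below is one way to do that on Linux, assuming the ramdisk is a tmpfs or ramfs mount. The function names and the /mnt/gitram path in the usage note are hypothetical, and mount points containing spaces (escaped in /proc/mounts) are not handled.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains path.

    mounts_text is the content of /proc/mounts, where each line reads
    "device mount_point fs_type options dump pass".
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a parent of path;
        # on a tie the later line wins, matching over-mounts.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_ramdisk(path):
    """True if path is on a tmpfs or ramfs mount (Linux only)."""
    with open("/proc/mounts") as handle:
        return filesystem_type(path, handle.read()) in ("tmpfs", "ramfs")
```

For example, is_on_ramdisk("/mnt/gitram/linux") would be expected to return True once a tmpfs has been mounted at /mnt/gitram and the repository cloned into it.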

Running the TLSH compare scripts

There are two scripts that can compute TLSH checksums:

  1. gittreecompare.py
  2. sourceverifier.py
  • The first script compares tags from two Git repositories and computes the TLSH score. The second script compares a directory of source code to all data in all branches of many Git trees (a Git "forest").
  • The tag file used for gittreecompare.py consists of several rows of tab-separated data. The first row has the Git URLs of the two Git repositories; each subsequent row has Git tags from those repositories. The script checks how far the Git tag in the first column is removed from the tag in the second column. The comparison is not symmetric, so it can be useful to also look at the reverse direction, for example if the second repository contains many files that are not in the first. The "results" directory stores the results of a few test runs, which were done in both directions and yield different scores.

  • Running the gittreecompare.py script is simple:
$ python gittreecompare.py -c /path/to/configuration/file -t /path/to/tag/file
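
The tag file layout described above, and the column swap needed to test the reverse direction, can be sketched as follows. This is an illustration only: it assumes every row has exactly two tab-separated columns, the function names are hypothetical, and it is not how gittreecompare.py itself reads the file.

```python
def parse_tag_file(text):
    """Split a tag file into (urls, tag_pairs).

    Assumed layout: the first row holds the two Git URLs, and every
    later row holds one tag from each repository, in the same order.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows or any(len(row) != 2 for row in rows):
        raise ValueError("expected two tab-separated columns on every row")
    urls = tuple(rows[0])
    tag_pairs = [tuple(row) for row in rows[1:]]
    return urls, tag_pairs


def reversed_tag_file(text):
    """Return the same tag file with both columns swapped on every row."""
    urls, tag_pairs = parse_tag_file(text)
    lines = ["\t".join(reversed(urls))]
    lines += ["\t".join(reversed(pair)) for pair in tag_pairs]
    return "\n".join(lines) + "\n"
```

Writing the output of reversed_tag_file to a second file and passing it with -t gives the comparison in the opposite direction, which matters because the score is not symmetric.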

Resources

independent scripts

- File:Gittlsh.py script to explode Git repositories and store metadata like SHA256 and TLSH checksums out of band
- File:Gittreecompare.py script to compare two tags in Git repositories and compute a TLSH score
- File:Sourceverifier.py script for both the Yaminabe and Yaminabe2 projects
- File:Sourceverify.config configuration file used for the Python scripts

archives

- File:Yaminabe2-0.2.tar.gz

prebuilt databases

- File:Kerneldb.sqlite3.xz
- File:Kernelgit.sqlite3.xz