Welcome to the e(BE:L) Documentation!
e(BE:L)
e(BE:L) is a Python package for validating and modeling information extracted from publications using the Biological Expression Language (BEL) <https://language.bel.bio/>.
This software package serves as a comprehensive tool for all of your BEL needs and creates enriched knowledge graphs
for developing and testing new theories and hypotheses.
e(BE:L) integrates several other knowledge bases to extend the BEL knowledge graph or map identifiers:
- BioGrid
- ChEBI
- ClinicalTrials.gov
- ClinVar
- DisGeNET
- DrugBank
- Ensembl
- Expression Atlas
- GWAS Catalog
- HGNC
- IntAct
- Guide to PHARMACOLOGY
- KEGG
- MirTarBase
- Different resources from NCBI
- OffSides
- Pathway Commons
- The Human Protein Atlas
- Reactome
- STRING
- UniProt
Installation
The easiest way to install ebel is to use docker-compose. See the instructions below for the Docker installation.
ebel can also be installed directly from PyPI with pip:
pip install ebel
However, we encourage you to use the latest development version, which can be installed with:
pip install git+https://github.com/e-bel/ebel
Package Requirements
Installing OrientDB
This software package is designed to work in conjunction with OrientDB, a NoSQL, multi-model database that acts as both a graph and relational database. e(BE:L) uses OrientDB for generating the knowledge graph derived from BEL files. To get started with e(BE:L), first download OrientDB and get a server up and running. The first time the server is started, you will need to create a root password. Once it is up and running, you can start importing BEL files into it!
On Linux, you can use the following commands:
wget https://repo1.maven.org/maven2/com/orientechnologies/orientdb-community/3.2.2/orientdb-community-3.2.2.tar.gz
tar -xvzf orientdb-community-3.2.2.tar.gz
cd orientdb-community-3.2.2/bin
./server.sh
SQL Databases
This package can enrich the compiled knowledge graphs with a large amount of external information; however, this requires a SQL database for storage. While a SQLite database can be used, this is not recommended, as the amount of data and the complexity of the queries will make it quite slow. Additionally, SQLite will not be directly supported: the methods will be built such that they should work with both SQLite and MySQL, but we will not address performance issues that arise from using SQLite.
Instead, we recommend setting up a MySQL server or MariaDB to use with e(BE:L). By default, PyMySQL is installed as a driver by e(BE:L), but others can also be used.
On Ubuntu Linux, you can use the following command:
sudo apt install mysql-server -y
or
sudo apt install mariadb-server -y
Configuration
Before you start working with e(BE:L), an easy-to-use wizard helps you set up all configurations. Make sure OrientDB and MySQL (or MariaDB) are running, then start the configuration wizard with:
ebel settings
The wizard will create the needed databases and users in OrientDB and MySQL/MariaDB.
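Because the wizard expects both servers to be running, it can help to confirm beforehand that they accept connections. The following is a minimal sketch, not part of e(BE:L): the helper name `is_listening` is hypothetical, port 2424 is the OrientDB port used in the import example in this document, and port 3306 is assumed as the usual MySQL/MariaDB default.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 2424: OrientDB port from the import example in this document.
    # 3306: assumed MySQL/MariaDB default; adjust if your server differs.
    for name, port in (("OrientDB", 2424), ("MySQL/MariaDB", 3306)):
        state = "reachable" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not verify credentials or that the expected database software is the one answering.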
Package Components
To test the different components, several BEL files and already compiled JSON files are provided.
BEL Validation
BEL is a domain-specific language designed to capture biological relationships in a computer- and human-readable format. The rules governing BEL statement generation can be quite complex and often mistakes are made during curation. e(BE:L) includes a grammar and syntax checker that reads through given BEL files and validates whether each statement satisfies the guidelines provided by BEL.bio. Should any BEL statement within the file not adhere to the rules, a report file is created by e(BE:L) explaining the error and offering suggested fixes.
You can use the following command to validate your BEL file
ebel validate /path/to/bel_file.bel
In a single command, you can validate your BEL file, generate an error report if there are errors, and produce an importable JSON file if there are none:
ebel validate /path/to/bel_file.bel -r error_report.xlsx -j
BEL documents should be properly formatted prior to validation. e(BE:L) contains a repair tool that checks the format, and it is highly recommended to use it before validation. The repaired file will overwrite the original if a new file path is not specified. Here is an example:
ebel repair /path/to/bel_file.bel -n /path/to/repaired_file.bel
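When many BEL files need the same treatment, the repair and validate commands above can be scripted. The sketch below only assembles the command lines shown in this section (`ebel repair ... -n ...` and `ebel validate ... -r ... -j`); the function names, the output directory layout, and the `_errors.xlsx` report naming are illustrative assumptions, not e(BE:L) conventions.

```python
from pathlib import Path
import subprocess


def build_commands(bel_dir: Path, out_dir: Path) -> list[list[str]]:
    """Return repair and validate commands for every .bel file in bel_dir."""
    commands = []
    for bel_file in sorted(bel_dir.glob("*.bel")):
        repaired = out_dir / bel_file.name
        report = out_dir / (bel_file.stem + "_errors.xlsx")
        # Repair first, writing to a new path so the original is not overwritten.
        commands.append(["ebel", "repair", str(bel_file), "-n", str(repaired)])
        # Then validate the repaired copy, requesting an error report and JSON output.
        commands.append(["ebel", "validate", str(repaired), "-r", str(report), "-j"])
    return commands


def run_all(bel_dir: Path, out_dir: Path) -> None:
    """Run the commands in order; a non-zero exit status raises CalledProcessError."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for command in build_commands(bel_dir, out_dir):
        subprocess.run(command, check=True)
```

Calling `build_commands` on its own lets you inspect the command lines before anything is executed; `run_all` requires the `ebel` command to be installed and on the PATH.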
Import Process
BEL Modeling - OrientDB
BEL files that have passed the validation process can be imported into the
database individually or en masse. During the import process, e(BE:L) automatically creates all the relevant nodes and edges
as described in the BEL files. e(BE:L) also automatically adds missing nodes and edges that are known to exist;
for example, protein nodes with a respective RNA or gene node will have these automatically added to the graph with the appropriate translatedTo and transcribedTo edges.
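The extension step described above can be pictured with a toy example. This is an illustration of the idea only, not e(BE:L)'s implementation or its OrientDB schema: edges are plain tuples, nodes are BEL short-form strings, the function names are hypothetical, and only unmodified proteins of the form p(NAMESPACE:NAME) are handled.

```python
Edge = tuple[str, str, str]  # (subject, relation, object)


def central_dogma_edges(name: str) -> list[Edge]:
    """Return the gene -> RNA -> protein edges implied by a protein called `name`."""
    gene, rna, protein = f"g({name})", f"r({name})", f"p({name})"
    return [(gene, "transcribedTo", rna), (rna, "translatedTo", protein)]


def extend_graph(edges: set[Edge]) -> set[Edge]:
    """Add the implied gene and RNA nodes for every plain protein node in `edges`."""
    extended = set(edges)
    for subject, _, obj in edges:
        for node in (subject, obj):
            inner = node[2:-1]
            # Toy restriction: skip modified proteins such as p(X, pmod(Ph)).
            if node.startswith("p(") and node.endswith(")") and "," not in inner:
                extended.update(central_dogma_edges(inner))
    return extended


if __name__ == "__main__":
    graph = {("p(HGNC:APP)", "increases", "p(HGNC:MAPT)")}
    for edge in sorted(extend_graph(graph)):
        print(edge)
```

Using a set means that a protein appearing in several statements contributes its gene and RNA edges only once.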
Model Enrichment - MySQL
e(BE:L) goes one step further when compiling your BEL statements into a knowledge graph by supplementing your new graph model with information derived from several
publicly available repositories. Data is automatically downloaded from several useful sites, including UniProt, Ensembl, and IntAct, and added as generic tables in your newly built database.
Information from these popular repositories is then linked to the nodes and edges residing in your graph model, allowing for more complex and
useful queries to be made against your data. This data is automatically downloaded, parsed, and imported into a specified SQL database.
Importing - Getting Started
e(BE:L) supports OrientDB as the graph database and MySQL or MariaDB as the RDBMS.
Make sure you have the following downloaded, installed, and running:
- Graph database
  - OrientDB
- Relational database
  - MySQL
  - MariaDB
This can be configured as a service in both Windows and Unix systems.
Set your MySQL connection parameters in e(BE:L)
ebel set-mysql --host localhost --user root --password myPassWord --database ebel
Once you have made sure both OrientDB and MySQL are running, you can now import an e(BE:L) compiled JSON file
ebel import-json /path/to/checked_bel.json -u root -p orientdbPassword -d ebel -h localhost -p 2424
After you have successfully connected to the OrientDB database at least once, the login credentials are written to the config file and no longer need to be passed (the same applies to the enrich command):
ebel import-json /path/to/checked_bel.json
You can also import all e(BE:L) compiled JSON files in a passed directory
ebel import-json /path/to/bel_json/dir/
If you do not wish to enrich the graph, or wish to disable the protein/RNA/gene extension step, you can toggle these with the following options:
ebel import-json /path/to/checked_bel.json -e -g
You can run an enrichment step later using the enrich command:
ebel enrich
This command can also be given a list of resources to either skip or include during enrichment
ebel enrich -i uniprot,hgnc
or
ebel enrich -s intact,kegg
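If enrichment is triggered from a script, the include and skip lists can be assembled programmatically. The sketch below only builds the `ebel enrich` command lines shown above; `build_enrich_command` is a hypothetical helper, and refusing to combine `-i` and `-s` is a cautious assumption, since this document shows each option only on its own.

```python
from typing import Optional, Sequence


def build_enrich_command(
    include: Optional[Sequence[str]] = None,
    skip: Optional[Sequence[str]] = None,
) -> list[str]:
    """Build an `ebel enrich` command with an optional include or skip list."""
    if include and skip:
        # Assumption: the options are treated as alternatives, as in the examples above.
        raise ValueError("pass either include or skip, not both")
    command = ["ebel", "enrich"]
    if include:
        command += ["-i", ",".join(include)]
    if skip:
        command += ["-s", ",".join(skip)]
    return command


if __name__ == "__main__":
    print(" ".join(build_enrich_command(include=["uniprot", "hgnc"])))
    print(" ".join(build_enrich_command(skip=["intact", "kegg"])))
```

The returned list can be passed directly to `subprocess.run(command, check=True)` once the `ebel` command is installed and configured.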