Loading Data in Infobright on Amazon EC2 using DLP
Many of our (current and new) community members and customers ask about using Infobright on Amazon’s EC2 cloud environment. A common question we receive is about the ability to use the cluster to distribute the load process into our Enterprise Edition. For those using the trial version of Infobright Enterprise or if you are an IEE customer, here are some steps to set up and use the Distributed Load Processor (DLP) on EC2. (If you don't have a license for DLP you can download an eval version). I assume you already have an account on EC2, and I assume you’re comfortable with setting up your security groups. The steps also assume you have two distinct instances set up on the cloud.
If you’d like to discuss Infobright and EC2, I invite you to post your questions on this Infobright Forum thread: http://bit.ly/ec2_infobright_forum
The Steps:
Definitions:
DLP EC2 Instance – The Amazon Server which will be hosting your distributed load processor binaries.
DB EC2 Instance – The Amazon Server which will be hosting your Infobright Enterprise Database.
Set up Permissions
- ON EC2 SECURITY GROUP: If you plan to use an FTP server to load binaries onto EC2, add your FTP Server to the Security Group on Ports 20-21
- Ex: upload.sourceforge.net : 216.34.181.56/32 – Ports 20-21
- ON EC2 SECURITY GROUP: Ensure Inbound permissions on your Database EC2 instance for all DLP EC2 instances
- Ex: “Custom TCP Rule”, Port: 5029, Source: INTERNAL_IP_ADDRESS/32
- ON EC2 SECURITY GROUP: Add inbound permissions for SSH into the instance from your laptop/home network
- Ex: “SSH” – YOUR_IP_ADDRESS/32
- ON EACH DLP EC2 INSTANCE: Use ssh-keygen –t rsa to generate private/public keys. Add the DLP EC2 Instance public key to the authorized_keys on the Database EC2 instance’s ~/.ssh/authorized_key file
- On DLP Instance: Open ~/.ssh/id_rsa.pub with a text editor
- On DB Instance: Open ~/.ssh/authorized_keys with a text editor
- Copy and paste the DLP Instance key string into the DB Instance authorized_key file on a new line
Install the Database and DLP:
- Install Infobright DB, DLP, and keys/licenses if necessary on the appropriate servers.
- DB Server: license file goes into /usr/local/Infobright
- DLP Server: license file is listed in the command line
- Create the database and tables on the database
- CREATE DATABASE foo;
- CREATE TABLE bar (cola int, colb int)
- For the Infobright DB, add access permissions for the appropriate user from the IP address of the DLP EC2 Instance
- CREATE USER ‘BLAH’@’IP ADDRESS’ ;
- GRANT SELECT, INSERT, FILE ON *.* TO ‘BLAH’@’IP ADDRESS’;
- Create dummy data on DLP Server.
- > echo “1,2” >> ~/2col_data.csv
Load the data using DLP
- Ex: > alias dlp-mydatabase=’ dataprocessor -E /opt/infobright/tools/distributed-load-processor/iblicense-dlp-eval.dat -v -H THE_IP_ADDRESS -D foo -T bar -f txt_variable --fields-terminated-by "," -L root -X -l /tmp/dlp.log -I file –I’
By doing so, you can load data with either of the following commands:
> dataprocessor -E /opt/infobright/tools/distributed-load-processor/iblicense-dlp-eval.dat -v -H THE_IP_ADDRESS -D foo -T bar -f txt_variable --fields-terminated-by ',' -L root -X -l /tmp/dlp.log -I file -i ~/2col_data.csv
OR
> dlp-mydatabase ~/2col_data.csv
- –E tells you where the license key is
- –v tells you to be verbose in your output. As you get use to DLP, I recommend you keep this option ON.
- –H is the host ip
- –D is the database
- –T is the able
- –f is the format type
- –fields-terminated-by you should know
- –L is the user
- –X is execute immediately (as opposed to storing to a file, manually transferring, and manually loading)
- –l is the log file location
- –i is the input file
Test your data on the Database Instance
Very good article! This is a highly beneficial blogging site which you have.
emaar mgf palm gardens gurgaon | bestech latest project
Hi MKTMJN,
DLP is now available for both Windows and Linux. In this scenario, I didn’t include Windows, but it should work. You may need to modify the Windows-specific firewall/etc.
Cheers,
Jeff
Hi Jeff,
Have you been able to get this scenario to work on Windows as well as Linux?
Post Comment