Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New ec2setup.txt suggestion and hints... #60

Open
pjm4github opened this issue Sep 13, 2015 · 1 comment
Open

New ec2setup.txt suggestion and hints... #60

pjm4github opened this issue Sep 13, 2015 · 1 comment

Comments

@pjm4github
Copy link

This is my version of ec2setup.txt that I modified to work on my own home grown Ubuntu 12.04 LTS instance.

Start with AMI # ami-3fec7956 (Ubuntu 12.04), 32GB
(ec2-run-instances ami-3fec7956 -t m1.large --region us-east-1 -z us-east-1d --block-device-mapping /dev/sda1=:32:false -k )

sudo apt-add-repository -y ppa:olivier-berten/geo
sudo add-apt-repository -y ppa:webupd8team/java
sudo aptitude update
sudo aptitude safe-upgrade -y
sudo aptitude full-upgrade -y
sudo aptitude install -y build-essential apache2 apache2.2-common apache2-mpm-prefork apache2-utils libexpat1 ssl-cert postgresql libpq-dev ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8 libruby1.8 libopenssl-ruby sqlite3 libsqlite3-ruby1.8 git-core libcurl4-openssl-dev apache2-prefork-dev libapr1-dev libaprutil1-dev subversion postgresql-9.1-postgis autoconf libtool libxml2-dev libbz2-1.0 libbz2-dev libgeos-dev proj-bin libproj-dev ocropus pdftohtml catdoc unzip ant openjdk-6-jdk lftp php5-cli rubygems flex postgresql-server-dev-9.1 proj libjson0-dev xsltproc docbook-xsl docbook-mathml gettext postgresql-contrib-9.1 pgadmin3 python-software-properties bison dos2unix
sudo aptitude install -y oracle-java7-installer
sudo aptitude install -y libgdal-dev
sudo aptitude install -y libgeos++-dev
sudo bash -c 'echo "/usr/lib/jvm/java-7-oracle/jre/lib/amd64/server" > /etc/ld.so.conf.d/jvm.conf'
sudo ldconfig

Note that here you should create a new user called ubuntu (I used my own user and had to modify various scripts and config files which is described below)

mkdir ~/sources
cd ~/sources
wget http://download.osgeo.org/postgis/source/postgis-2.0.3.tar.gz
tar xfvz postgis-2.0.3.tar.gz
cd postgis-2.0.3
./configure --with-gui

./configure --with-gui --without-topology

If the GEO version is incorrect then perform the following steps:

wget http://download.osgeo.org/geos/geos-3.3.8.tar.bz2
tar xjf geos-3.3.8.tar.bz2
cd geos-3.3.8
./configure
make
sudo make install
cd ~/sources/postgis-2.0.3
./configure --with-gui

Note that the above steps didnt work. It appears that there should be a way to setup the load libraries correctly but I gave up.

otherwise continue here:

make
sudo make install
sudo ldconfig
sudo make comments-install

sudo sed -i "s/ident/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo sed -i "s/md5/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo sed -i "s/peer/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo /etc/init.d/postgresql restart
createdb -U postgres geodict

sudo -u postgres createdb template_postgis
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/spatial_ref_sys.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/rtpostgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/raster_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/topology.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/topology_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/legacy.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/legacy_gist.sql

cd ~/sources
git clone git://github.com/petewarden/dstk.git
git clone git://github.com/petewarden/dstkdata.git
cd dstk
sudo gem install bundler
sudo bundle install

cd ~/sources/dstkdata

If you want to save disk space and don't need geo-statistics, you can skip everything

up until the comment indicating the end of the geostats loading.

I SKIPPED TO %%%%%%%%% BELOW

createdb -U postgres -T template_postgis statistics

tar xzf statistics/gl_gpwfe_pdens_15_bil_25.tar.gz
export PATH=$PATH:/usr/lib/postgresql/9.1/bin/
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I gl_gpwfe_pdens_15_bil_25/glds15ag.bil public.population_density | psql -U postgres -d statistics
rm -rf gl_gpwfe_pdens_15_bil_25
unzip statistics/glc2000_v1_1_Tiff.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I Tiff/glc2000_v1_1.tif public.land_cover | psql -U postgres -d statistics
rm -rf Tiff

sudo mkdir /mnt/data
sudo chown pjm /mnt/data
cd /mnt/data

The zip files are here: http://gis-lab.info/data/srtm-tif/, or here http://srtm.csi.cgiar.org/ or here https://hc.app.box.com/shared/1yidaheouv password = ThanksCSI!

sudo curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/SRTM_NE_250m.tif.zip"

unzip SRTM_NE_250m.tif.zip

I got the TIF files from here instead!

sudo curl -O "https://hc.box.net/shared/1yidaheouv/SRTM_SE_250m_TIF.rar"
unrar SRTM_NE_250m_TIF.rar
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 SRTM_NE_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_NE_250m*
curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/SRTM_W_250m.tif.zip"
unzip SRTM_W_250m.tif.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -a SRTM_W_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf unzip SRTM_W_250m*
curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/SRTM_SE_250m.tif.zip"
unzip SRTM_SE_250m.tif.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -a -I SRTM_SE_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_SE_250m*

curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/tmean_30s_bil.zip"
unzip tmean_30s_bil.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_1.bil public.mean_temperature_01 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_2.bil public.mean_temperature_02 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_3.bil public.mean_temperature_03 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_4.bil public.mean_temperature_04 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_5.bil public.mean_temperature_05 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_6.bil public.mean_temperature_06 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_7.bil public.mean_temperature_07 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_8.bil public.mean_temperature_08 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_9.bil public.mean_temperature_09 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_10.bil public.mean_temperature_10 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_11.bil public.mean_temperature_11 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_12.bil public.mean_temperature_12 | psql -U postgres -d statistics
rm -rf tmean_*

curl -O "http://static.datasciencetoolkit.org.s3-website-us-east-1.amazonaws.com/prec_30s_bil.zip"
unzip prec_30s_bil.zip
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_1.bil public.precipitation_01 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_2.bil public.precipitation_02 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_3.bil public.precipitation_03 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_4.bil public.precipitation_04 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_5.bil public.precipitation_05 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_6.bil public.precipitation_06 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_7.bil public.precipitation_07 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_8.bil public.precipitation_08 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_9.bil public.precipitation_09 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_10.bil public.precipitation_10 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_11.bil public.precipitation_11 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_12.bil public.precipitation_12 | psql -U postgres -d statistics
rm -rf prec_*

unzip /home/pjm/sources/dstkdata/statistics/us_statistics_rasters.zip -d .
for f in .tif; do raster2pgsql -s 4236 -t 32x32 -I $f basename $f .tif | psql -U postgres -d statistics; done
rm -rf us

rm -rf metadata

This is the end of the geostats loading, continue from here if you decide to skip that part.

%%%%%%%% START HERE AGAIN

sudo gem install passenger
sudo passenger-install-apache2-module

You'll need to update the version number below to match whichever actual passenger version was installed

This is what the build said:

LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-5.0.18/buildout/apache2/mod_passenger.so

PassengerRoot /var/lib/gems/1.8/gems/passenger-5.0.18

PassengerDefaultRuby /usr/bin/ruby1.8

I changed the passenger version in the lines below to match what was found from the lines above:

sudo bash -c 'echo "LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-5.0.18/buildout/apache2/mod_passenger.so" > /etc/apache2/mods-enabled/passenger.load'
sudo bash -c 'echo "PassengerRoot /var/lib/gems/1.8/gems/passenger-5.0.18" > /etc/apache2/mods-enabled/passenger.conf'
sudo bash -c 'echo "PassengerRuby /usr/bin/ruby1.8" >> /etc/apache2/mods-enabled/passenger.conf'
sudo bash -c 'echo "PassengerMaxPoolSize 3" >> /etc/apache2/mods-enabled/passenger.conf'
sudo sed -i "s/MaxRequestsPerChild[ \t][ \t][0-9][0-9]/MaxRequestsPerChild 20/" /etc/apache2/apache2.conf

I needed to change the DocumentRoot to match the actual location where the data was installed. In my case the sources directory was /home/pjm/sources instead of /home/ubuntu/sources.

Ideally there should have been a new user called ubuntu but I didnt know about this until I was too far into the process.

sudo bash -c 'echo "
<VirtualHost *:8000>
ServerName 127.0.1.1
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.org$ [NC]
RewriteRule ^(.)$ http://www.datasciencetoolkit.org$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.com$ [NC]
RewriteRule ^(.
)$ http://www.datasciencetoolkit.com$1 [R=301,L]
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews


" > /etc/apache2/sites-enabled/000-default'
sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load

sudo /etc/init.d/apache2 restart

sudo gem install postgres -v '0.7.9.2008.01.28'

cd ~/sources/dstk
./populate_database.rb

cd ~/sources
mkdir maxmind
cd maxmind
wget "http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz"
gunzip GeoLiteCity.dat.gz
wget "http://geolite.maxmind.com/download/geoip/api/c/GeoIP.tar.gz"
tar xzvf GeoIP.tar.gz
cd GeoIP-1.4.8/
libtoolize -f
./configure
make
sudo make install
cd ..
svn checkout svn://rubyforge.org/var/svn/net-geoip/trunk net-geoip
cd net-geoip/
ruby ext/extconf.rb
make
sudo make install

cd ~/sources
wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.11.tar.gz
tar -xvzf libiconv-1.11.tar.gz
cd libiconv-1.11
./configure --prefix=/usr/local/libiconv
make
sudo make install
sudo ln -s /usr/local/libiconv/lib/libiconv.so.2 /usr/lib/libiconv.so.2

createdb -U postgres -T template_postgis reversegeo

cd ~/sources
git clone git://github.com/petewarden/osm2pgsql
cd osm2pgsql/
./autogen.sh
sed -i 's/version = BZ2_bzlibVersion();//' configure
sed -i 's/version = zlibVersion();//' configure
./configure
make
sudo make install
cd ..

osm2pgsql -U postgres -d reversegeo -p world_countries -S osm2pgsql/styles/world_countries.style dstkdata/world_countries.osm -l
osm2pgsql -U postgres -d reversegeo -p admin_areas -S osm2pgsql/styles/admin_areas.style dstkdata/admin_areas.osm -l
osm2pgsql -U postgres -d reversegeo -p neighborhoods -S osm2pgsql/styles/neighborhoods.style dstkdata/neighborhoods.osm -l

The above commands take several hours to complete

I started the next set of commands in a new window...

cd ~/sources
git clone git://github.com/petewarden/boilerpipe
cd boilerpipe/boilerpipe-core/
ant
cd src
javac -cp ../dist/boilerpipe-1.1-dev.jar boilerpipe.java

cd ~/sources/dstk/
psql -U postgres -d reversegeo -f sql/loadukpostcodes.sql

osm2pgsql -U postgres -d reversegeo -p uk_osm -S ../osm2pgsql/default.style ../dstkdata/uk_osm.osm.bz2 -l

psql -U postgres -d reversegeo -f sql/buildukindexes.sql

cd ~/sources
git clone git://github.com/geocommons/geocoder.git
cd geocoder
make
sudo make install

Build the latest Tiger/Line data for US address lookups

cd /mnt/data
mkdir tigerdata
cd tigerdata
lftp ftp2.census.gov:/geo/tiger/TIGER2012/EDGES
mirror --parallel=5 .
cd ../FEATNAMES
mirror --parallel=5 .
cd ../ADDR
mirror --parallel=5 .
exit
cd ~/sources/geocoder/build/
mkdir ../../geocoderdata/
./tiger_import ../../geocoderdata/geocoder2012.db /mnt/data/tigerdata/

Completed to here

cd ~/sources
git clone git://github.com/luislavena/sqlite3-ruby.git
cd sqlite3-ruby
ruby setup.rb config
ruby setup.rb setup
sudo ruby setup.rb install

cd ~/sources/geocoder
bin/rebuild_metaphones ../geocoderdata/geocoder2012.db
chmod +x build/build_indexes
build/build_indexes ../geocoderdata/geocoder2012.db
rm -rf /mnt/data/tigerdata

createdb -U postgres names
cd /mnt/data
curl -O "http://www.ssa.gov/oact/babynames/names.zip"
dos2unix yob*.txt
~/sources/dstk/dataconversion/analyzebabynames.rb . > babynames.csv
psql -U postgres -d names -f ~/sources/dstk/sql/loadnames.sql

Fix for postgres crashes,

sudo sed -i "s/shared_buffers = [0-9A-Za-z]*/shared_buffers = 512MB/" /etc/postgresql/9.1/main/postgresql.conf
sudo sysctl -w kernel.shmmax=576798720
sudo bash -c 'echo "kernel.shmmax=576798720" >> /etc/sysctl.conf'
sudo bash -c 'echo "vm.overcommit_memory=2" >> /etc/sysctl.conf'
sudo sed -i "s/max_connections = 100/max_connections = 200/" /etc/postgresql/9.1/main/postgresql.conf
sudo /etc/init.d/postgresql restart

Remove files not needed at runtime

rm -rf /mnt/data/*
rm -rf ~/sources/libiconv-1.11.tar.gz
rm -rf ~/sources/postgis-2.0.3.tar.gz
cd ~/sources/
mkdir dstkdata_runtime
mv dstkdata/ethnicityofsurnames.csv dstkdata_runtime/
mv dstkdata/GeoLiteCity.dat dstkdata_runtime/
rm -rf dstkdata
mv dstkdata_runtime dstkdata

Up to this point, you'll have a 0.50 version of the toolkit.

The following will upgrade you to a 0.51 version

cd ~/sources/dstk
git pull origin master

I found that the toolkit wass already uptodate

TwoFishes geocoder

cd ~/sources
mkdir twofishes
cd twofishes
mkdir bin
curl "http://www.twofishes.net/binaries/latest.jar" > bin/twofishes.jar
mkdir data

The source link above is obsolete

curl "http://www.twofishes.net/indexes/revgeo/latest.zip" > data/twofishesdata.zip

This one might work... its unknown what was in latest.zip versus 2015-03-05.zip

curl "http://www.twofishes.net/indexes/revgeo/2015-03-05.zip" > data/twofishesdata.zip

The ~/sources/dstk/twofishd.sh must be edited to point to the new directory.

change

java -Xmx1500M -jar /home/ubuntu/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/ubuntu/sources/twofishes/data/latest/

to this

java -Xmx1500M -jar /home/pjm/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/pjm/sources/twofishes/data/2015-03-05-20-05-30.753698/

The entire ~/sources/dstk/ directory should be check to see if there is any reference to /home/ubuntu and renamed to point to /home/pjm instead

I looked through the dstk and found several instances like this:

cd ~/sources/dstk

grep '/home/ubuntu' *

geodict_daemon.rb:Daemons.run('/home/ubuntu/sources/dstk/dstk_server.rb', {

twofishes.conf:exec start-stop-daemon --start -c root --exec /home/ubuntu/sources/dstk/twofishesd.sh

twofishesd.sh:java -Xmx1500M -jar /home/ubuntu/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/ubuntu/sources/twofishes/data/latest/

cd data
unzip twofishesdata.zip

sudo cp ~/sources/dstk/twofishes.conf /etc/init/twofishes.conf
sudo service twofishes start

Here is what the VirtualHost field looks like already

sudo bash -c 'echo "

<VirtualHost *:8000>

ServerName 127.0.1.1

DocumentRoot /home/pjm/sources/dstk/public

RewriteEngine On

RewriteCond %{HTTP_HOST} ^datasciencetoolkit.org$ [NC]

RewriteRule ^(.*)$ http://www.datasciencetoolkit.org$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^datasciencetoolkit.com$ [NC]

RewriteRule ^(.*)$ http://www.datasciencetoolkit.com$1 [R=301,L]

<Directory /home/pjm/sources/dstk/public>

AllowOverride all

Options -MultiViews

" > /etc/apache2/sites-enabled/000-default'

This will be changed to this now:

sudo bash -c 'echo "
<VirtualHost :8000>
ServerName 127.0.1.1
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.org$ [NC]
RewriteRule ^(.)$ http://www.datasciencetoolkit.org$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^datasciencetoolkit.com$ [NC]
RewriteRule ^(.
)$ http://www.datasciencetoolkit.com$1 [R=301,L]
# We have an internal TwoFishes server running on port 8081, so redirect
# requests that look like they belong to its API
ProxyPass /twofishes http://localhost:8081
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
Header set Access-Control-Allow-Origin "
"
Header set Cache-Control "max-age=86400"


" > /etc/apache2/sites-enabled/000-default'
sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
sudo ln -s /etc/apache2/mods-available/proxy.load /etc/apache2/mods-enabled/proxy.load
sudo ln -s /etc/apache2/mods-available/proxy_http.load /etc/apache2/mods-enabled/proxy_http.load
sudo ln -s /etc/apache2/mods-available/headers.load /etc/apache2/mods-enabled/headers.load

sudo /etc/init.d/apache2 restart

I now go to http://192.168.0.5:8000 and I get the datasciencetoolkit webpage along with all the tools!! Nice!!

@chapmanjacobd
Copy link

chapmanjacobd commented Feb 23, 2019

I think something may be wrong with DSTK including this guide...

The SRID of all the imported datasets is 4236 but it should probably be 4326. The 3 and the 2 are switched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants