PaaS is buzzing in the cloud and seems to be the hottest topic in infrastructure services today, so I decided to test Openshift, the PaaS offering from Red Hat. A couple of things make this platform interesting: firstly, it’s an open source solution, so we can use it to build our own private PaaS; secondly, the public service gives us 3 gears (Linux containers with a predefined configuration) free forever, so it’s easy to experiment with the platform. As a sample project we will create a very simple Python Flask web application with MongoDb.
Initial Setup
After creating an account, a few actions are required:
- Install the client tool rhc (it’s Ruby based, so we also need the Ruby interpreter and the gem package manager installed)
- We also need git and Python virtualenv (our example is for Python 3)
- Register an ssh key with our account (this can be done as part of the next step)
- Run rhc setup
Now we are ready for our first application.
Create And Deploy Application
We create the application using a sample application available on github as a template:
rhc app create testpy python-3.3 --from-code https://github.com/izderadicka/openshift-test.git
# beware, there is also an app-create command, but it will not create a local git repo by default
rhc cartridge add mongodb-2.4 -a testpy
Openshift provides base templates for many common web application development platforms like Python (with Django, Flask …), PHP, Node.js, Java (Tomcat, JBoss) etc. For each web application we can also add ‘cartridges’, which are additional services like a database, cron, etc. In our case we add the MongoDb cartridge.
First we need to create a virtual environment so we can test the application locally:
cd testpy
virtualenv -p python3 .
source bin/activate
Next we need to install the required Python libraries, which should be listed in the file requirements.txt. They are installed automatically during Openshift deployment, but there is one issue: by default Openshift seems to install packages from its own mirrors of the Python repositories, and it could not find some packages for this application, or not in the right versions (this also caused trouble in another project, where it installed older versions of django and django-registration and the application then did not work). Enforcing the official repository in requirements.txt helped:
--index-url https://pypi.python.org/simple/
Locally we can install dependencies with:
pip install -r requirements.txt
For Openshift deployment there are two other important files:
setup.py, which is a standard Python setup file. Here we should edit the metadata for our application and add any additional setup tasks (like creating a database). setup.py is also run automatically during deployment. Here is, for instance, code to create the tables in a postgresql database if they do not exist yet (in case we choose postgresql instead of mongo):
from setuptools import setup
from setuptools import Command
import os.path


class InitDbCommand(Command):
    user_options = []

    def initialize_options(self):
        """Abstract method that is required to be overwritten"""

    def finalize_options(self):
        """Abstract method that is required to be overwritten"""

    def run(self):
        from flaskapp import db
        res = db.engine.execute("""
            SELECT EXISTS (
                SELECT 1
                FROM information_schema.tables
                WHERE table_schema = 'public'
                AND table_name = 'thought'
            );
        """)
        exists = list(res)[0][0]
        if exists:
            print('Table already exists, skipping creation')
        else:
            print('Will create table')
            db.create_all()


setup(name='random_thoughts',
      version='0.1',
      description='Very simple flask app to test Openshift deployment',
      author='Ivan',
      author_email='ivan@zderadicka.eu',
      url='https://testpy-ivanovo.rhcloud.com/',
      cmdclass={'initdb': InitDbCommand},
      )
wsgi.py: Openshift uses mod_wsgi to run Python code, and by default it looks for the file wsgi.py in the root directory of our code. For us it is enough to import the flask application, which is WSGI compatible, under the name application:
from flaskapp import app as application
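To make the contract a bit more concrete, here is a minimal sketch of the interface mod_wsgi expects: a module-level callable named application that follows the WSGI calling convention. A Flask app object exposes the same interface, which is why the one-line import is enough. The handler body and the call_app helper are purely illustrative, use only the standard library and are not part of the sample project.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Plain WSGI callable: the same interface a Flask app object exposes."""
    body = b"Hello from a bare WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper: invoke a WSGI app once, without a real server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    status, headers, body = call_app(application)
    print(status, body.decode().strip())
```

The same call_app helper could be pointed at the Flask object imported in wsgi.py for a quick local smoke test, assuming Flask is installed in the virtual environment.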
Openshift also allows us to define custom scripts, so called action hooks, which run at different stages of deployment. Action hooks are added to the directory .openshift/action_hooks. In our case we add a deploy script, which enables fulltext in the MongoDb configuration.
When our code is ready and works OK locally:
python flaskapp.py
we can deploy to Openshift easily with git:
git push origin master
# we may need to restart the app the first time, due to the mongodb config change to enable fulltext
rhc app restart testpy
Scalable Application
Openshift enables automatic scaling of applications: when the number of connections reaches a certain threshold, additional gears with our web application are automatically created and web traffic is load balanced between them (Openshift uses HAProxy, installed in the first gear, as the so called Web Load Balancer cartridge).
An application must be explicitly enabled for scaling when it is created; existing applications cannot be enabled for scaling afterwards. So we first need to delete our existing non-scalable application:
cd ..
rhc app delete testpy
rm -rf testpy
And recreate it as a scalable application (with the -s argument):
rhc app create testpy python-3.3 -s
cd testpy
This time we try something a bit different to get the code from github:
git rm -r wsgi.py setup.py .openshift
git commit -a -m 'clean'
# let's use a different branch for deployment
git checkout -b scaled
rhc app configure --deployment-branch scaled
git remote add github https://github.com/izderadicka/openshift-test.git
git pull github scaled
git push origin scaled
In this scenario we need a shared MongoDb database; we can use MongoLab from the Openshift Marketplace. Just order the MongoLab Free service there and then add it to this application via the marketplace UI. Now our application looks like this:
rhc app show testpy
testpy @ http://testpy-ivanovo.rhcloud.com/ (uuid: ...)
----------------------------------------------------------------------------
  Domain:     ivanovo
  Created:    7:49 AM
  Gears:      1 (defaults to small)
  Git URL:    ssh://...@testpy-ivanovo.rhcloud.com/~/git/testpy.git/
  SSH:        ...@testpy-ivanovo.rhcloud.com
  Deployment: auto (on git push)

  haproxy-1.4 (Web Load Balancer)
  -------------------------------
    Gears: Located with python-3.3

  python-3.3 (Python 3.3)
  -----------------------
    Scaling: x1 (minimum: 1, maximum: available) on small gears

  mongolab-mongolab-1.0 (MongoLab)
  --------------------------------
    From:  https://marketplace.openshift.com/api/custom/openshift/v1/accounts/...
    Gears: none (external service)
And we have an environment variable to connect to MongoDb:
rhc env list
MONGOLAB_URI=mongodb://xxx:zzz.mongolab.com:37447/openshift_zzzz
So we just need to modify our application to use this connection URL:
app.config['MONGO_URI'] = os.environ.get('MONGOLAB_URI', 'mongodb://localhost/test')
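The same lookup can be tried outside of Flask. The following is only a sketch: the helper names resolve_mongo_uri and describe are made up for illustration, and the sample URI in the usage part is an invented placeholder, not a real MongoLab address. It shows the fallback to the local database when MONGOLAB_URI is not set, and how a simple single-host URI splits into host, port and database name.

```python
import os
from urllib.parse import urlsplit

# Same local fallback as in the Flask configuration line.
DEFAULT_URI = "mongodb://localhost/test"


def resolve_mongo_uri(environ=None):
    """Return MONGOLAB_URI if it is set, otherwise the local fallback."""
    environ = os.environ if environ is None else environ
    return environ.get("MONGOLAB_URI", DEFAULT_URI)


def describe(uri):
    """Split a simple single-host mongodb:// URI into (host, port, database)."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port, parts.path.lstrip("/")


if __name__ == "__main__":
    # Invented placeholder, standing in for the value Openshift would inject.
    fake_env = {"MONGOLAB_URI": "mongodb://user:secret@db.example.com:27017/mydb"}
    print(describe(resolve_mongo_uri(fake_env)))
    print(describe(resolve_mongo_uri({})))
```

Passing the environment in as a parameter keeps the function easy to test locally, where the variable does not exist.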
Scaling is configured by the environment variable OPENSHIFT_MAX_SESSIONS_PER_GEAR (default is 16), which is the maximum number of connections that HAProxy passes to one backend application. According to the documentation, if the total number of connections is sustained at 90% of capacity (max_connections x num_of_gears) for some period, a new gear is added (if free gears are available). The web application is copied to the new gear, deployed, started and added as another backend to the HAProxy load balancer.
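The documented rule can be written down as a small sketch. This is only my reading of the description, not Openshift’s actual implementation: the function names are invented, the "sustained for some period" part is left out, and treating the 90% threshold as inclusive is an assumption.

```python
def capacity(gears, max_sessions_per_gear=16):
    """Total connections the current gears are configured to handle."""
    return gears * max_sessions_per_gear


def wants_scale_up(total_connections, gears, max_sessions_per_gear=16,
                   threshold=0.9):
    """True when the load is at or above `threshold` of current capacity.

    Simplification: the real rule also requires the load to be sustained
    for some period, which is not modelled here.
    """
    return total_connections >= threshold * capacity(gears, max_sessions_per_gear)


if __name__ == "__main__":
    # One gear with the default of 16 sessions: the threshold is 14.4 connections.
    print(wants_scale_up(14, gears=1))
    print(wants_scale_up(15, gears=1))
    # Lowering the per-gear limit to 8 moves the threshold down to 7.2.
    print(wants_scale_up(8, gears=1, max_sessions_per_gear=8))
```

Under this reading, lowering the per-gear limit makes the threshold easier to reach with a modest load.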
For a better demonstration of scaling we can decrease the value of OPENSHIFT_MAX_SESSIONS_PER_GEAR:
rhc env set OPENSHIFT_MAX_SESSIONS_PER_GEAR=8
We can try out how the application scales: we use the Apache HTTP benchmark tool ab to put some load on our application:
ab -n 100000 -c 100 http://testpy-ivanovo.rhcloud.com/
After a while a new gear is added, which we can see with the command rhc app show (Scaling: x2). It still takes quite some time (minutes) before the new gear is ready and added as a new backend to HAProxy; we can see the HAProxy status at the URL http://testpy-your-domain.rhcloud.com/haproxy-status. A little bit later another gear (the last remaining one) is added. Again it takes some time for it to be ready. If we then take another look at the HAProxy status, we can see that the backend in the first gear is taken down (highlighted in brown). This is intended functionality; according to the documentation: “Once you scale to 3 gears, the web gear that is collocated with HAProxy is turned off, to allow HAProxy more resources to route traffic.”
Results from ab may look like:
Server Software:        Apache/2.2.15
Server Hostname:        testpy-ivanovo.rhcloud.com
Server Port:            80

Document Path:          /
Document Length:        2866 bytes

Concurrency Level:      100
Time taken for tests:   1032.052 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      314133754 bytes
HTML transferred:       286600000 bytes
Requests per second:    96.89 [#/sec] (mean)
Time per request:       1032.052 [ms] (mean)
Time per request:       10.321 [ms] (mean, across all concurrent requests)
Transfer rate:          297.24 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      105  142   65.6    131    3137
Processing:   127  889  504.0    774    3597
Waiting:      127  885  501.5    772    3596
Total:        249 1031  503.7    912    4016

Percentage of the requests served within a certain time (ms)
  50%    912
  66%   1049
  75%   1182
  80%   1300
  90%   1887
  95%   2104
  98%   2352
  99%   2540
 100%   4016 (longest request)
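As a sanity check, the derived figures in the ab report can be recomputed from the raw ones. The sketch below uses only numbers from the output above; the formulas are my reading of how ab derives them (with a Kbyte taken as 1024 bytes), and the function name ab_summary is just an illustrative choice.

```python
def ab_summary(requests, concurrency, elapsed_s, total_bytes):
    """Recompute ab's derived figures from its raw counters."""
    return {
        # Completed requests divided by wall-clock time.
        "requests_per_second": requests / elapsed_s,
        # Mean latency as seen by each of the concurrent clients.
        "ms_per_request": concurrency * elapsed_s * 1000.0 / requests,
        # Mean time per request across all concurrent clients together.
        "ms_per_request_all": elapsed_s * 1000.0 / requests,
        # Received data rate in Kbytes per second (1 Kbyte = 1024 bytes).
        "kbytes_per_second": total_bytes / 1024.0 / elapsed_s,
    }


if __name__ == "__main__":
    summary = ab_summary(
        requests=100000,        # Complete requests
        concurrency=100,        # Concurrency Level
        elapsed_s=1032.052,     # Time taken for tests
        total_bytes=314133754,  # Total transferred
    )
    print("%.2f req/s" % summary["requests_per_second"])     # 96.89
    print("%.3f ms" % summary["ms_per_request"])             # 1032.052
    print("%.3f ms" % summary["ms_per_request_all"])         # 10.321
    print("%.2f Kbytes/s" % summary["kbytes_per_second"])    # 297.24
```

The numbers match the report: with 100 concurrent clients, each request took about one second on average, which gives roughly 97 requests per second overall.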
Actually, when I was observing the behaviour of the scalable application, the above-mentioned rule was not clearly demonstrated (I got around 100 connections to the backend with OPENSHIFT_MAX_SESSIONS_PER_GEAR=16, but the application was still scaled to 2 gears), so maybe the scaling is a bit more complex.
Finally, after a while, when traffic is down, the application returns back to 1 gear. (An application restart will not reset scaling.)