AMS JavaScript MVC Meetup Dec 2013

Last week I went to the JavaScript MVC Meetup, and here I have written up my notes in the form of a mind map using neography (a Ruby interface to Neo4j).

I appreciate that the links here are a reflection of my personal thoughts and observations and are probably slightly chaotic, but I still find mind maps very useful and I am enjoying being able to generate them programmatically!

So, on to the two presentations.

Peter Peerdeman gave a presentation on Restangular vs AngularJS. It was a very good presentation, with good audience interaction. The topic was possibly a bit advanced for my more basic JavaScript MVC knowledge, but I still got a lot out of it:

JavascriptMVC_Restangular

The second presentation was by Or Hiltch from AVG, who gave some real insight into how widespread their adoption of JavaScript has already become:

JavascriptMVC_AVG

The best quote of the evening was :

everything which can be written in Javascript will eventually be written in Javascript

Alex Gorbatchev’s WordPress SyntaxHighlighter

This is a great WordPress plugin : http://en.support.wordpress.com/code/posting-source-code/

The sample source code in my blog posts now has proper highlighting, for example:

@bottle.route('/')
def blog_index():

    cookie = bottle.request.get_cookie("session")

    username = sessions.get_username(cookie)

    # even if there is no logged in user, we can show the blog
    l = posts.get_posts(10)

    return bottle.template('blog_template', dict(myposts=l, username=username))

Unfortunately, my existing blog posts all needed updating, so I used the following bash script:

http://wiki.ebabel.eu/index.php?title=Sed/syntax_high_lighter

Mongo University M101P and M102… more great MOOCs

I am working my way through and enjoying the Mongo University courses:

- M101P: MongoDB for Developers

This course will go over basic installation, JSON, schema design, querying, insertion of data, indexing and working with language drivers. We will also cover working in sharded and replicated environments. In the course, you will build a blogging platform, backed by MongoDB. Our code examples will be given in Python.

- M102: MongoDB for DBAs

Then you will learn about JSON and Mongo’s extensive query capabilities through the Mongo shell. We will cover importing, and exporting data into Mongo. After that, we cover replication and fault tolerance. Then it is on to scaling out with MongoDB, including indexing, performance tuning, monitoring, and sharding. Finally, we cover backups and recovery.

The courses are very well structured. About a year ago I did the Udacity CS101 course to learn Python by building a web crawler:

In this course you will learn key concepts in computer science and learn how to write your own computer programs in the context of building a web crawler.

David Evans is a Professor of Computer Science at the University of Virginia where he teaches computer science and leads research in computer security. He is the author of an introductory computer science textbook and has won Virginia’s highest award for university faculty. He has PhD, SM, and SB degrees from MIT.

Basically, I really like the modern MOOC format (‘Massive Open Online Course’), and what I particularly like about this Mongo course is the well-paced problem solving.

The following problem sounds pretty simple:

The students collection has 200 records

 
> db.students.count()
200

but the scores for each student are slightly awkwardly structured, as an array of sub-documents (dictionaries in Python):

> db.students.find({"_id" : 1},{scores:1}).pretty()
{
"_id" : 1,
"scores" : [
{
"type" : "exam",
"score" : 60.06045071030959
},
{
"type" : "quiz",
"score" : 52.79790691903873
},
{
"type" : "homework",
"score" : 71.76133439165544
},
{
"type" : "homework",
"score" : 34.85718117893772
}
]
}

and our task is to remove, for each student, the homework document with the lowest score:

> db.students.find({"_id" : 1},{scores:1}).pretty()
{
"_id" : 1,
"scores" : [
{
"score" : 60.06045071030959,
"type" : "exam"
},
{
"score" : 52.79790691903873,
"type" : "quiz"
},
{
"score" : 71.76133439165544,
"type" : "homework"
}
]
}

My solution was based on:
1) loading the data:

# map each student _id to its list of score sub-documents
ids_scores_dict = {}
for ids_scores_cursor in students.find():
   ids_scores_dict[ids_scores_cursor["_id"]]=ids_scores_cursor["scores"]

2) for each record I create a new_scores array:

for rec_id in ids_scores_dict:
   new_scores=[]
   min_homework=0

and work through the original scores list, loading in every record except the lowest homework score…

3) update the students record with the new_scores

   db.students.update({'_id':rec_id},{'$set':{'scores': new_scores}},upsert=False, multi=False)
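The filtering in step 2 can be sketched as one small function. This is only a minimal illustration of the idea, not the course's reference solution: the name drop_lowest_homework is hypothetical, and it works on a plain Python list so that it can be tried without a running MongoDB.

```python
def drop_lowest_homework(scores):
    """Return a copy of scores without the lowest-scoring homework entry."""
    homework = [s for s in scores if s["type"] == "homework"]
    if not homework:
        # nothing to remove, return an unchanged copy
        return list(scores)
    lowest = min(homework, key=lambda s: s["score"])
    new_scores = []
    removed = False
    for s in scores:
        # skip only the single entry identified as the lowest homework
        if not removed and s is lowest:
            removed = True
            continue
        new_scores.append(s)
    return new_scores


# sample data copied from the student with _id 1 shown earlier
scores = [
    {"type": "exam", "score": 60.06045071030959},
    {"type": "quiz", "score": 52.79790691903873},
    {"type": "homework", "score": 71.76133439165544},
    {"type": "homework", "score": 34.85718117893772},
]
new_scores = drop_lowest_homework(scores)
print(new_scores)
```

The returned list is the kind of value that step 3 writes back to the student record with $set.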

Working on MongoDB with NodeJS and Mongoose

I am starting a new project to build a new website / webservice for an Eritrean language dictionary with Nadjib.

We are going to use the MEAN stack … MongoDB, ExpressJS, AngularJS and NodeJS

My task is to work on the MongoDB – Mongoose side of the application.

I now have the very basics working:

http://wiki.ebabel.eu/index.php/MongoDB-Mongoose

This work is based on / summary of several very useful webpages.

Firstly to setup MongoDB on Ubuntu:

http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
https://www.digitalocean.com/community/articles/how-to-install-mongodb-on-ubuntu-12-04

Next getting Mongoose working:

http://blog.modulus.io/getting-started-with-mongoose

NB The npm install of express and mongoose was handled by Nadjib for me.

Riak Zombie Apocalypse 2nd meetup (Oct 2013) – levelDB and secondary indexes

I have written up some details from “Zombie Apocalypse” part two which was held in the very beautiful offices of TTY Amsterdam.

On my Riak wiki page, I have first written up details of a vagrant project which lets you replicate, in a 5-node devrel environment, the work we did on the AWS cloud:

http://wiki.ebabel.eu/index.php/Riak#Vargrant_Riak_1.4_with_Zombies

https://github.com/dgapitts/vagrant-riak-1.4.2-zombies.git

I have also set up some useful aliases and given a quick overview of the main riak / riak-admin commands:

http://wiki.ebabel.eu/index.php/Riak#Getting_your_riak_devrel_cluster_started

Finally, now that we are running on the levelDB engine, we can perform secondary index searches after running a Python bulk load (using protocol buffers):

http://wiki.ebabel.eu/index.php/Riak#Python_Load_Scripts

In the meetup there was some discussion of complex issues like concurrency limitations and how CRDTs are meant to help in the future:

Abstract: A CRDT is a data type whose operations commute when they are concurrent. Replicas of a CRDT eventually converge without any complex concurrency control. As an existence proof, we exhibit a non-trivial CRDT: a shared edit buffer called Treedoc. We outline the design, implementation and performance of Treedoc. We discuss how the CRDT concept can be generalised, and its limitations.
Key-words: Data replication, optimistic replication, commutative operations

http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf
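To make the "operations commute" idea from the abstract concrete, here is a minimal sketch using a grow-only set, which is about the simplest CRDT there is and far simpler than the Treedoc structure in the paper. The class name GrowOnlySet and the ("add", element) operation format are made up for this illustration.

```python
import itertools


class GrowOnlySet:
    """A grow-only set: the only operation is add, and adds commute."""

    def __init__(self):
        self.items = set()

    def apply(self, op):
        # op is a tuple such as ("add", "a")
        kind, element = op
        if kind != "add":
            raise ValueError("unsupported operation: %r" % (kind,))
        self.items.add(element)


# three concurrent operations, as if issued at different replicas
ops = [("add", "a"), ("add", "b"), ("add", "c")]

# deliver the same operations to several replicas, each in a different order
final_states = []
for ordering in itertools.permutations(ops):
    replica = GrowOnlySet()
    for op in ordering:
        replica.apply(op)
    final_states.append(frozenset(replica.items))

# the replicas converge if they all ended up in the same state
converged = len(set(final_states)) == 1
print(converged)
```

No locking or coordination is needed here because the order of delivery does not matter. Data types that also support removal or in-place edits need more careful designs, which is what the paper goes on to discuss.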

Finally, after the meetup I was chatting with Joel and he recommended the following presentation by Kyle Kingsbury (aphyr), which is available online in both video and webpage formats:

video: Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions
webpage: http://aphyr.com/posts/285-call-me-maybe-riak

It is well worth both watching and reading, and shows how all the current NoSQL solutions struggle to deal with a ‘hard network partition’, e.g. a drop in communication between two datacenters.

vagrant nodejs angularjs

Summary

The following vagrant project will build a nice sandbox/training environment for nodejs and angularjs.

The main features are that it:
* Builds an Ubuntu x86_64 (12.04) virtualbox environment (vagrant-nodejs-angularjs) with nodejs and npm pre-installed
* The vagrant-nodejs-angularjs command line includes the tree, vim, git and unzip packages plus some “useful aliases”
* All files can be accessed from the host side, enabling usage of more “user friendly” editors like sublime text
* I can also use my regular web browser via port-forwarding (http://localhost:4567/)
* Finally, as part of the install process it will clone https://github.com/angular/angular-phonecat.git and check out step-0 (git checkout -f step-0)

I have set up a vagrant-nodejs-angularjs GitHub repository (https://github.com/dgapitts/vagrant-nodejs-angularjs-tutorial.git)

So all you need to do (assuming you have vagrant and virtualbox already installed) is

git clone https://github.com/dgapitts/vagrant-nodejs-angularjs-tutorial.git
cd vagrant-nodejs-angularjs-tutorial
vagrant up

It takes about 2 mins to build the new nodejs virtualbox server.

For more details see : http://wiki.ebabel.eu/index.php/Vagrant-nodejs-angularjs-tutorial

Manually Installing IO on Ubuntu

Summary

The following steps were run on Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-31-generic x86_64).

There are good instructions on the Jetpack Flight Log blog:
First Io requires the cmake build system so make sure that is available.

$ sudo apt-get install cmake

Next download and extract the source code.

$ wget --no-check-certificate http://github.com/stevedekorte/io/zipball/master -O io-lang.zip
$ unzip io-lang.zip
$ cd stevedekorte-io-[hash]

Io provides a build script, however it is setup to install the language to /usr/local. Since I want it to go in $HOME/local you just need to modify that file. Here is a quick one liner:

$ sed -i -e 's/^INSTALL_PREFIX="\/usr\/local/INSTALL_PREFIX="$HOME\/local/' build.sh
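To see what that substitution does, here is a rough Python equivalent applied to a sample string. The sample text is an assumption about what the relevant line in build.sh looks like, not a copy of the real file.

```python
import re

# Hypothetical sample of the build.sh line that the sed command targets
sample = 'INSTALL_PREFIX="/usr/local"\n./configure\n'

# Same pattern as the sed expression: only touch a line that starts with
# INSTALL_PREFIX="/usr/local and swap the prefix for $HOME/local
patched = re.sub(
    r'^INSTALL_PREFIX="/usr/local',
    'INSTALL_PREFIX="$HOME/local',
    sample,
    flags=re.MULTILINE,
)
print(patched)
```

Because the sed expression is in single quotes, the shell does not expand $HOME when the command runs. The literal text $HOME is written into build.sh and only expanded later, when the build script itself is executed.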

Now build and install.

$ ./build.sh
$ ./build.sh install

Since we are installing into a location our OS doesn’t really know about, we need to configure a few paths.

$ vim ~/.bashrc
export PATH="${HOME}/local/bin:${PATH}"
export LD_LIBRARY_PATH="${HOME}/local/lib:${LD_LIBRARY_PATH}"

# You might want these too
export LD_RUN_PATH=$LD_LIBRARY_PATH
export CPPFLAGS="-I${HOME}/local/include"
export CXXFLAGS=$CPPFLAGS
export CFLAGS=$CPPFLAGS
export MANPATH="${HOME}/local/share/man:${MANPATH}"

http://jetpackweb.com/blog/2011/02/05/installing-the-io-language-in-ubuntu/

This initially failed, until I pre-installed build-essential:

sudo apt-get install build-essential

There was one suggestion that you could just install g++ instead (I didn’t try this).

There is a check script:

~/stevedekorte-io-5d35419$ io ./libs/iovm/tests/correctness/run.io
...........................................E..........................
......................................................................
......................................................................
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
sh: 1: _build/binaries/io_static: not found
Files /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.tmp and /home/vagrant/stevedekorte-io-5d35419/./libs/iovm/tests/correctness/UnicodeTest-helper/UnicodeTest.txt differ
..........EEEEEEE..
======================================================================
FAIL: SequenceTest testAsNumber
----------------------------------------------------------------------
Exception: `Number constants nan asString != "" asNumber asString` --> `"-nan" != "nan"`
---------
Exception raise UnitTest.io 136
SequenceTest fail UnitTest.io 158
SequenceTest assertEquals SequenceTest.io 35
SequenceTest testAsNumber doString 1
======================================================================
FAIL: UnicodeTest testPrintFile
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 76
UnicodeTest testPrintFile doString 1
======================================================================
FAIL: UnicodeTest testPrintTriQuote
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 81
UnicodeTest testPrintTriQuote doString 1
======================================================================
FAIL: UnicodeTest testPrintMonoQuote
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 86
UnicodeTest testPrintMonoQuote doString 1
======================================================================
FAIL: UnicodeTest testArgsMonoQuote
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 91
UnicodeTest testArgsMonoQuote doString 1
======================================================================
FAIL: UnicodeTest testArgsTriQuote
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 96
UnicodeTest testArgsTriQuote doString 1
======================================================================
FAIL: UnicodeTest testArgsFile
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 101
UnicodeTest testArgsFile doString 1
======================================================================
FAIL: UnicodeTest testArgsEval
----------------------------------------------------------------------
Exception: `outcome ==(0) != nil` --> `false != true`
---------
Exception raise UnitTest.io 136
UnicodeTest fail UnitTest.io 158
UnicodeTest assertTrue UnicodeTest.io 44
UnicodeTest assertEquals UnitTest.io 179
UnicodeTest assertDiff UnicodeTest.io 109
UnicodeTest testArgsEval doString 1
----------------------------------------------------------------------
Ran 229 tests in 0.635715s
FAILED (failures 8) run

I guess 8 failed tests out of 229 is not too bad… I have started using Io and it seems to be working well for me.

Hello pycassa World … installing this python client library for Apache Cassandra

Introduction

This Python driver is open source:

pycassa is a python client library for Apache Cassandra with the following features:
- Automatic failover and operation retries
- Connection pooling
- Multithreading support
- A batch interface
- A class for mapping classes to Cassandra column families
https://github.com/pycassa/pycassa

This is not too surprising, as Cassandra is a 100% open source project.


Cassandra example blog KEYSPACE with posts COLUMNFAMILY

Before we start using pycassa, here is a simple blog KEYSPACE with a posts COLUMNFAMILY:

CREATE KEYSPACE blog WITH
strategy_class = 'SimpleStrategy'
AND strategy_options:replication_factor = '1';
USE blog;

CREATE COLUMNFAMILY posts (
id bigint PRIMARY KEY,
user bigint,
message text);

and using the cqlsh interface

vagrant@mariadb-cassandra:~$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 2.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh> use blog;
cqlsh:blog> SELECT * FROM posts;
 id | message | user
----+---------+------
  1 |   Hello |    1
  2 |   World |    1
  3 |    Nice |    2
cqlsh:blog> exit


Install pycassa via pip

First install pip

sudo apt-get install python-setuptools
sudo easy_install pip

then using pip we can install pycassa

sudo pip install pycassa

Although this did work, it also threw an odd error (“src/protocol/fastbinary.c:20:20: fatal error: Python.h: No such file or directory”):

vagrant@mariadb-cassandra:~$ sudo pip install pycassa
Downloading/unpacking pycassa
  Downloading pycassa-1.9.1.tar.gz (82kB): 82kB downloaded
  Running setup.py egg_info for package pycassa
Downloading/unpacking thrift (from pycassa)
  Downloading thrift-0.9.1.tar.gz
  Running setup.py egg_info for package thrift
Installing collected packages: pycassa, thrift
  Running setup.py install for pycassa
    changing mode of build/scripts-2.7/pycassaShell from 644 to 755
    changing mode of /usr/local/bin/pycassaShell to 755
  Running setup.py install for thrift
    building 'thrift.protocol.fastbinary' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c src/protocol/fastbinary.c -o build/temp.linux-x86_64-2.7/src/protocol/fastbinary.o
    src/protocol/fastbinary.c:20:20: fatal error: Python.h: No such file or directory
    compilation terminated.
    ********************************************************************************
    An error occured while trying to compile with the C extension enabled
    Attempting to build without the extension now
    ********************************************************************************
    /usr/bin/python -O /tmp/tmpZpyIFB.py
    removing /tmp/tmpZpyIFB.py
Successfully installed pycassa thrift
Cleaning up…

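I might have to research this further, but it did actually install pycassa successfully. The missing Python.h suggests that the Python development headers (the python-dev package on Ubuntu) were not installed, so thrift’s optional C extension could not be compiled and, as the log says, it was built without the extension instead.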


Hello pycassa World!

Next, running hello world via pycassa:

vagrant@mariadb-cassandra:~$ python
Python 2.7.3 (default, Aug  1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pycassa.pool import ConnectionPool
>>> pool = ConnectionPool('blog')
>>> from pycassa.columnfamily import ColumnFamily
>>> col_fam = ColumnFamily(pool, 'posts')
>>> col_fam.get(1)
OrderedDict([(u'message', u'Hello'), (u'user', 1)])
>>> col_fam.get(1)['message']
u'Hello'
>>> col_fam.get(2)['message']
u'World'
>>> str(col_fam.get(1)['message']) + ' ' + str(col_fam.get(2)['message'])
'Hello World'

Finally wrapping this up into a simple python script to improve readability:

vagrant@mariadb-cassandra:~$ cat hello_pycass_world.py
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
pool = ConnectionPool('blog')
col_fam = ColumnFamily(pool, 'posts')
print str(col_fam.get(1)['message']) + ' ' + str(col_fam.get(2)['message'])

and running this…

vagrant@mariadb-cassandra:~$ /usr/bin/python hello_pycass_world.py
Hello World

Investigating Cassandra Further

Following on from the “Cassandra user group Amsterdam Sept 2013 – Overview and demo of BlueConic“, I have been investigating the Cassandra data model in general.

HBase vs. BigTable Comparison
Cassandra is loosely based on the Google BigTable architecture.

HBase vs. BigTable Comparison
HBase is an open-source implementation of the Google BigTable architecture. That part is fairly easy to understand and grasp. What I personally feel is a bit more difficult is to understand how much HBase covers and where there are differences (still) compared to the BigTable specification. This post is an attempt to compare the two systems.
http://www.larsgeorge.com/2009/11/hbase-vs-bigtable-comparison.html

This is useful for me, as I am familiar with HBase, which is covered in depth in:

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
- PostgreSQL
- Riak
- HBase
- MongoDB
- CouchDB
- Neo4J
- Redis
- The CAP Theorem
I still think the five genres and seven databases we chose satisfy the criteria that we set out to achieve. But there are others I’d like to write about as well. These include some old favorites like SQLite and some databases you might not think of as such, like OpenLDAP and SOLR (an inverted index/search engine).
http://pragprog.com/book/rwdata/seven-databases-in-seven-weeks

This is an excellent book and I would strongly recommend it.

ColumnFamily syntax example
I am also familiar with the ColumnFamily syntax, as I have done a few basic operations in Cassandra using the new CQL interface from MariaDB:

CREATE KEYSPACE blog WITH
strategy_class = 'SimpleStrategy'
AND strategy_options:replication_factor = '1';
USE blog;
CREATE COLUMNFAMILY posts (
id bigint PRIMARY KEY,
user bigint,
message text);
http://haildata.net/2013/09/cassandra-maria-db-machine/

Useful Software Engineering links

Finally I would also strongly recommend Software Engineering Radio’s Episode 179 “Cassandra with Jonathan Ellis [CTO and co-founder at DataStax]”

Recording Venue: O’Reilly Scala 2011, Santa Clara California
Guest: Jonathan Ellis
Host: Robert
Cassandra is a distributed, scalable non-relational data store influenced by the Google BigTable project and many of the distributed systems techniques pioneered by the Amazon Dynamo paper. Guest Jonathan Ellis, the program chair of the Apache Cassandra project, discusses Cassandra’s data model, storage model, techniques used to achieve high availability and provides some insight into the trend away from relational databases.
- Amazon Dynamo
- Google Bigtable
- Apache Cassandra
- Log-Structured Merge Tree
- Cassandra research paper by Lakshman and Malik
- The Phi Failure Detector
- DataStax

http://www.se-radio.net/2011/10/episode-179-cassandra-with-jonathan-ellis/

After a slightly hard-to-understand start (poor sound quality), this turns into an excellent program giving a really good overview of Cassandra and the evolution of NoSQL solutions.

There are two other Software Engineering Radio episodes which are highly relevant:

Episode 162: Project Voldemort with Jay Kreps
Recording Venue: QCon
Guest(s): Jay Kreps
Host(s): Robert
Jay Kreps talks about the open source data store Project Voldemort. Voldemort is a distributed key-value store used by LinkedIn and other high-traffic web sites to overcome the inherent scalability limitations of a relational database. The conversation delves into the workings of a Voldemort cluster, the type of consistency guarantees that can be made in a distributed database, and the tradeoff between client and the server.
- Project Voldemort
- Jay Kreps presentation at QCon San Francisco 2009
- Google mailing list for Project Voldemort
- Project Voldemort on github
- LinkedIn blog entry on Project Voldemort
- Amazon’s paper on Dynamo
- NoSQL hub
- Google mailing list for NoSQL
http://www.se-radio.net/2010/05/episode-162-project-voldemort-with-jay-kreps/

I don’t know how much overlap there is between Project Voldemort and Cassandra, but they are clearly trying to solve similar problems and share a lot of heritage!

Next, this episode gives some good background on SOLR and Lucene, which currently seem to be one of the biggest use cases for Cassandra:

Episode 187: Grant Ingersoll on the Solr Search Engine
Recording Venue: Lucene Revolution 2012 (Boston)
Guest: Grant Ingersoll
Grant Ingersoll, a committer on the Apache Solr and Lucene, talks with Robert about the problems of full-text search and why applications are taking control of their own search, and then continues with a dive into the architecture of the Solr search engine. The architecture portion of the interview covers the Lucene full-text index, including the text ingestion process, how indexes are built, and how the search engine ranks search results. Grant also explains some of the key differences between a search engine and a relational database, and why both have a place within modern application architectures. They close with a discussion of how Solr can scale up to serve very large indexes.
- Apache Solr project
- Apache Lucene project
- Grant Ingersoll’s blog
- Lucid Imagination
- Taming Text
http://www.se-radio.net/2012/07/episode-187-grant-ingersoll-on-the-solr-search-engine/

Cassandra user group Amsterdam Sept 2013 – Overview and demo of BlueConic

The presenter was Martijn Vanberkum from GX Software (Nijmegen):
- 93 Computer Science Class of Nijmegen
- 15 years of Content Management
- offices in San Francisco and Boston
- Older Product WebManager (Content Management) … not as sexy as it used to be (no 40% growth anymore)
- New Product BlueConic (running SOLR / Cassandra)

The challenge for BlueConic is to help big companies target advertising, i.e. to do better than vanilla A/B testing, a market which is dominated by Optimizely (who also have offices in Amsterdam):

Optimizely was founded by two former Google product managers, Dan Siroker and Pete Koomen. Dan served as the Director of Analytics during the Obama 2008 presidential campaign. While there, his team relied on the use of A/B and multivariate testing to maximize e-mail sign-ups, volunteers, and donations to raise more than $100 million in additional revenue for the campaign.
https://www.optimizely.com/about

BlueConic tracks and reconciles user actions, running on Cassandra on AWS:

Linear scalability
We envisioned that BlueConic would quickly serve as a Big Data repository that would literally contain billions of artefacts on the behavior of all online visitors across all (online and offline) customer touch points. So it should be able to store a billion visitor profiles and to handle several thousands of parallel decisions per second. This made us look for an extremely scalable, responsive and flexible data store. We selected Apache Cassandra as a NoSQL data store to build our data model on.

On demand availability in the cloud
… We believe the future lies in the cloud. That’s why we made BlueConic available as on-demand service in a multi-tenant cloud infrastructure. Its data persistence layer is tenant aware, which means that each tenant has its own key space within Apache Cassandra. A fully managed on-demand cloud environment is available, including 24/7 support, automatic upgrades, backup service, and a clustered set-up for high availability and failover purposes. The environment is running on the Amazon cloud infrastructure and supports Amazon Cloudfront for content delivery.
http://www.gxsoftware.com/nl/bedrijf/blog/posts/Architectural-Considerations-for-a-Customer-Engagement-Solution.htm

Martijn gave a live demo of the product. After getting over some demo/VPN issues he went through:
- how he would demo the product via a proxy server, injecting JavaScript over a company’s website. This often causes some alarm for managers / non-techies, as they perceive that their company’s site has been hacked … a good way of getting your client’s attention ;) This is very like Optimizely
- in the demo he uses a lightbox effect (an overlay in jQuery) which appears after 30 clicks (solr/lucene facets)
- he briefly went over solr on top of lucene
- there was some discussion of the solandra solution… and googling solandra afterwards, there is a good pdf overview of this: “Solandra Scaling Solr with Cassandra – DataStax” (http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf)
- there was also some discussion of “3rd party cookies” regulations and whether they are a ‘little fake’: football international and psychology are not obviously owned by the same parent company, whereas philips.nl and philips.com are. There is a “do not track” standard but “Efforts to standardize Do Not Track by the W3C have so far been unsuccessful” http://en.wikipedia.org/wiki/Do_Not_Track

Most of their clients are hosted (i.e. on a multi-tenancy AWS solution)…
- Apache Cassandra
- really fast read and writes and is persistent
- multitenancy
- OSTI stack
- DOJO interface
- HTML jQuery
- monitoring via nagios and pingdom
- one large client is self-hosted (as they have a lot of their own servers)

The scale of their current operations is:
- 60 tenants
- 800 channels (website)
- 150 million profiles
- 7 billion interactions
- there was some discussion around any “cleanup algorithms” but this is tricky as most of the data needs to be kept for a long time for regulation purposes (that surprised me)