Choices, Choices

... for a Larger Project

Martin Andrews / @redcatlabs

7 May 2015

Project Sponsor

  • Existing SG company = "Handshakes"
  • Entities and Relationships in Graph
Handshakes Screenshot

Project Goals

  • Aim : Increase productivity
    • Natural Language Processing ("NLP")
    • 9 & 18-month Milestones (low head-count)
    • Building Watson-style "Many Experts"
    • Internal and External Customers

Key Factors

  • Developer-Friendliness
  • Language Agnostic
  • Maintainability
  • No need for "WebScale"

OS : Windows vs Linux

  • Existing System vs Developer
  • Windows Server (familiar to systems people) :
    • MS SQL
    • Bonobo git
  • Development Server (Virtual Machine) :
    • Linux inside large Hyper-V hypervisor
  • Everybody Happy

Component Parts

  • Configuration
  • Web front-end
  • Database
  • Micro-Services
  • Documentation
  • "Other"

Configuration

  • Want to be language-agnostic
  • Why not JSON?
    • More quotes, more brackets, no comments
  • Chose YAML instead :
    • Nice overlay tooling
    • node-yaml-config

Configuration :
With Overlays


default:
  env: production
  groundtruth:
    server:
      host: groundtruth.sage
      port: 3002
    mssql: DSN=seer-dev;UID=seer;PWD=OXLEY;DATABASE=Sage_Testing;
    cache:
      dir: ./cache/
    findname:
      initialval:  50              # Descriptive text
production:
  groundtruth:
    server:
      port: 3072                   # Override
    findname:
      genres: [ person, company ]  # Extra data
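
The overlay behaviour above can be sketched in plain JavaScript. This is only an illustration: it assumes the environment section is merged recursively over `default`, and `mergeOverlay` and `loadConfig` are hypothetical names, not node-yaml-config's API. The library's actual merge rules may differ.

```javascript
// Hypothetical helper: recursively overlay `extra` on top of `base`.
// Plain objects are merged key by key; anything else (numbers, strings,
// arrays) is replaced outright by the overlay value.
function isPlainObject(v) {
  return v !== null && typeof v === 'object' && !Array.isArray(v);
}

function mergeOverlay(base, extra) {
  var out = {};
  Object.keys(base).forEach(function (k) { out[k] = base[k]; });
  Object.keys(extra).forEach(function (k) {
    out[k] = (isPlainObject(base[k]) && isPlainObject(extra[k]))
      ? mergeOverlay(base[k], extra[k])
      : extra[k];
  });
  return out;
}

// The YAML from the slide, already parsed into a JS object (abridged).
var parsed = {
  'default': {
    env: 'production',
    groundtruth: {
      server: { host: 'groundtruth.sage', port: 3002 },
      findname: { initialval: 50 }
    }
  },
  production: {
    groundtruth: {
      server: { port: 3072 },
      findname: { genres: ['person', 'company'] }
    }
  }
};

// Hypothetical loader: pick the overlay named by `env`, if there is one.
function loadConfig(doc, env) {
  return mergeOverlay(doc['default'], doc[env] || {});
}

var config = loadConfig(parsed, 'production');
console.log(config.groundtruth.server);
```

Under these assumptions the production port overrides the default, while the host and `initialval` are inherited unchanged.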
   

Webserver

  • bootstrap (wrapstrap theme)
  • Node.js - express - jade
  • socket.io
  • PDF.js

Webserver : jade


body
  #wrapper
    block side-navbar
      nav.navbar-default.navbar-static-side(role='navigation')
        - // .... stuff + indentation, etc .... 
        span.clear
          span.block.m-t-xs
            strong.font-bold #{user.json.name || 'User'}
          span.text-muted.text-xs.block
            | #{user.json.title || 'Title'}
            b.caret
        ul.dropdown-menu.animated.fadeInRight.m-t-xs
          li
            a(href='profile.html') Profile
          li.divider
          li
            a(href='/user/logout') Logout
   

Webserver : socket.io


var express = require('express');
var app  = express();
var http = require('http').Server(app);
var io = require('socket.io')(http);
// 'auth' and 'config' are the project's own modules (not shown here)
io.use(auth.socket_io({secret: config.web.server.session_secret})); 

io.on('connection', function(socket) {
  console.log('socket.io : '+socket.request.user.username+' connected');

  socket.on('*admin/docs/regen', function(data) {
    // make a response : 'type' and 'pct_update' come from the regeneration job (elided)
    socket.emit('*admin/docs/progress', {type:type, pct:pct_update});
  });
  
  socket.on('disconnect', function() {
    console.log('socket.io : user disconnected');
  });
});
   

PDF.js Embedding

PDF.js Screenshot

Database

  • Single point of access to MS SQL
  • Need to segregate access to different snapshots
    • Run service on different ports
  • Also 'fast-levenshtein' analysis
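
For reference, the edit distance that the 'fast-levenshtein' package computes can be sketched in a few lines. This is a textbook dynamic-programming version written for illustration, not the package's own code. The slides do not say how the service turns distances into scores, so no scoring is attempted here.

```javascript
// Textbook Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  var prev = [];
  var i, j;
  for (j = 0; j <= b.length; j++) { prev[j] = j; }
  for (i = 1; i <= a.length; i++) {
    var curr = [i];
    for (j = 1; j <= b.length; j++) {
      var cost = (a.charAt(i - 1) === b.charAt(j - 1)) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('Looi Kok Loon', 'Woo Kok Loon'));  // 2
```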

Search Info-Scoring


curl "http://gt.sage:3002/findname/lookup/person?name=Looi%20Kok%20Loon"

{
  "success":true,
  "data":[
    { "guid":"7657398E-8E71-44C4-8309-FCE8F313111E",
      "names":[
        {"name":"Looi Kok Loon","id":210564,"active":true}
      ],
      "score":20.96
    },
    { "guid":"F84AB4AE-B0EB-4210-9FCE-710DABC653F0",
      "names":[
        {"name":"Woo Kok Loon","id":313421,"active":true}
      ],
      "score":15.68
    },
    { "guid":"8B04B8BD-245B-4335-B42F-9E174E2061C4",
      "names":[
        {"name":"Messrs Goon Kok Loon","id":368792,"active":false},
        {"name":"Goon Kok Loon","id":131638,"active":true}
      ],
      "score":15.68
    },
    ...
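
A caller might consume a response of this shape as sketched below. `bestMatches` and its `minScore` threshold are hypothetical, and the sample object is a shortened copy of the response above rather than real service output.

```javascript
// Hypothetical client-side helper: keep candidates scoring at least
// `minScore` and report each one under its first active name.
function bestMatches(response, minScore) {
  if (!response.success) { return []; }
  return response.data
    .filter(function (entity) { return entity.score >= minScore; })
    .map(function (entity) {
      var active = entity.names.filter(function (n) { return n.active; });
      return {
        guid: entity.guid,
        name: active.length ? active[0].name : null,
        score: entity.score
      };
    });
}

// Two of the entries from the slide, used as sample input.
var sample = {
  success: true,
  data: [
    { guid: '7657398E-8E71-44C4-8309-FCE8F313111E',
      names: [ { name: 'Looi Kok Loon', id: 210564, active: true } ],
      score: 20.96 },
    { guid: '8B04B8BD-245B-4335-B42F-9E174E2061C4',
      names: [ { name: 'Messrs Goon Kok Loon', id: 368792, active: false },
               { name: 'Goon Kok Loon', id: 131638, active: true } ],
      score: 15.68 }
  ]
};

console.log(bestMatches(sample, 16));
```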
   

Microservices

  • REST APIs :
    • Idea : ZeroMQ / nanomsg
    • but... REQREP not quite there
    • Fallback : 'hapi'
  • Launch services : systemd
  • Service 'placement' :
    • DNS ~ /etc/hosts

hapi Routing Example


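var Hapi = require('hapi');
// 'config', 'route' and 'db' are the project's own modules (not shown here)

var server = new Hapi.Server();
server.connection({ port: config.server.port });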

server.route({
  method: 'GET',
  path: '/relationship/guid/{guid}',
  handler: function (req, rep) {
    var guid = req.params.guid;
    return route.relationships_by_guid(db, guid, rep);
  }
});

if(!module.parent) {  // Disable start if testing elsewhere
  server.start(function () {
    console.log('Server running at:', server.info.uri);
  });
}
   

Launch via systemd


>>> /etc/systemd/system/node-service-groundtruth.service

[Service]
ExecStart=/usr/bin/node app.js --service=groundtruth
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=node-service-groundtruth
User=andrewsm
Group=andrewsm
Environment=NODE_ENV=production
WorkingDirectory=/home/andrewsm/SEER/services/groundtruth/server/

[Install]
WantedBy=multi-user.target
   

Poor-man's Mesh


>>> /etc/hosts

# Uncomment these when running service locally
#127.0.0.1	groundtruth.sage 
127.0.0.1	coordination.sage     
#127.0.0.1	textextract.sage 

# Add to these every time a service is enabled-by-default on sage
192.168.2.8	groundtruth.sage 
192.168.2.8	coordination.sage
192.168.2.8	textextract.sage 
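
Because placement is handled by name resolution, client code only needs a service's host name and port from the configuration. A minimal sketch, assuming each service keeps its address under `<service>.server` as in the YAML configuration; `serviceUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: build a URL for a named service from the config.
// Where the host name resolves to is decided by /etc/hosts (or DNS).
function serviceUrl(config, service, path, query) {
  var server = config[service].server;
  var url = 'http://' + server.host + ':' + server.port + path;
  var keys = Object.keys(query || {});
  if (keys.length) {
    url += '?' + keys.map(function (k) {
      return encodeURIComponent(k) + '=' + encodeURIComponent(query[k]);
    }).join('&');
  }
  return url;
}

var config = {
  groundtruth: { server: { host: 'groundtruth.sage', port: 3002 } }
};

console.log(serviceUrl(config, 'groundtruth',
                       '/findname/lookup/person', { name: 'Looi Kok Loon' }));
```

Moving a service between the local machine and the shared server then needs only a change to /etc/hosts, not to the calling code.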
   

Documentation

  • Want to embed docs in source
  • Python's Sphinx is nice
  • Added REST API-specific generators :
    • Generic JS
    • Python Flask

Documentation : JS-API


/*""".. http:get:: /relationship/guid/(string:relationship_guid)
 * 
 *  **Example query**:
 * 
 *  .. sourcecode:: bash
 * 
 *    curl http://groundtruth.sage:3002/relationship/guid/FB5B6AEB-D9B0
 * 
 *  **Example response**:
 *
 *  .. sourcecode:: http
 * 
 *    HTTP/1.1 200 OK
 *    Content-Type: text/javascript
 * 
 *      [ ... ]
 * 
 *  :param guid-string relationship_guid: the GUID of the relationship
 *  :>json array: List of data found for this relationship, as above
 *  :status 200: success
 */
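
One way a generator could pull such comments out of JS source is sketched below. The actual generator is not shown in the slides, so `extractDocBlocks` is hypothetical: it assumes doc comments open with the `/*"""` marker and use a ` * ` gutter, as in the example above.

```javascript
// Hypothetical extractor: collect every comment opened with /*""" and
// return its contents as reStructuredText, one string per comment.
function extractDocBlocks(source) {
  var blocks = [];
  var re = /\/\*"""([\s\S]*?)\*\//g;
  var m;
  while ((m = re.exec(source)) !== null) {
    var lines = m[1].split('\n').map(function (line) {
      // Drop the comment gutter: optional indent, '*', one optional space.
      return line.replace(/^\s*\* ?/, '').replace(/\s+$/, '');
    });
    // Trim blank lines left at the start and end of the comment.
    while (lines.length && lines[0] === '') { lines.shift(); }
    while (lines.length && lines[lines.length - 1] === '') { lines.pop(); }
    blocks.push(lines.join('\n'));
  }
  return blocks;
}

var source = [
  '/*""".. http:get:: /relationship/guid/(string:relationship_guid)',
  ' *',
  ' *  :status 200: success',
  ' */',
  'function handler() {}',
  '/* an ordinary comment, ignored */'
].join('\n');

console.log(extractDocBlocks(source)[0]);
```

The extracted text keeps its indentation relative to the directive line, so it can be written to an .rst file for Sphinx to process.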
   

Sphinx HTML

Sphinx HTML Screenshot

Sphinx PDF

Sphinx PDF Screenshot

"Other"

  • Message-Bus : nanomsg
    • PUBSUB is solid
  • Promises : bluebird
    • Caching
    • ... & other async
  • Testing (Integration) : mocha

nanomsg PUB-SUB


var nanomsg = require('nanomsg');

var pub = nanomsg.socket('pub');
pub.bind(config.pubsub.bind);

var status=0;
setInterval(function() {
  pub.send('/test-chan/!' + JSON.stringify({status:status}));
  status++;
}, 500);

// -- Subscriber
var sub = nanomsg.socket('sub');
sub.connect(config.pubsub.conn);
sub.on('message', function(buf) {
  console.log("Hearing : ", buf);
});     
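
The publisher above prefixes each JSON payload with a channel name ending in '!', so a subscriber has to split the two apart again. A sketch assuming exactly that framing; `parseMessage` is a hypothetical helper, not part of nanomsg.

```javascript
// Hypothetical helper: split '<channel>!<json>' into its two parts.
// Returns null when the message does not follow that framing.
function parseMessage(buf) {
  var text = buf.toString('utf8');
  var cut = text.indexOf('!');
  if (cut < 0) { return null; }   // no channel separator found
  try {
    return {
      channel: text.slice(0, cut),
      data: JSON.parse(text.slice(cut + 1))
    };
  } catch (err) {
    return null;                  // payload was not valid JSON
  }
}

var raw = Buffer.from('/test-chan/!' + JSON.stringify({ status: 7 }));
var msg = parseMessage(raw);
console.log(msg.channel, msg.data.status);
```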
   

Caching with Promises


var Promise = require('bluebird');
var fs = Promise.promisifyAll(require('fs'));

var p = fs.readFileAsync(cache_file, "utf8")
  .then(JSON.parse)        // Parse the file contents, or ...
  .catch(function (err) {  // The file wasn't found, etc
    return new Promise(function(resolve, reject) {
      _load_entity_names(db, subtypes, function(err, res) {
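        if(err) { return resolve([]); }  // Lookup failed : carry on with an empty list
        fs.writeFileAsync(cache_file, JSON.stringify(res))
          .then(function(d){ console.log("Cache file written"); })
          .catch(function(e){ console.log("Cache file not written : ", e); });
        return resolve(res);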
      });
    });
  });
  
p.then(function(data) {
  ...     
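
The same read-or-rebuild pattern can be sketched as a reusable helper. This version uses Node's built-in promises rather than bluebird so that it has no dependencies. `cached` and `loadNames` are hypothetical names. Unlike the slide's code, it waits for the cache file to be written before resolving and lets loader errors propagate.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: resolve with the cached JSON if it can be read and
// parsed; otherwise call `loader()` (which returns a promise), store its
// result in the cache file and resolve with that result.
function cached(cacheFile, loader) {
  return fs.promises.readFile(cacheFile, 'utf8')
    .then(JSON.parse)
    .catch(function () {
      return loader().then(function (result) {
        return fs.promises.writeFile(cacheFile, JSON.stringify(result))
          .then(function () { return result; });
      });
    });
}

// Stand-in for the database lookup; counts how often it is called.
var loads = 0;
function loadNames() {
  loads++;
  return Promise.resolve(['Looi Kok Loon']);
}

var file = path.join(os.tmpdir(), 'entity-names-' + process.pid + '.json');

var demo = cached(file, loadNames)                          // miss : calls loadNames
  .then(function () { return cached(file, loadNames); })    // hit : reads the file
  .then(function (names) {
    fs.unlinkSync(file);                                    // tidy up the demo file
    console.log(names, 'loader calls:', loads);
    return { names: names, loads: loads };
  });
```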
   

Testing


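var http = require('http');
var Browser = require('zombie');
process.env.NODE_ENV = 'testing';  // force environment to 'testing'

// 'app' and 'config' come from the project's own modules (not shown here)
Browser.localhost('seer.com', config.web.server.port);
Browser.runScripts = false; // This speeds things up a little...

describe('Authorization process', function() {
  before(function(done) {
    // Listen on the same port that Browser.localhost() maps 'seer.com' to
    this.server = http.createServer(app).listen(config.web.server.port, done);
  });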
  
  it('redirect new arrival to /user/login', function(done) {
    var browser = Browser.create();
    browser.visit('/')
      .then(function() {
        browser.assert.success();
        browser.assert.text('title', 'SEER | Login');
      })
      .then(done,done);
  });
});
   

Wrap-up

  • Lots of choices out there
  • Understand the trade-offs
  • ... now on to the real work ...

- QUESTIONS -


Martin.Andrews@RedCatLabs.com


http://RedCatLabs.com/


We are Looking for 1 Keen Singaporean...