Saturday, February 18, 2017

Node js C++ Addon Overload Resolution - v0.7.0

Another minor release but with a major new type!

- AsyncCallback - a new type the module exposes. It allows a standalone thread to call back into v8 by queuing the call data and signalling libuv to execute it the next time the event loop runs. By far the most useful feature of this release!

Note that there is an option to change the callback from weak to strong, which makes node js wait until the callback is destroyed before exiting; this makes it useful in many situations (a rough sketch of the mechanism follows these release notes).

- expose get_type to classes outside node-overload-resolution - while it's very beneficial to hide most of the type information from classes using this module, in some cases, such as property accessors that don't pass through the module but still need to verify that the data types passed to them are the ones expected, get_type exposes this functionality.

- fix array convertible checks - when using arrays, the old method wasn't checking that the correct conversion is possible from array types to other array types. While it is possible to convert an array to a number (in JavaScript anything is possible ;-) ), it should not be considered a valid conversion; converting an array of derived to an array of base, however, is now allowed.
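
To illustrate the idea behind AsyncCallback, here's a rough sketch of the described mechanism - not the module's actual code, and all the names in it are made up. A standalone thread pushes call data into a locked queue and wakes libuv with uv_async_send, so the queued callbacks run on the main loop where v8 is accessible:

#include <uv.h>
#include <functional>
#include <mutex>
#include <queue>

// sketch: queue call data from any thread, drain it on the node js main thread
class async_callback_sketch {
public:
    async_callback_sketch() {
        _async.data = this;
        uv_async_init(uv_default_loop(), &_async, flush);
    }

    // may be called from a standalone thread
    void queue_call(std::function<void()> call_data) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _calls.push(std::move(call_data));
        }
        uv_async_send(&_async); // wake the event loop; libuv coalesces these
    }

private:
    // runs on the main node js thread, where calling into v8 is safe
    static void flush(uv_async_t* handle) {
        auto self = static_cast<async_callback_sketch*>(handle->data);
        std::lock_guard<std::mutex> lock(self->_mutex);
        while (!self->_calls.empty()) {
            self->_calls.front()();
            self->_calls.pop();
        }
    }

    uv_async_t _async;
    std::mutex _mutex;
    std::queue<std::function<void()>> _calls;
};

The weak/strong option presumably maps onto libuv's handle referencing: an unreferenced async handle (uv_unref) doesn't keep the event loop alive, while a referenced one makes node js wait, which matches the behavior described above.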

Saturday, February 4, 2017

Node js C++ Addon Overload Resolution - v0.5.0

After being stuck with a few tests for node-alvision, I've decided to see why they take so much time.

From what I could gather while doing performance analysis, the major bottlenecks were the functions that determine the appropriate overload in general, and specifically the ones that do the actual type analysis and convertibility checks.

Doing type analysis in v8 is not as straightforward as it seems: numbers are convertible to strings and back, almost anything can be converted to boolean, and on top of that, checking whether a certain object belongs to a C++ class also needs some work.

What the POC did was go over all the registered types and then determine which type the class belongs to, but that can be shortened to checking only the names in the prototype chain. So if we have a type system of roughly 230 types, after the change it will go over 3-5 at most.
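
A sketch of what that shortcut might look like (not the module's exact code; is_registered_type stands in for a lookup into the real type registry):

#include <nan.h>
#include <string>

bool is_registered_type(const std::string& name); // assumed registry lookup

// walk the prototype chain and compare constructor names instead of
// testing the object against every registered type
std::string find_registered_type(v8::Local<v8::Object> obj) {
    v8::Local<v8::Value> current = obj;
    while (current->IsObject()) {
        auto as_object = current.As<v8::Object>();
        std::string name = *Nan::Utf8String(as_object->GetConstructorName());
        if (is_registered_type(name)) {
            return name;
        }
        current = as_object->GetPrototype(); // one link up the chain
    }
    return ""; // not a registered type
}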

Now, if we have a function with 10 parameters and 10 overloads, it would perform up to 100 type checks. So I took the argument checks out of the overload matching function and created another object called function_arguments, which queries each argument's type only once instead of multiple times and returns a cached result.
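
Conceptually, something like this simplified sketch (determine_type stands in for the costly type-analysis step):

#include <nan.h>
#include <map>
#include <string>

std::string determine_type(v8::Local<v8::Value> value); // the costly type analysis, assumed

// sketch: resolve each argument's type at most once per call,
// no matter how many overloads inspect it
class function_arguments_sketch {
public:
    explicit function_arguments_sketch(const Nan::FunctionCallbackInfo<v8::Value>& info)
        : _info(info) {}

    const std::string& type_of(int index) {
        auto cached = _types.find(index);
        if (cached == _types.end()) {
            cached = _types.emplace(index, determine_type(_info[index])).first;
        }
        return cached->second;
    }

private:
    const Nan::FunctionCallbackInfo<v8::Value>& _info;
    std::map<int, std::string> _types; // index -> resolved type name
};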

Another performance improvement is in the array type checks, which used to go over the entire array and check each element's type. Instead, I've decided to make it a little less reliable for the sake of performance: now it will check at most 10 elements, so if there are 1000 items in the array, it will sample one element every 100 items and check only those.
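
Roughly like this sketch (simplified; determine_type is the same assumed type-analysis helper as above):

#include <nan.h>
#include <string>

std::string determine_type(v8::Local<v8::Value> value); // assumed, as above

// check at most 10 elements, striding through large arrays
bool array_elements_match(v8::Local<v8::Array> arr, const std::string& expected_type) {
    const uint32_t max_checks = 10;
    uint32_t length = arr->Length();
    uint32_t stride = (length > max_checks) ? (length / max_checks) : 1;
    for (uint32_t i = 0; i < length; i += stride) {
        auto element = Nan::Get(arr, i).ToLocalChecked();
        if (determine_type(element) != expected_type) {
            return false;
        }
    }
    return true;
}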

All of these checks actually form a sort of type system, so I refactored them into a new type_system class.

While it did boost the performance (instead of waiting about a minute for results, it now took only 20 seconds), I still wasn't pleased.

I went ahead and analyzed the bottlenecks further and found that while the type checks themselves are no longer an issue, going over 10-20 overloads for certain functions is still time consuming, and after doing it once, if I know the argument types, there's no reason to do it again for the same function/class.

This is where the function_rank_cache class came into play: it caches the correct function as long as the same conditions apply, i.e. the same class/function and argument types.
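
The idea, as a simplified sketch (the real class is more involved; this just shows the cache key):

#include <map>
#include <string>
#include <vector>

// sketch: once the best overload is resolved for a class/function and a
// set of argument types, reuse the result on every subsequent call
class function_rank_cache_sketch {
public:
    bool try_get(const std::string& cls, const std::string& fn,
                 const std::vector<std::string>& arg_types, int& overload_index) const {
        auto it = _cache.find(make_key(cls, fn, arg_types));
        if (it == _cache.end()) return false;
        overload_index = it->second;
        return true;
    }

    void store(const std::string& cls, const std::string& fn,
               const std::vector<std::string>& arg_types, int overload_index) {
        _cache[make_key(cls, fn, arg_types)] = overload_index;
    }

private:
    static std::string make_key(const std::string& cls, const std::string& fn,
                                const std::vector<std::string>& arg_types) {
        std::string key = cls + "::" + fn;
        for (const auto& type : arg_types) key += "|" + type;
        return key;
    }

    std::map<std::string, int> _cache; // key -> index of the matched overload
};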

This improved the performance greatly, and now I'm left with about 5 seconds of actual testing time for that test.

All of these improvements were made in Debug builds; compiling as Release improved things beyond my expectations, at least for the moment.

In the end, what helped the most is doing only the work necessary :-)

Monday, January 30, 2017

OpenCV Node js Bindings - Progress

I've decided to take a break today from implementing APIs, optimizing node-overload-resolution and trying to execute the OpenCV tests, and just see if the fruits of my labor actually work.

So I've started with this piece of code:

 var m = new alvision.Mat(500, 500, alvision.MatrixType.CV_8SC3);  
 alvision.circle(m, new alvision.Point(250, 250), 100, new alvision.Scalar(0, 0, 255), 3, alvision.LineTypes.FILLED);  
 alvision.imshow("test window", m);  
 alvision.waitKey(10000);  

Seems simple enough, no?

But then I discovered that I hadn't implemented circle yet:

alvision.circle(m, new alvision.Point(250, 250), 100, new alvision.Scalar(0, 0, 255), 3, alvision.LineTypes.FILLED);
         ^
Error: error executing +circle not implemented
    at Error (native)
    at Object.<anonymous> (node-alvision\test\tmp.ts:73:10)
    at Module._compile (module.js:556:32)
    at Module.m._compile (node-alvision\node_modules\ts-node\src\index.ts:299:25)
    at Module._extensions..js (module.js:565:10)
    at Object.require.extensions.(anonymous function) [as .ts] (node-alvision\node_modules\ts-node\src\index.ts:302:14)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Function.Module.runMain (module.js:590:10)


But now that I have the overload resolution working, I've added these lines:

POLY_METHOD(imgproc::circle){
  // read the typed arguments through the resolution module's accessors
  auto img       = info.at<IOArray*>(0)->GetInputOutputArray();
  auto center    = info.at<Point*>(1)->_point;
  auto radius    = info.at<int>(2);
  auto color     = info.at<Scalar*>(3)->_scalar;
  auto thickness = info.at<int>(4);
  auto lineType  = info.at<int>(5);
  auto shift     = info.at<int>(6);

  cv::circle(img, *center, radius, *color, thickness, lineType, shift);
}

Took about 20 seconds including compile time. 

And the result:

[screenshot: the circle rendered in the test window]
So far so good!

Friday, January 13, 2017

Node OpenCV Addon

I'm glad to finally say that I have a good start, as far as I can tell, on implementing a good OpenCV addon for node js.

It's not a sprint but a marathon. OpenCV is one of the largest, most complex libraries I've had to deal with, and I've tested many ways to implement it; I finally found a sound way to actually do it.


The adventure started some time ago when I attempted to use a few functions to display an augmented-reality diffusion tensor file on a marker with a tablet (I got some help from Frank from DSI Studio, thanks Frank!). It later became a challenge as I learned more and more about C++, Node js and OpenCV.


The first thing I tried was using the ffmpeg API in node js. ffmpeg is written in C, which lacks automatic object lifetime management (constructors/destructors) but provides allocators, so you need to explicitly free unused objects and buffers.


I've decided the best way was to implement a C++ wrapper on top of these APIs. It was challenging to figure out the appropriate lifecycles from the documentation, but reading the C code helped a lot; eventually I completed an intuitive C++ wrapper which could access most of the ffmpeg API.


The adventure didn't end there. I used the OpenCV matrix (cv::Mat) as the base container for both image and audio frames, which allowed me to explore the OpenCV API with video and pass-through audio, with the possibility of eventually processing the audio as well, since it's easily accessible.


I started to explore the OpenCV API and really liked the idea of manipulating video frames with Node js, so I decided to implement a few of the OpenCV APIs. But then I wondered how hard it would be to implement the entire OpenCV API in node js, so I went ahead and learned about Node Addons, V8 and NAN.


Many people think that Javascript is not suitable for processing video/audio, but the truth of the matter is that Javascript is not really processing anything; it's just a scripting engine. The real magic stays in the C/C++ domain, and the delay Node js/v8 adds is negligible when you see how much time compressing a frame or executing canny on it takes. Those microseconds do add up, but in percentage terms they should be nearly invisible in the total program execution time. Besides, this is more for fun than for science.

But... Javascript is not entirely suitable for scripting complex APIs, and once you add the number of function overloads possible in OpenCV, it would probably be hell to use, not to mention develop. This made me think in the Typescript direction, which can both help with intellisense and make development easier, since it enforces some type checking at development time; that could eventually make the API a lot more readable and intuitive.

A few things made developing the API not the most fun thing in the world; for example, the way v8 exposes its internal objects: everything looks like an object, all numbers are actually doubles, and there is no way to overload a function implementation internally.


These reasons and more made me think about and develop the node-overload-resolution project, which addresses most of these problems by allowing me to add overloads to functions, add parameter validations, automatically convert between v8 data types and C++ data types, automatically execute function implementations in the libuv threadpool as async functions, and handle return values and exceptions without explicitly writing a single line of code in the addon itself.


So why am I writing all of this today?

Today the first test passed. Not my tests, not all the tests, but the first OpenCV test, which I ported to typescript to make sure the API is working properly.

Hopefully it's not the last :-)


But let me be completely honest: it is in no way, shape or form ready for anything. The number of APIs implemented is very small, and even the build at the moment is complex and takes anywhere between 30 minutes and an hour. Since I created the opencv_ts branch I haven't even checked whether the code compiles on linux, which it did before, on linux32/64 and even arm and NDK. Sorry linux guys; in terms of IDEs for C++, I have yet to see anything which comes close to Visual Studio ;-)

Saturday, December 24, 2016

Node js C++ Addon Overload Resolution - Update

Since the last update on December 13th, multiple enhancements have been made. So far I'm happy!

Major changes and improvements:


  • The major performance issue in the tests has been found: when many overloads are added, the engine goes over all of them to find the most suitable one, which takes time and CPU. The first part that needs optimization is probably the determineType function.
  • make_param gained overloads and template type specializations for multiple types, which makes overload creation simpler and more robust.
  • Automatic async function implementation is now simple and straightforward: as long as only the C++ typed templated functions are used, i.e. This<T>, at<T> (which replaces info[]) and SetReturnValue<T>, the function should be able to run async (see the sketch after this list). Just note that async functions are executed in libuv's threadpool, which means you'll have to handle locks manually or use thread-safe function calls. With locks comes great responsibility: node js's libuv threadpool defaults to 4 threads, and too many waiting threads will either kill node js or slow it down significantly.
  • Accessing This, info[] and GetReturnValue will now throw an exception if attempted during async execution. No more crashes if these calls are made by mistake, just a normal v8 exception.
  • Partial logging of what happens in the engine: TRACE for function calls, DEBUG for diagnostics, WARN for behavior that's not as designed and ERROR for usage mistakes.
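
To make the async constraint concrete, here's a minimal sketch using the typed accessors only (sample::add is a made-up function; POLY_METHOD, at<T> and SetReturnValue<T> are the module's macros/templates mentioned above):

// safe to run on the libuv threadpool: no direct v8 handle access
POLY_METHOD(sample::add) {
  auto a = info.at<int>(0);        // arguments are read as cached C++ values
  auto b = info.at<int>(1);
  info.SetReturnValue<int>(a + b); // marshaled back to v8 on the main thread
}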

Monday, December 19, 2016

Node js Addon Logger Module

One of the issues I'm attempting to solve while programming node js addons is logging. I've looked for solutions for some time, but there's no built-in way to communicate the addon's internal logs to v8, so I'm left with a few options which are not completely satisfactory.

Dump the logs to console

I can dump the logs to the console, but how do I redirect them to a file? If I redirect the entire stderr or stdout, I'll have irrelevant log entries in my log, and trying to split them by module name later could be useless work.

Dump the logs to file

Dumping the logs to a log file is probably the easiest solution. I could use something like log4cpp or Pantheios, but importing them into a node addon could add a whole other set of problems, like how to set the appender at runtime and how to manage modules, though Pantheios is claimed, by its own benchmarks, to be one of the fastest frameworks.

Eventually I decided to POC my own, with the node js domain in mind. It's not the best, but it's good enough (tm), at least for now.


The basic requirements are:

- must support multithreaded logging
- must be easy on the v8 engine, e.g. not freeze the VM.
- must be able to filter out unneeded log levels, e.g. debug/trace for debug sessions and info and above for production.
- optional - communicate the log messages to v8

So I went ahead and planned it. It's not too complicated; we have only two components at the moment: a queue for holding the debug messages between flushes and a notifier that tells v8 there are pending log messages to be processed.


At the moment the queue is a quickly written locking std::queue, implemented with std::lock_guard<std::mutex> and an atomic counter to keep track of the number of messages in the queue.
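
Something along these lines (a minimal sketch of the described design, not the actual module code):

#include <atomic>
#include <mutex>
#include <queue>
#include <string>

// std::queue guarded by std::lock_guard<std::mutex>, with an atomic
// counter tracking the number of pending messages
class log_queue_sketch {
public:
    void push(const std::string& message) {
        std::lock_guard<std::mutex> lock(_mutex);
        _messages.push(message);
        ++_count;
    }

    bool try_pop(std::string& message) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_messages.empty()) return false;
        message = _messages.front();
        _messages.pop();
        --_count;
        return true;
    }

    size_t pending() const { return _count; } // readable without taking the lock

private:
    std::queue<std::string> _messages;
    std::mutex _mutex;
    std::atomic<size_t> _count{0};
};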


The notifier is implemented with libuv's uv_async_send API, which basically schedules an async callback to be executed on the main node js thread.
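
In sketch form (uv_async_init/uv_async_send are the real libuv calls; the rest is illustrative):

#include <uv.h>

uv_async_t log_notifier;

// runs on the main node js thread, where it's safe to call into v8
// and hand the queued messages to the registered callback
void on_pending_logs(uv_async_t* handle) {
    // drain the queue and invoke the JS callback here
}

void init_notifier() {
    uv_async_init(uv_default_loop(), &log_notifier, on_pending_logs);
}

// may be called from any thread; libuv coalesces multiple sends
void notify_pending_logs() {
    uv_async_send(&log_notifier);
}

One design note: a referenced async handle keeps the event loop alive, so unreferencing it with uv_unref is worth considering if the logger alone shouldn't block node from exiting.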


On top of it I've added the following APIs, exposed to node:


RegisterLogger - registers log callbacks; this way node js can use log4js and do everything in one place. Not very efficient, but again, good enough for now.


Flush - forces the callback to execute and flush any pending log messages.

log_level property - controls what to log; if we set this to Error and a Debug message comes in, the logger will ignore it.

batch_length property - how many log messages to process in one main-thread event loop iteration. Reducing it will make the log slower but the vm more responsive; increasing it may cause the vm to freeze when many log messages are being returned to v8.


And of course Log - which can be used for testing purposes or anything else; just don't use it for actual logging, as the round trip from v8 -> c++ -> v8 is a waste of perfectly good cpu and memory resources.


Issues:

- Performance, which could be improved by using a better, lock-free queue, or perhaps even a queue per thread, since node js is essentially threadpooled. Perhaps the callback could also accept an array of log messages, saving the round-trip time between v8 and c++.

- Separation of logs: currently all log messages go into the same queue, so when your callback gets called, it's getting all the messages.


- One callback only; no support for multiple callbacks.


- The queue only ever grows. It's not a memory leak, but if you don't flush via your callback every once in a while, it might look like one. std::queue does not implement shrink_to_fit, so the only way to release its memory is to create a new queue and swap them (see the snippet after this list).


- Log messages might get lost when the vm crashes. Since all logging is done in memory, if the vm crashes before the logs are purged, the messages will be lost to the gods of memory dumps.
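
The swap mentioned in the queue-growth issue above looks like this:

#include <queue>
#include <string>

// std::queue has no shrink_to_fit; swapping with an empty queue is the
// way to actually release its storage
void release_queue_memory(std::queue<std::string>& q) {
    std::queue<std::string> empty;
    std::swap(q, empty); // the old storage is freed when 'empty' is destroyed
}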


This module is a start; I'll probably improve it as the needs grow/change.


You can find the project here:

https://github.com/drorgl/node-addon-tracer

What's next? Perhaps an instrumentation module... :-)

Tuesday, December 13, 2016

Node js C++ Addon Overload Resolution - Refactor & Add C++ Type Converter

Remember node-overload-resolution?

I've been working on it for a while to see how I can support automatic async support.

I've always believed that if you can work a little harder now to save a lot of time later, it's most likely going to pay off. One of the biggest motivations for writing the node-overload-resolution project was to easily convert C++ libraries to Node js without changing their APIs too much.

One of the major roadblocks to this goal in the last version of node-overload-resolution is that v8 objects are not accessible from any thread other than node js's main thread, since node doesn't release the locker on the v8 vm. To solve this issue I thought up and implemented a way to parse and store the v8 objects as their projected C++ objects, in essence copying v8::String to std::string, v8::Number to double and so on.

Some issues I've encountered with this approach are how to store v8 Objects, C++ class wrappers and Arrays; but if I can actually tell the resolution module which C++ type each v8 type translates to, maybe it could work.

I've been working on this premise and implemented a converter and value holder on top of the working overload resolution module; currently it supports Number, Function, String, Buffer, a typed struct and ObjectWrap.

Currently there are issues with Function: it's implemented as or::Callback, but the actual function is not called yet.

Another issue is that I couldn't find translations for Promise, Proxy and RegExp.

I think this module is on the correct path at the moment. It should be relatively easy to support async function calls by checking whether the matched function has an additional function callback parameter, caching the parameter values as C++ types, pushing the function call to the libuv threadpool, and executing the return value conversion and callback activation when the thread finishes execution.

You can find the code on the native_types branch.

The code is concentrated in 3 main components: the value_holder, the value_converter and the generic_value_holder.

The value_holder is a derived template, storing the Value inside a template type.

template<typename T>
class value_holder : public value_holder_base {
public:
    T Value;
    value_holder() : value_holder_base() {}
    value_holder(T val) : value_holder_base(), Value(val) {}
    virtual ~value_holder() {}
};



The value_converter is a derived template with a template specialization for each major type handled. The specializations cover both primitives (including v8 basic types) and the derived classes of IStructuredObject and ObjectWrap, allowing more specific behavior for structures and C++ classes, such as parsing/creating new v8 objects as well as ObjectWrap::Wrap and ObjectWrap::Unwrap.

For example, this template specialization is for all derived classes of ObjectWrap; it wraps/unwraps to/from v8::Object:
template<typename T>
class value_converter<T*, typename std::enable_if<std::is_base_of<ObjectWrap, T>::value>::type> : public value_converter_base {
public:

    virtual T* convert(v8::Local<v8::Value> from) {
        return or::ObjectWrap::Unwrap<T>(from.As<v8::Object>());
    }


    virtual v8::Local<v8::Value> convert(T* from) {
        return from->Wrap();
    }

    virtual v8::Local<v8::Value> convert(std::shared_ptr<value_holder_base> from) {
        auto from_value = std::dynamic_pointer_cast<value_holder<T*>>(from);
        return from_value->Value->Wrap();
    }

    virtual std::shared_ptr<value_holder_base> read(v8::Local<v8::Value> val) {
        auto parsed_value = std::make_shared<value_holder<T*>>();
        parsed_value->Value = convert(val);
        return parsed_value;
    }

};

Lastly, the generic_value_holder stores a pair of value_holder and value_converter, and by that it can act as a sort of non-convertible variant that can return a v8 object from the intended C++ types.

class generic_value_holder {
private:
    std::shared_ptr< or ::value_converter_base> _prefetcher;
    std::shared_ptr< or ::value_holder_base> _value;
public:
    void Set(std::shared_ptr< or ::value_converter_base> value_converter, std::shared_ptr< or ::value_holder_base> value) {
        _prefetcher = value_converter;
        _value = value;
    }

    template<typename T>
    void Set(T returnValue) {
        //store value_converter type
        auto returnPrefetcher = std::make_shared < or ::value_converter<T>>();

        //store value inside a valueholder
        auto valueHolder = std::make_shared < or ::value_holder<T>>();
        valueHolder->Value = returnValue;

        Set(returnPrefetcher, valueHolder);
    }

    v8::Local<v8::Value> Get() {
        return _prefetcher->convert(_value);
    }
};

Update 2016-12-14:
The automatic async has been partially implemented; at this moment, the tests are working as far as I can see. What happens is that the overload resolution checks the last argument passed to the function: if it's not part of the function's parameters and its type is a function, the call is assumed to be an async request. The overload resolution engine then stores all the function arguments in memory as C++ objects, calls the function, post-processes the return value and lastly calls the supplied callback function.

For example, assuming you have a function test(arg1: string, arg2: number): if a match is found for test("a", 1), it will execute the function synchronously. But if a match is found for test("a", 1, function(err, val){}), it will be executed asynchronously (note that there is no way to know which parameters are defined for function(err, val); the example above is for clarity's sake).

The function implementations have to use info.at<T>(index) to read the parameters and info.SetReturnValue<T>(value) to return a value; otherwise the implementation will fail, because node js keeps the vm locked and there is no access to v8 objects through the normal info[index] mechanism. If the implementer insists on using info[index], an access violation will occur and the application will crash when executing async.

TODO:
- Finish the callback implementation.
- Implement Async call detection and execution - partial.
- This() access from inside a class member function
- cleanup