Saturday, March 16, 2013

MemoryMappedFile


Ever since .NET 4.0 came out I've been wanting to play with MemoryMappedFile.

MemoryMappedFile is a way to map a file to a memory address, so you can access it just like you would a byte array (for example).

At first I attempted to create a simple document-store-like library, but MemoryMappedFile has its own quirks, and working around them seemed like a task for a library.

So feel free to use this FileStore library in your own projects. Mind you, this library wasn't tested beyond simple sanity tests (no unit tests or integration tests have been done), and in its current form it is in no way thread safe.

There are two main classes in the library:
1. MemoryMapper - a wrapper around MemoryMappedFile, designed to allow growing the file and creating/cleaning up multiple views and streams.
It has CreateView/CreateStream for creating accessors, and Flush, which frees unused accessors and flushes the accessors' changes to disk.

2. FileStoreManager - a document-store-like object which stores the data in one file and the data table, with a GUID key, in another file. Benchmarks are pretty good for its scope; the main bottlenecks are the binary formatter and the fact that there's no index, so it has to scan through the data table file. I've added a dictionary for faster access. Also, there's no handling for modifying or deleting objects.


So here are a few conclusions from that project:

1. CreateViewAccessor and CreateViewStream are VERY slow; use blocks as large as possible rather than creating views/streams for each block.
2. There's a limit to how many views and streams you can create. Since each one maps to a contiguous memory area, it might prevent the calling program from allocating memory for other things, or throw an exception itself ("Not enough storage is available to process this command."). This is more likely to happen in x86 because of the short address space of 32-bit applications.
3. Disposing the accessors flushes their changes to disk; if you're doing many of these, it could slow things down.
4. Resizing the file requires disposing the MemoryMappedFile and its streams/views, which is an expensive operation, so you should try to minimize file resizes.
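To illustrate the first conclusion, here's a minimal sketch (the file name and capacity are made up for the example) that creates one large view accessor over the whole mapping instead of a view per block:

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfDemo
{
    static void Main()
    {
        const long capacity = 1024 * 1024; //1MB mapping, sized upfront to avoid resizes

        //create (or overwrite) a file-backed mapping
        using (var mmf = MemoryMappedFile.CreateFromFile("demo.bin", FileMode.Create, "demoMap", capacity))
        //one large accessor over the whole file, rather than one per block
        using (var accessor = mmf.CreateViewAccessor(0, capacity))
        {
            accessor.Write(0, 42);
            Console.WriteLine(accessor.ReadInt32(0)); //42
        }
    }
}
```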

You might want to take a look at RaptorDB by Mehdi Gholam.

You can find the project here:

https://github.com/drorgl/ForBlog/tree/master/MemoryMappedFileDemo

Saturday, February 9, 2013

NuGet Package for RTree

I've just created a NuGet package for the RTree port I did a few years ago. I've seen people are still interested in that project, so why not make their lives easier?


I've also updated the project (SVN only) at:


I've added simple tests and changed the log4net dependency to NuGet.

I've also seen that Aled Morris updated his Java documentation at:


It's not exactly the same anymore, as the version I ported is from 2009.





Friday, February 1, 2013

Passing Multiple Values to SQL


This article started as a collection of ways to pass multiple values to SQL. Some of the articles I've written in the past turned into benchmarks, because I just can't resist the temptation of checking how much one thing can do over another and what the best uses are for each case.

So allow me to show you some of my findings.

In the past (the SQL 2005 era), when we wanted to pass multiple values, we probably used splits, XML, dynamic queries and whatnot.

SQL 2008 introduced user defined table types, which allow us to pass whole tables as SQL parameters.

Starting with SQL 2008 we also got MERGE, which can be used with user defined table types and XML for bulk inserts/updates, which, in my opinion, is a great capability.

So let's see what we're going to cover.

1. Split

2. SQLBulkCopy

3. XML Parameters

4. Dynamic SQL

5. Pre-rendered SQL (which is just a fancy way of saying I'm going to convert the SqlCommand to a string, so we can see if that's the way ADO.NET passes the values).

6. User Defined Table Types


So let's start:

1. Split

Let's say we have a column with multiple values in the same varchar field, but we need to join on these values to get a report our boss needs.

What do we do?

Well, here's a split function that Mladen Prajdić wrote:


CREATE FUNCTION dbo.Split(@data NVARCHAR(MAX), @delimiter NVARCHAR(5))
RETURNS @t TABLE (data NVARCHAR(max))
AS
BEGIN
    
    DECLARE @textXML XML;
    SELECT    @textXML = CAST('<d>' + REPLACE(@data, @delimiter, '</d><d>') + '</d>' AS XML);

    INSERT INTO @t(data)
    SELECT  T.split.value('.', 'nvarchar(max)') AS data
    FROM    @textXML.nodes('/d') T(split)
    
    RETURN
END
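Usage is straightforward; for example, joining each row to its own split values (the Entities table and its Tags column are made up for the illustration):

```sql
SELECT e.EntityId, s.data
FROM Entities e
CROSS APPLY dbo.Split(e.Tags, ',') s
```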



Performance is not so bad, but not great either. So, great, we have that report; but what if they decide to make that report available to clients?


Well, I've got a solution for that, but I'm not sure you're going to like it; it has something to do with triggers.

How? Well, create a cache table and triggers to update it; then you can index that cache table and get the job done.
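A sketch of that idea, using the Split function from above (the table and column names are hypothetical, and the cache column is sized at nvarchar(450) so it can be indexed):

```sql
--hypothetical schema: Entities.Tags holds the delimited values
CREATE TABLE EntityTagCache (EntityId int NOT NULL, Tag nvarchar(450) NOT NULL);
CREATE INDEX IX_EntityTagCache_Tag ON EntityTagCache (Tag);
GO

CREATE TRIGGER trg_Entities_TagCache ON Entities
AFTER INSERT, UPDATE
AS
BEGIN
    --drop the cached rows for the changed entities
    DELETE c FROM EntityTagCache c
        JOIN inserted i ON c.EntityId = i.EntityId;

    --re-split and re-cache them
    INSERT INTO EntityTagCache (EntityId, Tag)
    SELECT i.EntityId, s.data
    FROM inserted i
    CROSS APPLY dbo.Split(i.Tags, ',') s;
END
```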

2. SQLBulkCopy

One of the oldest ways of doing bulk inserts is with BULK INSERT, but that's more of a DBA tool than a developer's. ADO.NET does provide a way to do bulk inserts, though; it's called SqlBulkCopy.

We can set the batch size, timeouts, column mappings and some more settings.


using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default,null))
{
    bulkCopy.BatchSize = this.BulkOperationsCount;
    bulkCopy.DestinationTableName = "Entities";
    bulkCopy.WriteToServer(rows.ToArray());
}


3. XML Parameters

Starting with SQL 2005 there is an XML data type, which means we can pass XML to stored procedures, store it in tables and query by it.

This made SQL Server very robust, as it became possible to pass multiple values without resorting to workarounds.


with upusers (EntityId, Name, Phone, Email, Address, City, Zip, State, Country, BirthDate)
as
(    
    SELECT 
        T.val.value('(@EntityId)','int') as EntityId,
        T.val.value('(@Name)','nvarchar(max)') as Name,
        T.val.value('(@Phone)','nvarchar(max)') as Phone,
        T.val.value('(@Email)','nvarchar(max)') as Email,
        T.val.value('(@Address)','nvarchar(max)') as Address,
        T.val.value('(@City)','nvarchar(max)') as City,
        T.val.value('(@Zip)','nvarchar(max)') as Zip,
        T.val.value('(@State)','nvarchar(max)') as State,
        T.val.value('(@Country)','nvarchar(max)') as Country,
        T.val.value('(@BirthDate)','nvarchar(max)') as BirthDate

    FROM @xmlval.nodes('/root/item') T(val)
)

merge Entities as target
using (select * from upusers) as source (EntityId, Name, Phone, Email, Address, City, Zip, State, Country, BirthDate)
on (target.EntityId = source.EntityId)
when matched then
    update 
        set Name = source.Name,
         Phone = source.Phone,
         Email = source.Email,
         Address = source.Address,
         City = source.City,
         Zip = source.Zip,
         State = source.State,
         Country = source.Country,
         BirthDate = source.Birthdate
when Not Matched Then
    insert (Name, Phone, Email, Address, City, Zip, State, Country, Birthdate)
    values (source.Name, source.Phone, source.Email, source.Address, source.City, source.Zip, source.State, source.Country, source.Birthdate)



4. Dynamic SQL

Dynamic SQL was always the ugly love child of programmers: it made everything simpler from a program-flow point of view, but it made things hacky and ugly when it wasn't done properly, and a hell to fix whenever there was a problem.

Dynamic SQL done wrong is the main cause of SQL injection, but with sp_executesql, one of the many good things Microsoft brought us in SQL 2005, it's possible to write dynamic SQL and avoid most of the injection problems.

It should be noted that writing proper insert queries also affects performance, which you can see in NoOptimizationsDynamicExecution.

So an unoptimized insert query would look like this:


INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);
INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);
INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..);  
...


while an optimized insert query would look like this:


INSERT INTO tablename(col1, col2,...)
     VALUES (..,..,..),(..,..,..),(..,..,..)...
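For completeness, a minimal sp_executesql call looks like this (the query and parameter are made up for the example); the parameter values travel separately from the SQL text, which is what keeps the injection out:

```sql
DECLARE @sql NVARCHAR(MAX) = N'SELECT * FROM Entities WHERE EntityId = @id';
EXEC sp_executesql @sql, N'@id int', @id = 42;
```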


5. Pre-rendered SQL

RenderSQL.cs is a class from the IQueryable-to-SQL project; I've converted it to work on SqlCommand to see if I could better understand how the query is converted from parameters to sp_executesql.

For reference, the benchmark executes the User Defined Table Type query, passed through RenderSQL.

6. User Defined Table Type

User Defined Table Types provide us with reasonable performance and, IMHO, huge readability and program-flow benefits. Instead of going through multiple layers, converting the data from X to Y and doing the same in SQL, you just pass a List of SqlDataRecord, a SqlDataReader or a DataTable, and on the SQL side you use the parameter as a normal table variable.

So assuming we have this:


string testcommand = @"
select * from Entities
where Entities.EntityId in ({0})
";

using (SqlConnection connection = new SqlConnection(this.ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand(string.Format(testcommand, string.Join(",", this.SelectRecords.Select(i => i.GetValue(0)).ToArray())),connection)) 
    {
        command.CommandType = CommandType.Text;

        using (var reader = command.ExecuteReader())
        {
            ...
        }
    }
}


We just need to use this instead:

string testcommand = @"
select * from Entities
join @intt 
    on [@intt].Id = Entities.EntityId
";

using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["TestConnectionString"].ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand(testcommand, connection))
    {
        command.CommandType = CommandType.Text;

        var param = command.CreateParameter();
        param.SqlDbType = SqlDbType.Structured;
        param.ParameterName = "@intt";
        param.TypeName = "udt_inttable";

        var dt = new DataTable();
        dt.Columns.Add("Id");
        dt.Rows.Add(1);
        dt.Rows.Add(2);
        dt.Rows.Add(3);
        ...

        param.Value = dt;

        command.Parameters.Add(param);

        using (var reader = command.ExecuteReader())
        {
            ...
        }
    }
}


Simple, right?
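Note that the C# sample assumes the udt_inttable type already exists on the server; a minimal definition would look something like this:

```sql
CREATE TYPE udt_inttable AS TABLE (Id int);
```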

After digging around (and looking at SQL statistics, compilations/recompilations and this article), it seems that the query is not cached, and according to this, it's by design.

Project
https://github.com/drorgl/ForBlog/tree/master/MultiplParametersSQL
References:
http://www.codeproject.com/Articles/25457/Multiple-Ways-to-do-Multiple-Inserts

And benchmarks. I've attempted to cache as aggressively as possible so that data preparation affects the benchmarks as little as possible; in the real world, actually getting the data to SQL might be your bottleneck.
http://uhurumkate.blogspot.com/p/sql-bulk-insert-benchmarks.html

During the test I got this very nice error message:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.


Saturday, January 26, 2013

Using Addins in WPF

I was starting to program an idea I'd had; you know what they say, necessity is the mother of invention...

It's about unifying all the notifications in my daily tasks (RSS, emails, updates from websites that didn't bother implementing RSS, etc.) into one place, under one application, with each one able to trigger its own workflow, execute applications, pop up on screen, queue messages in the systray and more.

So, one of the first things to do when writing an application like that is to make it pluggable, which makes it easy to extend the functionality.

When developing such an application, one needs a way to configure each plugin; eventually I figured out a way to do it in WPF.

I'm using composition for simplicity; from a performance point of view, you might want to use the alternative I offered here

ApplicationPlugins = _container.GetExportedValues<IWPFApplicationPlugin>();
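For context, a minimal MEF setup behind that line might look like this (the catalog path is just an assumption; the plugins could live in any directory):

```csharp
using System.ComponentModel.Composition.Hosting;

//compose plugins from the application's own directory
var catalog = new DirectoryCatalog(AppDomain.CurrentDomain.BaseDirectory);
var _container = new CompositionContainer(catalog);

var ApplicationPlugins = _container.GetExportedValues<IWPFApplicationPlugin>();
```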


The way I'm doing it is creating a container window which has cancel and ok buttons and a grid which will later contain the user control from the plugins.

<Window x:Class="WPFApplication.SettingsWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="SettingsWindow" mc:Ignorable="d" xmlns:d="http://schemas.microsoft.com/expression/blend/2008" 
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        d:DesignHeight="386" d:DesignWidth="501" SizeToContent="WidthAndHeight" Closing="Window_Closing">
    <Grid Width="Auto" Height="Auto">
        <Button Content="Ok" Height="25" HorizontalAlignment="Right" Margin="0,0,12,12" Name="btnOk" VerticalAlignment="Bottom" Width="98" Click="btnOk_Click" />
        <Button Content="Cancel" Height="25" HorizontalAlignment="Left" Margin="12,0,0,12" Name="btnCancel" VerticalAlignment="Bottom" Width="98" Click="btnCancel_Click" />
        <Grid Margin="12,12,12,43" Name="UCContainer" Width="Auto" Height="Auto"  />
    </Grid>
</Window>


When a user wants to add another record from a specific plugin, I open up a new window, populate the UCContainer's children with the user control from the plugin and display it.


public bool? ShowDialog(IWPFApplicationPlugin plugin, string data)
{
    this.m_plugin = plugin;
    var uc = this.m_plugin.GetUserControl();
    this.m_plugin.Reset();
    if (data != null)
        this.m_plugin.SetData(data);
    this.UCContainer.Children.Add(uc);
    this.UCContainer.MinHeight = uc.MinHeight;
    this.UCContainer.MinWidth = uc.MinWidth;
    return this.ShowDialog();
}


After looking for it myself and seeing people struggling with setting the window's MinWidth/MinHeight based on the content, here's how I got it:


private void Window_Loaded(object sender, RoutedEventArgs e)
{
    Size margin = new Size();
    FrameworkElement contentElement = this.Content as FrameworkElement;
    if (contentElement != null)
    {
        margin.Width = this.Width - contentElement.ActualWidth;
        margin.Height = this.Height - contentElement.ActualHeight;
    }

    Rect size = VisualTreeHelper.GetDescendantBounds(this);

    this.MinHeight = size.Height + margin.Height;
    this.MinWidth = size.Width + margin.Width ;
}

You can find the project at:
https://github.com/drorgl/ForBlog/tree/master/WPFPlugins



Sunday, November 11, 2012

Getting Domain Name

How can you parse a sub-domain (e.g. www.google.com) and determine the domain (e.g. google.com) from it?

I've tried suggestions ranging from regex to parsing it manually, but there are no clear rules.

So now what?

Well, there are lists of 1st-level and 2nd-level domains all over the net; browsers use them to determine which domain you can put cookies on. Let's say you have a domain by the name of mydomain.com and many websites such as site1.mydomain.com and site2.mydomain.com, but you want to share cookies between them. You can set cookies on mydomain.com and both sites will be able to access those cookies, but allowing the websites to read/write cookies on the .com domain would be something of a security risk, and browsers prevent that.

How do they determine what the domain is from the subdomain?
We can have .com or .co.uk or .info.pl, so there are no actual rules we can build an algorithm on.

From the TLD and ccSLD lists we can determine what the domain part is, by going through the levels until we can no longer find them in the lists; the next one up is the domain name.

This way of determining the domain from subdomains is pretty quick: for 100k domains it takes roughly 275ms on my machine once the TLD/SLD list is populated.

Here's the code:


/// <summary>
/// Retrieves a domain from a subdomain
/// </summary>
/// <param name="subdomain">the subdomain to be parsed</param>
/// <returns>domain</returns>
public static string GetDomain(string subdomain)
{
    if (string.IsNullOrWhiteSpace(subdomain))
        return null;

    //make sure we have a fresh version of the domain list
    CheckCache();

    //clean up the subdomain
    var cleandomain = subdomain.Trim().ToLower();
    
    //split it into parts by dot
    var domainparts = cleandomain.Split('.');

    //assign the top of the domain parts
    string result = domainparts[domainparts.Length - 1];

    //go over the rest of the parts and add them to the domain until we fail to find a
    //match in the _domains HashSet, which means we've reached the domain.
    for (int i = domainparts.Length-2; i >= 0; i--)
    {
        if (!_domains.Contains("." + result))
            break;

        result = domainparts[i] + "." + result;
    }

    return result;
}


What happens is that we split the domain into its parts, then we go over each part and see where we fail to locate that domain in our hashset.

The way to populate the hashset is to just insert com, co.uk, info.pl, etc. (with a leading dot, so the "." + result lookup in GetDomain matches).
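A sketch of that population (the method name is made up; CheckCache would call something like this):

```csharp
private static readonly HashSet<string> _domains = new HashSet<string>();

private static void PopulateDomains()
{
    //each entry carries a leading dot to match the "." + result lookup
    _domains.Add(".com");
    _domains.Add(".co.uk");
    _domains.Add(".info.pl");
}
```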

Here's the project:
https://github.com/drorgl/ForBlog/tree/master/DomainParsing


Friday, October 26, 2012

Translating Objects of One Type to Another with AutoMapper


If you need to convert object types from one to another, writing it by hand could be a pain, check out AutoMapper.

There's a good getting started on the project's wiki.

The basic idea is to define a map between the source and the destination:

Mapper.CreateMap<source, destination>();

To make sure your mappings are correct:

Mapper.AssertConfigurationIsValid();

You can flatten hierarchies if you name the destination properties as such:

Source:
Element.Name
Destination:
ElementName

You can define conditions in which a certain property will be mapped or not:

Mapper.CreateMap<SourceEntity, DestinationModel>()
    .ForMember(dest => dest.Name, opt=>opt.Condition(i=>((SourceEntity)i.Parent.SourceValue).Id == 2));

You can define custom mappings - for example, we'll switch between the name and the description:

Mapper.CreateMap<SourceEntity, DestinationModel>()
    .ForMember(dest => dest.Name, opt => opt.MapFrom(source => source.Description))
    .ForMember(dest => dest.Description, opt => opt.MapFrom(source => source.Name));

You can have custom mapping transformations:

For Example with lambda expressions:

Mapper.CreateMap<SourceEntity, FlatDestinationModel>()
    .ForMember(i => i.AttributesSomeValue, opt => opt.ResolveUsing(i=>string.Join(", ", i.Attributes.Select(a=>a.Value).ToArray())));

Or with a ValueResolver class:
class AttributeValueResolver : ValueResolver<SourceEntity,string>
{
    protected override string ResolveCore(SourceEntity source)
    {
        return string.Join(", ", source.Attributes.Select(i=>i.Value).ToArray());
    }
}

Mapper.CreateMap<SourceEntity, FlatDestinationModel>().ForMember(i => i.AttributesSomeValue, opt => opt.ResolveUsing<AttributeValueResolver>());

You can ignore certain properties:

Mapper.CreateMap<SourceEntity, DestinationModel>().ForMember(dest => dest.Description, opt => opt.Ignore());

You can set specific order in which the properties are mapped:

Mapper.CreateMap<SourceEntity, ChangeAwareDestinationModel>()
   .ForMember(dest => dest.Description, opt => opt.SetMappingOrder(3))
   .ForMember(dest => dest.Name, opt => opt.SetMappingOrder(2));

You can set a specific value for all mapping operations:

Mapper.CreateMap<SourceEntity, DestinationModel>().ForMember(dest => dest.Description, opt => opt.UseValue("No Description"));

You can execute code before/after mapping (like putting the object into a tracking list):

Mapper.CreateMap<SourceEntity, DestinationModel>()
    .BeforeMap((source, destination) =>
        {
            Console.WriteLine("Before Map");
        })
    .AfterMap((source, destination) =>
        {
            Console.WriteLine("After Map");
        });

You can create custom type converters:

With lambda:

Mapper.CreateMap<SourceEntity, DestinationModel>().ConvertUsing(s =>
    {
        return new DestinationModel
        {
            Description = s.Description,
            Name = s.Name
        };
    });

or with ITypeConverter:

class CustomTypeConverter : ITypeConverter<SourceEntity,DestinationModel>
{

    #region ITypeConverter<SourceEntity,DestinationModel> Members

    public DestinationModel Convert(ResolutionContext context)
    {
        var src = (SourceEntity)context.SourceValue;

        return new DestinationModel
        {
            Description = src.Description,
            Name = src.Name
        };
    }

    #endregion
}

Mapper.CreateMap<SourceEntity, DestinationModel>().ConvertUsing<CustomTypeConverter>();

You can create custom object initialization (if you want to use a different constructor):

Mapper.CreateMap<SourceEntity, ChangeAwareDestinationModel>().ConstructUsing(c => new ChangeAwareDestinationModel(((SourceEntity)c.SourceValue).Name));

Note that the difference between ConvertUsing and ConstructUsing is that ConvertUsing invokes the method you provided and exits the conversion procedure, while ConstructUsing only instantiates the object and continues executing the rest of the mapping rules.

You can map lists to arrays and vice versa (including the normal mapping capabilities):

Mapper.CreateMap<SourceEntity, DestinationModel>();

var dst = Mapper.Map<SourceEntity[], List<DestinationModel>>(src);

Dynamic Map - creates a proxy from a known interface and populates it based on the interface map (note that the interface must be public):

Mapper.CreateMap<SourceEntity, IDestinationModel>();

var dst = Mapper.DynamicMap<IDestinationModel>(src);

You can also use LINQ (in AutoMapper.QueryableExtensions namespace):

var dst = src.Project().To<DestinationModel>();

src must be an IQueryable to use LINQ.

You can replace null with a value:

Mapper.CreateMap<SourceEntity, DestinationModel>().ForMember(dest => dest.Description, opt => opt.NullSubstitute("Value is null"));

You can leave a property alone:

Mapper.CreateMap<SourceEntity, DestinationModel>().ForMember(dest => dest.Description, opt => opt.UseDestinationValue());

DestinationModel dst = new DestinationModel
{
    Description = "prepopulated description"
};

dst = Mapper.Map<SourceEntity, DestinationModel>(src,dst);

Use Custom Formatter (implementing IValueFormatter):

Mapper.CreateMap<SourceEntity, DestinationModel>().ForMember(dest => dest.Description, opt => { opt.ResolveUsing<AttributeCountResolver>(); opt.AddFormatter<CustomFormatter>(); });

class CustomFormatter : IValueFormatter
{
    #region IValueFormatter Members

    public string FormatValue(ResolutionContext context)
    {
        int value = (int)context.SourceValue;

        if (value == 0)
            return "No Items";
        else if (value == 1)
            return "1 Item";
           
        return string.Format("{0} Items", value);
    }

    #endregion
}

You can convert Dictionaries:

Mapper.CreateMap<SourceEntity, DestinationModel>();

var dst = Mapper.Map<Dictionary<int,SourceEntity>, Dictionary<int,DestinationModel>>(src);

Multiple Mapping Engines for multiple mapping profiles:

ConfigurationStore enginestore1 = new ConfigurationStore(new TypeMapFactory(),MapperRegistry.AllMappers());
ConfigurationStore enginestore2 = new ConfigurationStore(new TypeMapFactory(), MapperRegistry.AllMappers());
MappingEngine engine1 = new MappingEngine(enginestore1);
MappingEngine engine2 = new MappingEngine(enginestore2);

enginestore1.CreateMap<SourceEntity,DestinationModel>().ForMember(dest => dest.Description, opt=>opt.Ignore());
enginestore2.CreateMap<SourceEntity, DestinationModel>().ForMember(dest => dest.Name, opt => opt.Ignore());

var dst1 = engine1.Map<SourceEntity,DestinationModel>(srcdata[0]);
var dst2 = engine2.Map<SourceEntity, DestinationModel>(srcdata[1]);

The way it works is by having two ConfigurationStores, one for each engine; you create a map on the store and execute the mapping on the engine.


The only negative thing I have to say about this project is that I haven't found any inline documentation.