Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sql Field Indexing - Limit fields and content types #8869

Open
giannik opened this issue Mar 15, 2021 · 23 comments
Open

Sql Field Indexing - Limit fields and content types #8869

giannik opened this issue Mar 15, 2021 · 23 comments
Milestone

Comments

@giannik
Copy link
Contributor

giannik commented Mar 15, 2021

I would like to start using the sql field indexing to run sql queries filtering on field values.
But Im concerned that you cant fine tune what you want to index (eg what fields per content type).
And everything is indexed by default , raising performance and maintanance issues.
Is there some reason we cannot specify what fields per content type to index, like we do in lucene indexing ?
Maybe Im missing something.
@Skrypt

@Skrypt
Copy link
Contributor

Skrypt commented Mar 15, 2021

That's a feature that would need to be added. The other option is to create a custom IndexingProvider if you want some fine tuned as of now.

@sebastienros
Copy link
Member

That's something that was designed, by defining something similar to a yessql index provide, in javascript, such that we could create dynamic projections of a content item, and store it in lucene or a sql table. This was postponed because it's not simple to implement.

@sebastienros sebastienros added this to the backlog milestone Mar 18, 2021
@giannik
Copy link
Contributor Author

giannik commented Mar 18, 2021

I was thinking of using the sql field indexing as is and only add an extension point to check if the field should be indexed based on a setting in the field definition screen (like lucene does).
Im not sure if this check should happen inside the index or extenally.
Then maybe hook this into the extensibity with queries that @deanmarcussen added recently ( with taxonomies, users etc).

I'll give it a shot and see what comes up.

@JoshTango
Copy link

I agree with this.

@sebastienros
Copy link
Member

/cc @jptissot

@Skrypt
Copy link
Contributor

Skrypt commented Apr 23, 2021

@gianik Yes, exactly what I was thinking of. Allow/Disallow to index a field or not by adding configuration when the SQL Indexing feature is enabled.

Also, at the same time, add a way to re-index everything because right now it will only index new content items ; not existing ones. So, we need a way to repopulate these index tables.

@JoshTango
Copy link

rebuild index button?

@sebastienros
Copy link
Member

This should use the same logic as Lucene, keep a cursor with the list of indexing tasks, this time in the database. I don't think it should follow the Lucene UI exactly, since there is a single index in this module. I don't think it supports multiple indexes, though it could technically be possible at least with an extra column in the tables (not the ideal for perf but will work).

But the ultimate solution is really to have something driven by javascript to create dynamic map/reduce projections. Like yessql does, but dynamically.

@JoshTango
Copy link

lets assume u do the map reduce,, wouldn't it still be nice to lessen the disk space used on the index by removing unwanted fields?

@Skrypt
Copy link
Contributor

Skrypt commented Apr 23, 2021

I agree, with @sebastienros

There are multiple index tables in this feature like @sebastienros says which differenciate it from Lucene. One per field type actually. I'm not sure how to implement the javascript part of it for now as I've not thinked about it at all. I need to focus on other PR's first.

@hyzx86
Copy link
Contributor

hyzx86 commented Mar 31, 2022

Here are some ways to dynamically update data using FreeSQL,
including the simple integration I've implemented in EasyOC so far

  1. Dynamically create/update table destructions
  2. Index table query

http://www.freesql.net/en/guide/Update-Data.html#_5-update-according-to-the-dto

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 1, 2022

I tried a hook that would block updates to the content definition

  public interface IContentDefinitionHandler
  {
      Task AfterDeleteTypeDefinition(ContentTypeDefinitionRecord record);
      Task AfterStorePartDefinition(ContentPartDefinition contentPartDefinition);
      Task AfterStoreTypeDefinition(ContentTypeDefinition contentTypeDefinition);
      Task<bool> BeforeDeleteTypeDefinition(ContentTypeDefinitionRecord record);
      Task<bool> BeforeStorePartDefinition(ContentPartDefinition contentPartDefinition);
      Task<bool> BeforeStoreTypeDefinition(ContentTypeDefinition contentTypeDefinition);

      Task<bool> BeforeDeletePartDefinition(ContentPartDefinitionRecord record, IEnumerable<ContentTypeDefinition> typesWithPart);
      Task AfterDeletePartDefinition(ContentPartDefinitionRecord record, IEnumerable<ContentTypeDefinition> typesWithPart);
  }

The trigger in ContentDefinitionManager

public async void StoreTypeDefinition(ContentTypeDefinition contentTypeDefinition)
        {
            var results = await Handlers.InvokeAsync((handler) => handler.BeforeStoreTypeDefinition(contentTypeDefinition), _logger);
            if (results.Any(x => !x))
            {
                return;
            }
            Apply(contentTypeDefinition, Acquire(contentTypeDefinition));
            UpdateContentDefinitionRecord();
            await Handlers.InvokeAsync((handler) => handler.AfterStoreTypeDefinition(contentTypeDefinition), _logger);

        }

        public void StorePartDefinition(ContentPartDefinition contentPartDefinition)
        {
            var results = Handlers.InvokeAsync((handler) => handler.BeforeStorePartDefinition(contentPartDefinition), _logger).GetAwaiter().GetResult();
            if (results.Any(x => !x))
            {
                return;
            }
            Apply(contentPartDefinition, Acquire(contentPartDefinition));
            UpdateContentDefinitionRecord();
            Handlers.InvokeAsync((handler) => handler.AfterStorePartDefinition(contentPartDefinition), _logger).GetAwaiter().GetResult();

        }

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 1, 2022

And then created a Extention Method on ISchemarBuilder

public static ISchemaBuilder CreateMapIndexTable(this ISchemaBuilder builder,
            ContentTypeDefinition contentType,
            Action<ICreateTableCommand> table, string collection)

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 1, 2022

The idea is to trigger the update table structure hook every time the type /Part is updated

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 1, 2022

In addition to the above calculation based on the ISchemaBuilder API,
I also wanted to implement dynamic code generation in C# and update table structures with the help of Freesql entity synchronization

Freesql entity synchronization:

 public async Task<int> CreateAsync()
{
        //Use FreeSql
        //create or update table , auto create or update index
        _freeSql.CodeFirst.SyncStructure<VbenMenuPartIndex>();
        SchemaBuilder.CreateForeignKey<VbenMenuPartIndex>();

        return await Task.FromResult(1);
}

The following code is my test with Natasha ,You can also export the generated types for use in source code

Note: Natasha also supports dynamic assembly loading,For example, it can be used to implement dynamic loading of OC modules

using Natasha.CSharp;
using System.Reflection;

NatashaInitializer.InitializeAndPreheating();
string text = @"
  namespace HelloWorld
  {
        public class Test{
            public Test(){
                Name=""111"";
            }
    public string Name;
    public int Age{get;set;}
    }
}";

//Create dynamic classes from the script
AssemblyCSharpBuilder oop = new AssemblyCSharpBuilder();

//Even if you add 100 classes, they will all be compiled in one assembly
oop.Add(text);

//The following assembly will have the classes you added to Syntax
Assembly assembly = oop.GetAssembly();


//Or use the secondary API NAssembly
//This class has API functions such as CreateClass/CreateInterface, but the final build is built in the same AssemblyCSharpBuilder
var asm = new NAssembly("MyAssembly");
asm.AddScript(text);
var type = asm.GetTypeFromShortName("Test");
var type1 = asm.GetTypeFromFullName("HelloWorld.Test");
Console.WriteLine();

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 2, 2022

It looks like I've been reinventing the wheel,
Because I found that there was already an interface in OC: IContentDefinitionEventHandler

I tried a hook that would block updates to the content definition

  public interface IContentDefinitionHandler
  {
      Task AfterDeleteTypeDefinition(ContentTypeDefinitionRecord record);
      Task AfterStorePartDefinition(ContentPartDefinition contentPartDefinition);
      Task AfterStoreTypeDefinition(ContentTypeDefinition contentTypeDefinition);
      Task<bool> BeforeDeleteTypeDefinition(ContentTypeDefinitionRecord record);
      Task<bool> BeforeStorePartDefinition(ContentPartDefinition contentPartDefinition);
      Task<bool> BeforeStoreTypeDefinition(ContentTypeDefinition contentTypeDefinition);
 

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 6, 2022

Hi @sebastienros
I am trying to dynamically generate table structure related feature, I ran into some problems when working with table structures. Can you help me? thank you.

I use the following method to create an ISchemaBuilder

var schemaBuilder = new SchemaBuilder(_store.Configuration, await YesSession.BeginTransactionAsync(), true);
  1. Delete old index tables
  2. Create a new index table
  3. Rebuild index data

1,2 worked fine, and the table was created successfully.
But when reconstructing index data:

The transaction object is not associated with the same connection object as this command

It occurs when a document is queried from the database

 var shellConfig = ShellScope.Current.ServiceProvider.GetRequiredService<IShellConfiguration>();
            var dbOptions = shellConfig.Get<DatabaseShellsStorageOptions>();
            var docs = YesSession.Query<ContentItem, ContentItemIndex>()
               .Where(x => x.Latest && x.ContentType == model.TypeName)
               .OrderBy(x => x.Id);
2022-04-06 18:37:02.3831|Default|00-554961884cbf9de4ce6c446745fdcb46-41d87133dee3a3f7-00||EasyOC.OrchardCore.DynamicTypeIndex.DynamicIndexAppService|ERROR|The transaction object is not associated with the same connection object as this command. System.InvalidOperationException: The transaction object is not associated with the same connection object as this command.
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
   at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 418
   at YesSql.Store.ProduceAsync[T,TState](WorkerQueryKey key, Func`2 work, TState state)
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at EasyOC.OrchardCore.DynamicTypeIndex.DynamicIndexAppService.RebuildIndexData(DynamicIndexConfigModel model) in D:\SourceCodes\JZSoft\EasyOC.OrchardCore\EasyOC\src\Modules\EasyOC.OrchardCore.DynamicTypeIndex\Service\DynamicIndexAppService.cs:line 167
   at EasyOC.OrchardCore.DynamicTypeIndex.DynamicIndexAppService.UpdateDynamicIndexAsync(DynamicIndexConfigModel model) in D:\SourceCodes\JZSoft\EasyOC.OrchardCore\EasyOC\src\Modules\EasyOC.OrchardCore.DynamicTypeIndex\Service\DynamicIndexAppService.cs:line 263
   at lambda_method621(Closure , Object )
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)    at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
   at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 418
   at YesSql.Store.ProduceAsync[T,TState](WorkerQueryKey key, Func`2 work, TState state)
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at EasyOC.OrchardCore.DynamicTypeIndex.DynamicIndexAppService.RebuildIndexData(DynamicIndexConfigModel model) in D:\SourceCodes\JZSoft\EasyOC.OrchardCore\EasyOC\src\Modules\EasyOC.OrchardCore.DynamicTypeIndex\Service\DynamicIndexAppService.cs:line 167
   at EasyOC.OrchardCore.DynamicTypeIndex.DynamicIndexAppService.UpdateDynamicIndexAsync(DynamicIndexConfigModel model) in D:\SourceCodes\JZSoft\EasyOC.OrchardCore\EasyOC\src\Modules\EasyOC.OrchardCore.DynamicTypeIndex\Service\DynamicIndexAppService.cs:line 263
   at lambda_method621(Closure , Object )
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)

My code has been submitted to Github:
https://github.com/EasyOC/EasyOC/blob/ff6f65480792764a1565023f78768ed6b9b26859/src/Modules/EasyOC.OrchardCore.DynamicTypeIndex/Service/DynamicIndexAppService.cs#L262

@giannik
Copy link
Contributor Author

giannik commented Apr 7, 2022

#11086 related

@giannik
Copy link
Contributor Author

giannik commented Apr 7, 2022

Does this also handle the mapping from the content item properties to index fields ?
something that is done by code like below

public class BookIndexProvider : IndexProvider<Book>
{
    public override void Describe(DescribeContext<Book> context) =>
        context.For<BookIndex>()
            .Map(book =>
                new BookIndex
                {
                    Author = book.Author,
                    Title = book.Title,
                });
}

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 7, 2022

Done for me.

I introduced Natasha, the.NET dynamic compiler
A type is generated based on the type definition
Then update/create the tables using Freesql (ORM)
image

Then use Freesql to rebuild index data in batches

image

@hyzx86
Copy link
Contributor

hyzx86 commented Apr 7, 2022

The generated class,They are just strings that are used for table generation, and if you copy them and save them into your code, you can use them directly in background code development

using EasyOC.Core.Indexs;
using FreeSql.DataAnnotations;
namespace EasyOC.DynamicTypeIndex.IndexModels
{
    [EOCIndex("IDX_{tablename}_DocumentId", "DocumentId,ContentItemId")]
    [EOCTable(Name = "Customer_DIndex")]
    public class CustomerDIndex : FreeSqlDocumentIndex
    {
        [Column(StringLength = 26)]
        public string ContentItemId { get; set; }

        [Column(Name = "Name", IsNullable = true, StringLength = -1)]
        public System.String Name { get; set; }

        [Column(Name = "CustNum", IsNullable = true, StringLength = -1)]
        public System.String CustNum { get; set; }

        [Column(Name = "MarketSegment", IsNullable = true, StringLength = 26)]
        public System.String MarketSegment { get; set; }

        [Column(Name = "Source", IsNullable = true, StringLength = 26)]
        public System.String Source { get; set; }

        [Column(Name = "Industry", IsNullable = true, StringLength = 26)]
        public System.String Industry { get; set; }

        [Column(Name = "CustClass", IsNullable = true, StringLength = 26)]
        public System.String CustClass { get; set; }

        [Column(Name = "SalesOwner", IsNullable = true, StringLength = 26)]
        public System.String SalesOwner { get; set; }

        [Column(Name = "AddressPart_CountryName", IsNullable = true, StringLength = -1)]
        public System.String AddressPart_CountryName { get; set; }

        [Column(Name = "AddressPart_Province", IsNullable = true, StringLength = -1)]
        public System.String AddressPart_Province { get; set; }

        [Column(Name = "AddressPart_City", IsNullable = true, StringLength = -1)]
        public System.String AddressPart_City { get; set; }

        [Column(Name = "AddressPart_PostalCode", IsNullable = true, StringLength = -1)]
        public System.String AddressPart_PostalCode { get; set; }

        [Column(Name = "AddressPart_Details", IsNullable = true, StringLength = -1)]
        public System.String AddressPart_Details { get; set; }

        [Column(Name = "AddressPart_Name", IsNullable = true, StringLength = -1)]
        public System.String AddressPart_Name { get; set; }

    }
}

The generated table structure
image

Data after synchronization
image

@hyzx86
Copy link
Contributor

hyzx86 commented Jan 10, 2023

Does any one working on this?

@Skrypt
Copy link
Contributor

Skrypt commented Jan 10, 2023

Not on my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants