[Request] Join method on TEnumerable

Issue #395 open
Markus created an issue

Hi,

I just wanted to ask if there's any chance that we could get a Join() method on TEnumerable (and/or an instance function on IEnumerable) with the following signature:

class function Join<T>(const source: IEnumerable<T>; const separator: string): string;

This method calls ToString() on each element and concatenates the results, inserting the separator between them. In pseudocode this would be:

string.Join(separator, source.Select(ToString).ToArray())

I'm not very familiar with the shortcomings of the Delphi implementation of generics, but since you can't call ToString() on any non-TObject type for T, I'm not sure how to write the implementation efficiently and correctly. Otherwise I would have done it.

Let me know what you think.

Comments (10)

  1. Stefan Glienke repo owner

    Join in the context of IEnumerable<T> would have a different meaning, see https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.join?view=net-7.0. Spring4D does not offer that method, mostly because memory management in Delphi works differently than in C#: you could not construct new objects on the fly without most likely leaking them, so such a function would be of very limited use.

    In .NET, String.Join provides this functionality. However, as you correctly pointed out, Delphi lacks a general ToString for arbitrary types. I would most likely add an overload that accepts a Func<T,string> and, if none is given, falls back to an approach similar to the one in Spring.FormatValue.

  2. Markus reporter

    In other languages, such as Rust and Dart, the iterator type has a join(string)->string method. Given LINQ's relationship to SQL, it's no surprise that “join” means something else. However, I don’t have strong feelings about the name. I just lack a better idea for it.

    Here's my first quick shot at implementing it:

    class function TEnumerable.JoinString<T>(const source: IEnumerable<T>;
      const separator: string; const selector: Func<T, string>): string;
    var
      strings: TArray<string>;
    begin
      strings := TEnumerable.Select<T, string>(source, selector).ToArray;
      Result := string.Join(separator, strings);
    end;
    
    class function TEnumerable.JoinString<T>(const source: IEnumerable<T>;
      const separator: string): string;
    var
      selector : Func<T, string>;
    begin
      selector :=
        function(const x : T) : string begin
          var value : TValue;
          TValue.Make<T>(x, value);
          Result := value.ToString;  // calls Spring.TValueHelper
        end;
    
      Result := TEnumerable.JoinString<T>(source, separator, selector);
    end;
    

    I found that the string conversion was, unsurprisingly, the hardest part. So I decided to reuse the existing code and rely on TValueHelper. I'm not sure about the performance implications. My first, admittedly unreliable, test with an array of a million integers took about 100 ms. I’m not very happy with it.

    Is there a chance to implement this without using the conversion to TValue? I wasn’t able to find a way …

  3. Markus reporter

    For the name, maybe “StringAggregate” or “AggregateString”, similar to the T-SQL function STRING_AGG.

  4. Markus reporter

    Another workaround for the ToString() problem would be to write specialized variants of the method.

    class function TEnumerable.JoinString(const source: IEnumerable<Integer>; const separator: string): string; overload;
    class function TEnumerable.JoinString(const source: IEnumerable<String>; const separator: string): string; overload;
    // etc.
    class function TEnumerable.JoinString<T>(const source: IEnumerable<T>; const separator: string): string; overload;
    

    But calling the last overload always requires explicit type arguments, because Delphi can't infer the type parameter.

    EDIT: Oh, and of course, you’re back to the same problem as above with TValue in the case of T …

  5. Markus reporter

    Ok, the best I could come up with is to have specialized overloads for the basic types like int, float, bool etc. and one with the class type constraint for objects. For everything else you need to provide a selector function.

    class function TEnumerable.JoinString(const source: IEnumerable<Integer>; const separator: string): string;
    var selector : Func<Integer, string>;
    begin
      selector :=
        function(const x : Integer) : string begin
          Result := IntToStr(x);
        end;
    
      Result := TEnumerable.JoinString<Integer>(source, separator, selector);
    end;
    
    class function TEnumerable.JoinString(const source: IEnumerable<String>; const separator: string): string;
    var selector : Func<string, string>;
    begin
      selector :=
        function(const x : string) : string begin
          Result := x;
        end;
    
      Result := TEnumerable.JoinString<string>(source, separator, selector);
    end;
    
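    // Requires T to be constrained to class in the declaration, otherwise the TObject(x) cast does not compile.
    class function TEnumerable.JoinString<T>(const source: IEnumerable<T>; const separator: string): string;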
    var selector : Func<T, string>;
    begin
      selector :=
        function(const x : T) : string begin
          Result := TObject(x).ToString;
        end;
    
      Result := TEnumerable.JoinString<T>(source, separator, selector);
    end;
    
    class function TEnumerable.JoinString<T>(const source: IEnumerable<T>; const separator: string; const selector: Func<T, string>): string;
    var
      strings: TArray<string>;
    begin
      strings := TEnumerable.Select<T, string>(source, selector).ToArray;
      Result := string.Join(separator, strings);
    end;
    

    I also did a quick benchmark using Spring.Benchmark (thanks for that :) )

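    The TValue-based implementation is included for reference. As you can see, we would trade performance for generality: at 1,000,000 elements it takes about 78.6 ms, versus roughly 50 to 52 ms for the specialized variants, so it is about 1.5 times slower.

    The benchmark was run on an Intel Core i7-13700K with a Win32 release build.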

    Run on (24 X 3417,60 MHz CPU s)
    CPU Caches:
      L1 Data 48 K (x12)
      L1 Instruction 32 K (x12)
      L2 Unified 2048 K (x12)
      L3 Unified 30720 K (x1)
    ----------------------------------------------------------------------------
    Benchmark                                  Time             CPU   Iterations
    ----------------------------------------------------------------------------
    string.join/10                           371 ns          387 ns      1493333
    string.join/100                         1111 ns         1011 ns       896000
    string.join/1000                        8013 ns         8719 ns        89600
    string.join/10000                      71310 ns        67188 ns        10000
    string.join/100000                   1011339 ns      1098633 ns          640
    string.join/1000000                 12187009 ns     12152778 ns           45
    
    JoinStringTValue/10                     1445 ns         1569 ns       448000
    JoinStringTValue/100                    7927 ns         8281 ns       100000
    JoinStringTValue/1000                  69696 ns        71498 ns         8960
    JoinStringTValue/10000                680852 ns       697545 ns         1120
    JoinStringTValue/100000              7361538 ns      7393973 ns          112
    JoinStringTValue/1000000            78555678 ns     78125000 ns            9
    
    JoinStringIntegerSelector/10             913 ns          942 ns       896000
    JoinStringIntegerSelector/100           4656 ns         5000 ns       100000
    JoinStringIntegerSelector/1000         40989 ns        38992 ns        17231
    JoinStringIntegerSelector/10000       409049 ns       383650 ns         1792
    JoinStringIntegerSelector/100000     4586241 ns      4243827 ns          162
    JoinStringIntegerSelector/1000000   52112245 ns     51136364 ns           11
    
    JoinStringOfInt/10                       989 ns          854 ns       896000
    JoinStringOfInt/100                     4752 ns         5000 ns       100000
    JoinStringOfInt/1000                   41923 ns        41016 ns        16000
    JoinStringOfInt/10000                 399322 ns       414406 ns         1659
    JoinStringOfInt/100000               4581342 ns      4768669 ns          154
    JoinStringOfInt/1000000             51127030 ns     50000000 ns           10
    
    JoinStringOfObject/10                    988 ns          879 ns       746667
    JoinStringOfObject/100                  4811 ns         5319 ns       179200
    JoinStringOfObject/1000                42133 ns        47433 ns        11200
    JoinStringOfObject/10000              425279 ns       424757 ns         1545
    JoinStringOfObject/100000            4745623 ns      4741379 ns          145
    JoinStringOfObject/1000000          50197891 ns     49715909 ns           11
    

  6. Stefan Glienke repo owner

    While your implementation might work functionally, it does not meet the quality requirements I have.

    What I mean by that is that you basically just combined TEnumerable.Select, ToArray and string.Join, which causes a lot of unnecessary overhead that a library implementation can avoid. A library implementation should therefore not just be something a library consumer could put together themselves.
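
    To illustrate the kind of overhead meant here, the following is a minimal sketch of a single-pass variant that appends directly to a TStringBuilder instead of going through Select, ToArray and string.Join. It is not the repo owner's implementation and not part of Spring4D: TJoinSketch and its JoinString method are hypothetical names used only for this example, and whether this is actually faster would have to be measured.

    ```pascal
    program JoinSketch;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils,
      Spring,
      Spring.Collections;

    type
      // Hypothetical helper for illustration only; not part of Spring4D.
      TJoinSketch = record
        class function JoinString<T>(const source: IEnumerable<T>;
          const separator: string; const selector: Func<T, string>): string; static;
      end;

    class function TJoinSketch.JoinString<T>(const source: IEnumerable<T>;
      const separator: string; const selector: Func<T, string>): string;
    var
      builder: TStringBuilder;
      item: T;
      first: Boolean;
    begin
      builder := TStringBuilder.Create;
      try
        first := True;
        // Single pass over the source: no intermediate Select enumerable
        // and no temporary TArray<string>.
        for item in source do
        begin
          if first then
            first := False
          else
            builder.Append(separator);
          builder.Append(selector(item));
        end;
        Result := builder.ToString;
      finally
        builder.Free;
      end;
    end;

    var
      numbers: IList<Integer>;
    begin
      numbers := TCollections.CreateList<Integer>([1, 2, 3]);
      // Explicit type argument, since Delphi cannot infer T here.
      Writeln(TJoinSketch.JoinString<Integer>(numbers, ', ',
        function(const x: Integer): string
        begin
          Result := IntToStr(x);
        end));
    end.
    ```

    The sketch still pays for one selector call and one temporary string per element; a real library implementation could go further, for example by special-casing known element types.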
