Optimize string / byte (Base64) validation even further

In addition to ~~#225~~, an even simpler and far faster Base64 validation is possible.

Base64 strings are made up of characters from a 64-character alphabet¹, hence the name Base64.

These characters can appear in any order and are arranged in blocks of four. The last block may be padded with at most two '=' characters to bring it up to the full size of four.
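To make the block structure concrete, here is a small sketch using Python's standard `base64` module (Python is an assumption here; the issue does not name a language). It encodes inputs of one, two, and three bytes and shows how the final block is padded. The helper name `encode_text` is hypothetical.

```python
import base64


def encode_text(raw: bytes) -> str:
    """Encode raw bytes as a Base64 string (hypothetical helper for illustration)."""
    return base64.b64encode(raw).decode("ascii")


# One, two, and three input bytes need two, one, and zero '=' characters.
for raw in (b"a", b"ab", b"abc"):
    encoded = encode_text(raw)
    print(len(raw), encoded, len(encoded) % 4 == 0)
```

In every case the encoded length is a multiple of four, which is the property the padding exists to guarantee.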

This regex was used in the first version of the validation:

^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

This reflects the description above: any number of full four-character blocks, optionally followed by one final padded block.
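As a rough illustration of the regex approach, the sketch below applies the pattern in Python (an assumed language; the project's actual implementation is not shown in this issue). The pattern is written out in full, with its leading `^(` anchor and group. The function name `is_base64_regex` is hypothetical.

```python
import re

# Full blocks of four alphabet characters, then an optional padded final block.
BASE64_PATTERN = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)


def is_base64_regex(text: str) -> bool:
    """Return True if text has the block-and-padding structure of Base64."""
    return BASE64_PATTERN.fullmatch(text) is not None


print(is_base64_regex("YWJj"))  # full block, no padding
print(is_base64_regex("YQ=="))  # final block padded with two '='
print(is_base64_regex("YQ="))   # wrong length, rejected
```

Note that this pattern also accepts the empty string, since every part of it is optional.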

Further optimization of the Base64 validation is easy. Simply test each character of the string to see if it is present in the Base64 alphabet.
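A minimal sketch of such a character check, in Python (an assumed language), could look like the following. The function name `is_base64_chars` is hypothetical, and the handling of padding is an assumption: since '=' is not part of the 64-character alphabet, this sketch strips at most two trailing '=' before testing the remaining characters.

```python
import string

# The 64-character alphabet: A-Z, a-z, 0-9, '+' and '/'.
BASE64_ALPHABET = frozenset(string.ascii_letters + string.digits + "+/")


def is_base64_chars(text: str) -> bool:
    """Return True if every non-padding character is in the Base64 alphabet."""
    body = text.rstrip("=")
    # Assumption: tolerate at most two trailing '=' as padding.
    if len(text) - len(body) > 2:
        return False
    return all(ch in BASE64_ALPHABET for ch in body)


print(is_base64_chars("YWJj"))  # only alphabet characters
print(is_base64_chars("Y*Jj"))  # '*' is not in the alphabet
```

Unlike the regex, this sketch does not verify that the length is a multiple of four, so a string such as "YQ" passes. That trade-off is what makes the check a single linear scan over set lookups.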

¹ The 64 characters: 0-9 a-z A-Z + /

| Byte array length | Regex check (time / memory) | Decode (time / memory) | Character check (time / memory) |
| --- | --- | --- | --- |
| 1.000 | 0.678s / 10.351KB | 0.464s / 13.423KB | 0.329s / 8.303KB |
| 10.000 | 3.728s / 11.128KB | 1.302s / 14.712KB | 0.340s / 8.303KB |
| 100.000 | 32.404s / 12.023KB | 8.182s / 15.354KB | 0.606s / 9.457KB |
| 1.000.000 | 316.781s / 20.860KB | 77.746s / 23.970KB | 2.695s / 18.683KB |
| 10.000.000 | 3197.124s / 107.520KB | 769.089s / 110.592KB | 22.976s / 104.448KB |
