RFC: Getting rid of getColumnMeta()

dsar
Backer | 36
+
0
-

Hi guys,

I was porting Nette Database to Firebird since in PHP 8.1 they added support to getColumnMeta() in the Firebird driver during a refactoring of some part of PDO. However they missed the field ‘native_type’ that has a Pull request here and if you follow the conversation (where I partecipated) they are against to it. The conversation continued on IRC, it seems really that most of people are against native_type and some of them are also against how and what currently getColumnMeta() does.

What I would like to propose is to get rid of getColumnMeta(), not only for the reason above but for two major points:

  1. It limits database support

Currently only MySQL, PostgreSQL and Sqlite have a good support for getColumnMeta(). Other drivers, although may have it, are known to return unrealiable values.

  1. It is damn slow

Here the greatest victim is PostgreSQL because it queries pg_catalog:

test2 $ php test_nometa.php
Queries executed in 14.987295 seconds
test2 $ php test_meta.php
Queries executed in 28.850745 seconds

Testing a naive query for 50000 times with and without getColumnMeta() shows a 100% of overhead. It is something that matters.

MySQL shows indeterministic results ranging from 10% to 40%. It is also something.

Although Nette Database has a sort of caching for this, it is based on the PDO query string and a static variable, so it just works for the same query (in good software it is unlikely to happen) and it's not a global cache.

My proposal is to use metadata from the Database (Nette Database already does this) with a true cache. If there is a general consensus I may work on this.

What do you think?

David Grudl
Nette Core | 7466
+
0
-

Caching the result of getColumnMeta() would be complicated and would have a large overhead, I'm afraid it might even be more than the overhead of the function getColumnMeta().

I would rather add a “normalize” option to the Connection options so that it can be set false.

David Grudl
Nette Core | 7466
+
0
-

I even feel like I have something like this done, so I'll commit it.

https://github.com/…267c647ee549

dsar
Backer | 36
+
0
-

Caching the result of getColumnMeta() would be complicated and would have a large overhead, I'm afraid it might even be more than the overhead of the function getColumnMeta().

Why? My idea wasn't to cache that result, but to use a sort of type mapping (based on the column name) for tables fetched from the database, of course in the case of further fields generated by expressions the default type would be string or integer (or by implementing some euristics based on functions)

I would rather add a “normalize” option to the Connection options so that it can be set false.

Normalization is a good thing in my opinion. Many here had argued that is slow but the true bottleneck was indeed getColumnMeta(). Although I don't think normalizeRow() is the cause, it may be improved by using a lookup table (a switch/match statement on numbered index, actually IStructure::FIELD_* are strings), this would be better than a serie of if tests

Last edited by dsar (2021-04-30 15:13)

dsar
Backer | 36
+
0
-

Post scriptum: That contributor is proposing to remove ‘native_type’ in favor of ‘decl_type’ (to keep in mind in the case it will be approved in future)

David Grudl
Nette Core | 7466
+
0
-

I can't imagine how to implement it, but if you know how to do it, go for it.

dsar
Backer | 36
+
0
-

I applied your last commit, I'm pretty satisfied with it (also PostgreSQL is happier now :-)

By the way PDO seems to apply a bit of normalization, in particular:

array(6) {
  ["int_fld"]=>
  int(10)
  ["varchar_fld"]=>
  string(3) "ABC"
  ["float_fld"]=>
  string(4) "3.14"
  ["decimal_fld"]=>
  string(6) "123.45"
  ["timestamp_fld"]=>
  string(19) "2021-04-30 00:00:00"
  ["boolean_fld"]=>
  bool(true)
}

In my opinion this is the best compromise:
Integers, strings and boolean are returned with their respective types. These are the hardest to convert without introspecting the database.
Decimals are returned as string and this is a good thing (here just for reference).
Floats are also returned as string, however I still have to see the sense of float in non-scientific computing.

The only one that require a conversion is the timestamp/datetime field.

At this point it would be very convenient to allow for custom normalization

David Grudl
Nette Core | 7466
+
0
-

Yeah, for PostgreSQL it behaves fine, for MySQL it unfortunately returns “int_fld” as string.

dsar
Backer | 36
+
0
-

Yeah, for PostgreSQL it behaves fine, for MySQL it unfortunately returns “int_fld” as string.

Yes, later I saw it. Maybe it could be uniformed for all drivers by using ATTR_STRINGIFY_FETCHES (or by adjusting tests for PostgreSQL since I find it very convenient).

By the way, I was playing a bit with MySQL and getColumnMeta(), in the case of arbitrary expressions (not table's fields) it's totally unrealiable (while PostgreSQL never get wrong, but at the expense of a doubling the cost of the query).

For example:

$sth = $pdo->prepare("SELECT DATE('2021-04-04 01:02:03')");

Returns correctly:

["native_type"]=>
string(4) "DATE"

While:

$sth = $pdo->prepare("SELECT DATE_ADD('2021-04-04', INTERVAL 31 DAY)");

Returns:

["native_type"]=>
string(6) "STRING"

More complex date expressions even returned double! Or real expressions returned longint (while giving a floating point result).
I don't know how it works internally for MySQL driver but this is seriously wrong.

Ignoring table's fields and playing a bit directly with values and adding:

	} else if (
		    \DateTime::createFromFormat('Y-m-d',        $value) !== false || // Date
		    \DateTime::createFromFormat('H:i:s',        $value) !== false || // Time
		    \DateTime::createFromFormat('Y-m-d H:i:s',  $value) !== false || // DateTime
		    \DateTime::createFromFormat('Y-m-d H:i:sP', $value) !== false    // DateTime + timezone
	) {
		    $normalizedRow[$key] = new DateTime($value);
	}

works very well and tests are successful.

Regarding table's fields, unfortunately I'm not able to find a nice way to inject Structure inside ResultSet, so I can directly check the column type. Do you have some idea?

Last edited by dsar (2021-05-05 11:28)

David Grudl
Nette Core | 7466
+
0
-

I added custom normalization via $connection->setRowNormalizer()

dsar
Backer | 36
+
0
-

Great! :-)

But I still hope that we will find a solution to get rid of getColumnMeta() for the default case.

Nette Database is faster than other ORMs, but it can be even faster…

JiriSlischka
Member | 9
+
0
-

I added custom normalization via $connection->setRowNormalizer()

That means I can create my own “NoNormalizer” to prevent default normalization?

Nette Database is faster than other ORMs, but it can be even faster…

Yes, because it's not ORM.

dsar
Backer | 36
+
0
-

JiriSlischka wrote:

That means I can create my own “NoNormalizer” to prevent default normalization?

If you don't want to normalize, it is enough $connection->setRowNormalizer(null);

Keep in mind that depending on the driver there is a bit of normalization (but it seems that only MySQL outputs everything as string)

Nette Database is faster than other ORMs, but it can be even faster…

Yes, because it's not ORM.

I know it comes from NotORM, however I used the term in a very generic way