RFC: Getting rid of getColumnMeta()
- dsar
- Backer | 53
Hi guys,
I was porting Nette Database to Firebird since in PHP 8.1 they added support to getColumnMeta() in the Firebird driver during a refactoring of some part of PDO. However they missed the field ‘native_type’ that has a Pull request here and if you follow the conversation (where I partecipated) they are against to it. The conversation continued on IRC, it seems really that most of people are against native_type and some of them are also against how and what currently getColumnMeta() does.
What I would like to propose is to get rid of getColumnMeta(), not only for the reason above but for two major points:
- It limits database support
Currently only MySQL, PostgreSQL and Sqlite have a good support for getColumnMeta(). Other drivers, although may have it, are known to return unrealiable values.
- It is damn slow
Here the greatest victim is PostgreSQL because it queries pg_catalog:
test2 $ php test_nometa.php
Queries executed in 14.987295 seconds
test2 $ php test_meta.php
Queries executed in 28.850745 seconds
Testing a naive query for 50000 times with and without getColumnMeta() shows a 100% of overhead. It is something that matters.
MySQL shows indeterministic results ranging from 10% to 40%. It is also something.
Although Nette Database has a sort of caching for this, it is based on the PDO query string and a static variable, so it just works for the same query (in good software it is unlikely to happen) and it's not a global cache.
My proposal is to use metadata from the Database (Nette Database already does this) with a true cache. If there is a general consensus I may work on this.
What do you think?
- David Grudl
- Nette Core | 8217
Caching the result of getColumnMeta() would be complicated and would have a large overhead, I'm afraid it might even be more than the overhead of the function getColumnMeta().
I would rather add a “normalize” option to the Connection
options so that it can be set false.
- dsar
- Backer | 53
Caching the result of getColumnMeta() would be complicated and would have a large overhead, I'm afraid it might even be more than the overhead of the function getColumnMeta().
Why? My idea wasn't to cache that result, but to use a sort of type mapping (based on the column name) for tables fetched from the database, of course in the case of further fields generated by expressions the default type would be string or integer (or by implementing some euristics based on functions)
I would rather add a “normalize” option to the
Connection
options so that it can be set false.
Normalization is a good thing in my opinion. Many here had argued that is slow but the true bottleneck was indeed getColumnMeta(). Although I don't think normalizeRow() is the cause, it may be improved by using a lookup table (a switch/match statement on numbered index, actually IStructure::FIELD_* are strings), this would be better than a serie of if tests
Last edited by dsar (2021-04-30 15:13)
- David Grudl
- Nette Core | 8217
I can't imagine how to implement it, but if you know how to do it, go for it.
- dsar
- Backer | 53
I applied your last commit, I'm pretty satisfied with it (also PostgreSQL is happier now :-)
By the way PDO seems to apply a bit of normalization, in particular:
array(6) {
["int_fld"]=>
int(10)
["varchar_fld"]=>
string(3) "ABC"
["float_fld"]=>
string(4) "3.14"
["decimal_fld"]=>
string(6) "123.45"
["timestamp_fld"]=>
string(19) "2021-04-30 00:00:00"
["boolean_fld"]=>
bool(true)
}
In my opinion this is the best compromise:
Integers, strings and boolean are returned with their respective types. These
are the hardest to convert without introspecting the database.
Decimals are returned as string and this is a good thing (here just for
reference).
Floats are also returned as string, however I still have to see the sense of
float in non-scientific computing.
The only one that require a conversion is the timestamp/datetime field.
At this point it would be very convenient to allow for custom normalization
- David Grudl
- Nette Core | 8217
Yeah, for PostgreSQL it behaves fine, for MySQL it unfortunately returns “int_fld” as string.
- dsar
- Backer | 53
Yeah, for PostgreSQL it behaves fine, for MySQL it unfortunately returns “int_fld” as string.
Yes, later I saw it. Maybe it could be uniformed for all drivers by using ATTR_STRINGIFY_FETCHES (or by adjusting tests for PostgreSQL since I find it very convenient).
By the way, I was playing a bit with MySQL and getColumnMeta(), in the case of arbitrary expressions (not table's fields) it's totally unrealiable (while PostgreSQL never get wrong, but at the expense of a doubling the cost of the query).
For example:
$sth = $pdo->prepare("SELECT DATE('2021-04-04 01:02:03')");
Returns correctly:
["native_type"]=>
string(4) "DATE"
While:
$sth = $pdo->prepare("SELECT DATE_ADD('2021-04-04', INTERVAL 31 DAY)");
Returns:
["native_type"]=>
string(6) "STRING"
More complex date expressions even returned double! Or real expressions
returned longint (while giving a floating point result).
I don't know how it works internally for MySQL driver but this is
seriously wrong.
Ignoring table's fields and playing a bit directly with values and adding:
} else if (
\DateTime::createFromFormat('Y-m-d', $value) !== false || // Date
\DateTime::createFromFormat('H:i:s', $value) !== false || // Time
\DateTime::createFromFormat('Y-m-d H:i:s', $value) !== false || // DateTime
\DateTime::createFromFormat('Y-m-d H:i:sP', $value) !== false // DateTime + timezone
) {
$normalizedRow[$key] = new DateTime($value);
}
works very well and tests are successful.
Regarding table's fields, unfortunately I'm not able to find a nice way to inject Structure inside ResultSet, so I can directly check the column type. Do you have some idea?
Last edited by dsar (2021-05-05 11:28)
- JiriSlischka
- Member | 9
I added custom normalization via $connection->setRowNormalizer()
That means I can create my own “NoNormalizer” to prevent default normalization?
Nette Database is faster than other ORMs, but it can be even faster…
Yes, because it's not ORM.
- dsar
- Backer | 53
JiriSlischka wrote:
That means I can create my own “NoNormalizer” to prevent default normalization?
If you don't want to normalize, it is enough $connection->setRowNormalizer(null);
Keep in mind that depending on the driver there is a bit of normalization (but it seems that only MySQL outputs everything as string)
Nette Database is faster than other ORMs, but it can be even faster…
Yes, because it's not ORM.
I know it comes from NotORM, however I used the term in a very generic way