mercredi 22 avril 2015

How to get the least and most common value in postgresql?

So the postgres wiki has a mode function we can add to our query and use. So, I modified it to get a lmode function, for least common value. The functions, on a single postgresql instance works great, until its part of a greenplum cluster. It goes from rapidfire, to dog slow.

I was curious to how I would be able to combine them so the query can run once to return both the most common and least common. It's okay if they aren't separate fields, I'm fine with a result like most,least.

Here is sql aggregate functions I've been using.

-- The unnest function is required for postgresql ^8.4 and lower, our green plum instance uses postgresql ~8.2
CREATE OR REPLACE FUNCTION public_work_tbls.unnest(anyarray)
  RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i]
FROM   generate_series(array_lower($1,1), array_upper($1,1)) i
$BODY$
LANGUAGE sql IMMUTABLE;

-- Start: Function for Least common value --
CREATE OR REPLACE FUNCTION public_work_tbls._final_lmode(anyarray)
  RETURNS anyelement AS
$BODY$
    SELECT a
    FROM public_work_tbls.unnest($1) a
    GROUP BY 1
    ORDER BY COUNT(1) ASC, 1
    LIMIT 1;
$BODY$
LANGUAGE 'sql' IMMUTABLE;

-- Tell Postgres how to use our aggregate
CREATE AGGREGATE public_work_tbls.lmode(anyelement) (
  SFUNC=array_append, --Function to call for each row. Just builds the array
  STYPE=anyarray,
  FINALFUNC=public_work_tbls._final_lmode, --Function to call after everything has been added to array
  INITCOND='{}' --Initialize an empty array when starting
);
-- End: Function for least common value --

-- Start: Function for most common value --
CREATE OR REPLACE FUNCTION public_work_tbls._final_mode(anyarray)
  RETURNS anyelement AS
$BODY$
    SELECT a
    FROM public_work_tbls.unnest($1) a
    GROUP BY 1
    ORDER BY COUNT(1) DESC, 1
    LIMIT 1;
$BODY$
LANGUAGE 'sql' IMMUTABLE;

-- Tell Postgres how to use our aggregate
CREATE AGGREGATE public_work_tbls.mode(anyelement) (
  SFUNC=array_append, --Function to call for each row. Just builds the array
  STYPE=anyarray,
  FINALFUNC=public_work_tbls._final_mode, --Function to call after everything has been added to array
  INITCOND='{}' --Initialize an empty array when starting
);
-- End: Function for most common value --

Notes: The greenplum server I am using is on Postgresql 8.2

Thanks postgresql wizards

Aucun commentaire:

Enregistrer un commentaire