Thursday, 23 April 2015

Using String functions on flattened chararray data

Input : test.txt

file_id     file_name       created_time            accessed_by
   1          a1                1                       user1
   1          a2                2                       user1
   2          b1                3                       user1
   3          c1                4                       user1

Pig Script :

  A = LOAD 'usertest.txt' USING PigStorage('\t') AS (file_id:long, file_name:chararray, created_time:long, accessed_by:chararray);
  B = GROUP A BY file_id;
  C = FOREACH B {
    sorted = ORDER A BY created_time DESC;
    user = A.accessed_by;
    uniq_user = DISTINCT user;
    last = LIMIT sorted 1;
    -- The next line is the one that fails to parse.
    GENERATE UPPER(FLATTEN(last.file_name)) AS file_name, COUNT(uniq_user) AS access_count;
  }

Applying any string manipulation function on top of a flattened chararray field results in the error below.

ERROR - ERROR 1200: <line 185, column 22>  mismatched input 'FLATTEN' expecting RIGHT_PAREN

Without a string manipulation function wrapped around FLATTEN, the script returns the required data.


The objective is to use string manipulation functions to convert file_name to the required format before persisting, without another iteration over the data.

With one more iteration, as below, we are able to achieve the objective. Can we avoid this and do the same conversion earlier?

 D = FOREACH C GENERATE UPPER(file_name) AS file_name, access_count;
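
One possible way to avoid the extra FOREACH is sketched below. FLATTEN is an operator rather than a function, so the parser accepts it only at the top level of GENERATE and not as an argument to UPPER. The sketch therefore applies UPPER to the plain chararray field inside the nested block and flattens the result afterwards. It assumes Pig 0.10 or later, where a nested FOREACH is allowed inside the nested block. The alias upper_name is introduced here for illustration and is not part of the original script.

```pig
-- Sketch only: assumes Pig 0.10+ (nested FOREACH inside a nested block).
A = LOAD 'usertest.txt' USING PigStorage('\t')
    AS (file_id:long, file_name:chararray, created_time:long, accessed_by:chararray);
B = GROUP A BY file_id;
C = FOREACH B {
    sorted = ORDER A BY created_time DESC;
    last = LIMIT sorted 1;
    -- upper_name is a hypothetical alias: a one-row bag holding the upper-cased name.
    upper_name = FOREACH last GENERATE UPPER(file_name);
    user = A.accessed_by;
    uniq_user = DISTINCT user;
    -- FLATTEN stays at the top level of GENERATE, where the parser accepts it.
    GENERATE FLATTEN(upper_name) AS file_name, COUNT(uniq_user) AS access_count;
}
```

If this works on the Pig version in use, relation C already holds the upper-cased file_name, so relation D would no longer be needed.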

Output :


