Normalizing De-Normalized Data with MySQL (Part 1)

 What is De-Normalized Data?

Probably the best example of denormalized data are found in a link analysis diagram..

Normally we like our data served up in flat-files with rows and columns. Columns store data we wish segregated for a record, like name, address, city, state or zip. Rows store the complete list of data as is pertains to one complete record; like a customer or product. 

But what happens when our data won’t fit neatly into this model? Like a list of people attending a conference or links between product categories. A good example of denormalized data (that’s been normalized) are breadcrumbs on a webpage or a website menu system. 

 For the sake of demonstration the table on the left has a field labeled "firstname" and a field containing denormalized data labeled "list". The "list" field may hold a list of friends of the person named in the firstname field. It could also be a list of attendees to a meeting. 

So what is so important about a denormalized data list. Just ask Facebook or eBay or Amazon. Their entire data mining models are based on denormalized data. Customers have different likes and dislikes. Some customers have many hobbies or favorite subjects and others may have very few. By normalizing "denormalized" data we are better able to identify target audiences, behaviors, and more importantly "patterns".  Yes, can we all say PATTERN!  You may have seen a detective story where the savy investigator discovers a suspect has been seen near the scene of three homicides or robberies. If the suspect was around one of the scenes it may be thought of as a "coincidence"; if he/she was seen around two of the crime scenes…? definitely not a coincidence; and if he was seen around THREE of the crime scenes! GUILTY! (maybe…).  Regardless, a denormalized list can lead us to patterns we would not necessarily spot or identify as fast, as we might using a denormalized list.

 XFN, FOAF and the Meaning of Life

Contemporary models like XFN (extended friend network), FOAF (friend of a friend), Six Degrees of Separation (Stanley Milgram) or The Three Degrees of Influence Rule – written about in  "Connected" by Christakis & Fowler (Little,Brown & Company 2009) as an understanding of confluence patterns of groups of people and their connections. Most definitely a real meat and potatoes explanation of human being’s quest for spirituality, love and the after-life. :-)

Back to the Real World

To locate hidden patterns in mountains of denormalized data generated by legacy systems, websites, spreadsheets, financial records, receipts and hundreds of other data sources it takes just a few things. First of all, denormalized data would be a good thing but what if you don’t have any? Then you make some! Using a few flat-files, some random generated strings here and there we can piece together enough data to enable us to develop software that automates the process of churning through millions of records looking for a few "needles in the hay stack."

Development Tools, the Cloud and Other Sundry Items

Open source and semi-open source solutions are here to stay. The days of being ham-stringed to licensing, versioning, and phone support are gone.  Mainstream, seasoned products like MySQL, Actionscript, PHP, Flexicious (see part 4) and Adobe Flex 4.5 (Eclipse) are hard to beat for security, reliability, ease of use and overall adoption by professional developers.

For this project I am using Flex 4.5 Data Management Services which I found to be a bit friendlier with my AWS RDS MySQL account than my HostGator account. Flex has no trouble connecting to AWS via its built in php class builder and once you get through the Zend non-sense you are off and running.

Joe Celko and Heavy Lifting

The heavy lifting is done directly on the RDS server via  MySQL Workbench; a phenomenal tool with advanced features you won’t find in phpMyAdmin or from the shell. Unless your Joe Celko who I am sure can will MySQL code into existence using pure thought  and remote viewing, I suggest using it.

Workbench’s windowed interface lets you click between tabs for creating scripts, stored procedures, import/export and allows simultaneous connections to multiple servers all at the same time. The import/export tool is fantastic. No more time outs on data imports using phpMyAdmin. 

PHP, Stored Procedures and Flex Data Management Service

For this project I used a normal php service class generated by the Flex 4.5 interface uploaded to the services directory of my hostgator folder where I store the Flex App.(this will be better explained in Parts 2 and 3).

I prefer using MySQL stored procedures whenever possible as they are easier to call from PHP,they run faster because they execute on the server and are more secure than a PHP file doing the same thing.

Custom PHP functions can easily be added to the Flex generated php class script. You can easily cut and paste existing class objects and modify them which will show up in your data services window which is pretty slick. 

In Part 1 of this tutorial I am demonstrating how I was able to insert a random list of names from one table into a field of another table. The difficulty I encountered was generating the comma-delimited random string which contains from one to ten first names and subsequently inserting this result into a firstnames table to be used by an update query to insert one of the 500 strings, randomly into the demo data table. 

The first thing was to create a table of first names that I could use to create comma delimited, random length firstnames string to update the main table. I created the first table "firstnames" using this script.

*All examples were done with MySQL Workbench pulling data from AWS. PHP code is run from my Hostgator hosted website. I like to do it this way as it keeps the php files and data on different cloud servers and reduces bandwidth charges marginally but adds an extra layer of security by requiring authentication between both sites via frequently changed Elastic IPs on AWS and a subsequent updated entry to the Hostgator CPanel "Remote Access" interface.

The firstnames Table

use db;
drop table if exists firstnames1;
create table firstnames1
(
firstname varchar(25),
id2 int(11)
);

alter table firstnames1 add column id INT UNSIGNED NOT NULL AUTO_INCREMENT,ADD PRIMARY KEY (id);
alter table firstnames1 add column list varchar(255),ADD INDEX (list);

This may look a bit unorthodox at first. I use this method to generate most of my initial tables. MySQL requires a minimum of two fields to create a valid table and the typing can be redundant when working in a large database project so I have a library of scripts I can call up quickly which require only a few changes to generate a new table. It also provides more consistency to facilitate the writing of stored procedures. Of course a great programmer never writes the same line twice! (yeah…right ;-)

The table is probably self-explanatory but I will briefly explain two fields; 1) id2 and 2) list. I created the id2 field in case I needed a field for rank or storing a number similar to the rgt field used for pre-order tree traversal with nested sets. Regardless its always prudent  to have an extra integer field.

 

The rstring() Stored Procedure

DELIMITER $$

CREATE DEFINER=`root`@`%` PROCEDURE `rstring`()
BEGIN
SET @A1:=FLOOR(1 + RAND( ) *500);
SET @A2:=1;
SET @var = NULL;
SET @s = CONCAT("SELECT GROUP_CONCAT(firstname separator ',') into @var FROM firstnames WHERE id= ",@A1," group by id");
PREPARE STMT1 FROM @s;
EXECUTE STMT1;
select @var;
END

 

Protected: MySQL Script to Generate Daily Monitoring File (Updated Nov.5)

This post is password protected. To view it please enter your password below:


New Registrations Widget

Wanna Test This Widget…? send a text message from any SMS enabled phone to

(314) 467-0782

10 Things You Never Say to a Geek!

 1. My computer worked fine until YOU fixed it.

2. Did you erase EVERYTHING ?

3. I can’t pay you for a couple of weeks.

4. Hey! come on over this weekend for a barbecue and by-the-way can you take a look at my computer…?I think it might have a virus.

5. I think my nephew was looking at porn and that must be where I got that virus.

6. I’m not computer literate.

7. I know your busy but I only have one quick question.

8. Gosh, you’re really smart where did you learn all that stuff.

9. I know if you were here you could fix it really fast.

10. $20 dollars seems kinda high! Your were only here for 10 minutes!

Are You a Geek? Take the Test!

 

1. "Leo" is a sign on the Zodiac. (true or false)

2. Jobs is something we all need (true or false)

3. SIP is something you do when you drink wine (true or false)

4. Outlook is the report your broker gives you on a hot stock (true or false)

5. Pandora is greek goddess (true or false)

6. Your neighbor’s son fixes your computer when he’s home on spring break (yes/no).

7. Your computer worked fine until your friend tried to fix it. (true or false)

8. Monster is a kid movie (true or false)

9. A dial-up modem is great if you don’t have enough money for DSL? (true or false)

10. Everyone should have virus protection software on their computer? (true or false)

If you answered true to three or more of these questions we recommend that you immediately locate the box that your computer came in, throw your computer in it and return it to the store where you bought it and get  FULL REFUND because you are TOO STUPID to have one!

 

UPDATE! Flex 4.5 and PHP – configure ZendFramework the easy way (yeh…right!)

QUICK UPDATE

Here are samples of the two critical files you need to make the Flex Data Management Service files work on your remote server. I am using a standard linux server on hostgator for this example.

1. amf_config.ini - change samples to your directory name and don’t forget the forward slash at the end or it won’t work. Make sure you create a services directory underneath the "samples directory" so you can copy the services file that Flex creates. You can find that in the "services" folder within your project. This has to be copied manually from your local project folder over to your samples/services folder. This is CRITICAL!!! Flex DOES NOT do this for you. 

[zend]

webroot = /home/yeswedoservers/public_html/samples/

[zendamf]

amf.production = false

amf.directories[]=services

 

2.Your Services file – something like "ZipsService" or YourService- this is found in the root directory of your flex project in the services directory. This file is totally case sensitive. DON’T RENAME IT! Copy it up to your "samples/services" folder and change the information below to match your server settings. Remember, these settings have nothing to do with your local wamp or mamp settings. Comment var $port = "3306"; out if you have trouble. You likely won’t need this on your server.

var $username = "your username";

var $password = "your password";

var $server = "localhost";

var $port = "3306";

var $databasename = "your database name";

var $tablename = "zips";

var $connection;

 I will post a way to trash the entire Flex Data Management "cluster-f&@k" in a couple of weeks and use something much easier and faster. There is absolutely no reason to use Zend or Flex Data Management services. You can do the same thing in regular PHP with 20% of the code. It’s Total junk!

Paul Chavaux

—–old article starts here.

Remote Webhost setup – Hostgator, etc.

1.Setup your php webservice in flex locally. i.e. wamp or xampp, whatever…

2. Don’t build an mxml page yet.

3. Export a release build.

4. Make sure there is a ‘services’ folder in the release folder, i.e., c:\wamp\www\myReleaseBuild\services

5. Make sure there is a php file in this folder that matches the service name you created.

6. Open Filezilla (download this at filezilla if you don’t have it) and log into your ftp host on your webhosting site., i.e., hostgator, etc.

7. create a folder under the root, i.e., /home/public_html/myNewFolder

8. upload the release build under this new folder. make sure you only upload the files UNDER your local release build folder like this.

c:\wamp\www\myReleaseBuild\allYourFiles

just upload the "allYourFiles" and not myReleaseBuild\allYourFiles

(you are doing this for a reason I will explain later.

9. You will now need to edit two files because they need to include your webhosting information instead of your local server info.

10. If you have cpanel in your webhosting package open up the file manager and go to the new directory you just created that now contains a copy of the files of your release build.

11. go to the services directory and open the php service file., i.e., myService.php which has the php class with all the stuff flex needs to communicate with your php/mysql server/database, etc.

12. find the following lines.

this is connection info for your local installation., i.e., wamp, xampp, whatever.

    var $username = "root";
    var $password = "";
    var $server = "localhost";
    var $port = "3306";
    var $databasename = "yourLocalDatabase";
    var $tablename = "tableWithYourRecords";

 

change it to something like this for your hostgator account or whatever hosting account you have. you can find this stuff in your cpanel under the databases or just use the ftp login and password for your ftp which usually works, if not, create a new one in the databases area of cpanel or your webhost panel.

    var $username = "probablyYourFTPLoginName";
    var $password = "probablyYourFTPPassWord";
    var $server = "localhost";
    var $port = "3306";
    var $databasename = "yourWebDatabaseName";
    var $tablename = "tableWithYourRecordsInIt";

sometimes "localhost’ won’t work and you will need to put your NON-www address, for instance, instead of www.yourDomainName.com it would be yourDomainName.com without the www. Or if this fails, open up a cmd window on your computer and ping your domain name, i.e., ping yourDomainName.com and get the ip address and put it in the $server variable. If none of that works contact your webhost and ask, but normally, localhost works fine.

sometimes port 3306 can be left blank, too.

This is really easier than I am making it.

13. Save that puppy – remember, this stays in the services folder. There shouldn’t be anything else in it but this service file (php file,oh yeah, a service is just a php file).

14. Now move up one directory level to the "root level of your release build directory" like this.

->root-level-of-your-release-build

->->services-folder-underneath-root-level-of-your-release-build

go to the top level which is: ->root-level-of-your-release-build

15. you will see your swf and html files and a bunch of other nonsense, too. Find the amf_config.ini file. This file is pure evil and no one except Mihai Corlan at Adobe knows how to use it. Oh, yeah…did I mention that everyone at Zend and Adobe are morons except Mihai?  Yes, they are all idiots and will suck you into a time tunnel or black hole with all their self-aggrandizing word soup they call Support Center…don’t waste your time on those sites or your boss will fire you.

okay, back to the task at hand…

the amf_config.ini file looks like this:

[zend]
;set the absolute location path of webroot directory, example:
;Windows: C:\apache\www
;MAC/UNIX: /user/apache/www
webroot =/home/yeswedo/public_html/1

;set the absolute location path of zend installation directory, example:
;Windows: C:\apache\PHPFrameworks\ZendFramework\library
;MAC/UNIX: /user/apache/PHPFrameworks/ZendFramework/library
zend_path =/home/yeswedo/public_html/ZendFramework/library

[zendamf]
amf.production = false
amf.directories[]=webServiceTemplate/services

Now delete some of the bullshit, well, most of it is bullshit and will throw mystery errors like the Fred Merkle in the 1908 world series. Mystery errors are errors that are created by the error fairies and can never be resolved without kidnapping the CEO of Adobe and holding him ransom for 445 days (one day longer than the Iranian Hostage Crisis) and threatening to take away all his ill-gotten stock options….(don’t try this at home).

[zend]
webroot =c:\wamp\www\
;zend_path =c:\wamp\www\ZendFramework\library

[zendamf]
amf.production = false
amf.directories[]=yourLocalReleasebuildFolderName/services

YOUR NOT DONE YET! THIS IS THE IMPORTANT PART OR YOU WILL THROUGH THE DREADED CHANNEL ERRORS!
[zend]
webroot =c:\wamp\www\
;zend_path =c:\wamp\www\ZendFramework\library

[zendamf]
amf.production = false
amf.directories[]=yourLocalReleasebuildFolderName/services

Let’s go line by line:

nothing to change on the first line [zend]

[zend]

second like is so easy it is hard. webroot =

this is the web root on your webhosting site. Most of the time it is located in the top left corner of your cpanel and should look something like:

/home/yourDomainNameWithTheDotComOrSomethingClose/public_html/

so if your domain is yeswedoservers.com then this line might look something like:

/home/yeswedo/public_html

make sure you leave off the last "/" or you will throw a channel error.

if you have your stuff in a subdirectory of your webroot like this:

/home/yeswedo/public_html/myCoolFolderWhereIStoreMyFiles/myReleaseBuildFolder/

where myCoolFolderIStoreMyFiles is a subdirectory with workfiles or something…whatever…then make sure you modify this like to read:

/home/yeswedo/public_html/myCoolFolderWhereIStoreMyFiles

okay..that line is done and make sure there is no semi-colon or spew in front of it.

the next line looks something like this before modification: Notice the semi-colon-spew-that-you-will-delete, too. Delete everything back to the equal sign "=" and enter the path to the ZendFramework\library…

;zend_path =c:\wamp\www\ZendFramework\library

should look like:

zend_path =/home/yeswedo/public_html/ZendFramework/library

where yeswedo is the name of YOUR folder (not mine), again, this should be in the upper left corner of your cpanel.

If you have a different path to your ZendFramework directory then change it accordingly.

You’re almost done. The next three lines may look like this before modification…
[zendamf]
amf.production = false
amf.directories[]=webServiceTemplate/services

leave the first two as they are above, like this:
[zendamf]
amf.production = false

and change the last line to match the releaseBuildDirectory on the webhost, remember this from above?
amf.directories[]=thisIsYourReleaseBuildFolder/services

notice that this points to the php service file we edited above, too.

Save and exit.

16. Now try to run your releaseBuildfile.swf or releaseBuildFile.html. and see if you get an error. If you get a channel error or gateway.php not found error check your paths.

17. if you keep getting channel errors you may have a syntax error in your php class file (this is the one in your services folder). If you used the editor from your webhost to edit these files you shouldn’t have a problem. Some people say the files need to be saved in utf-8 or ansi, but they should be fine if edited using your webhost editor, otherwise use a namebrand html editor like dreamweaver…just open and re-save.

Most of the time, channel errors are caused by a bad path in the amf_config.ini file. Don’t mess with the gateway.php file because it only exists to read the amf_config.ini file.

Flex generates the class php file in the services directory and if it works locally then it will work on the webhost as long as you change the connection stuff I outlined above.

Let me know if you have any questions.

Paul Chavaux

 

Quick-Tips!-Browser Project Folders

WordPress – How to Log-In to a WordPress Blog

PPC-What is AdSense PPC Advertising?

PPC-What is Pay-Per-Click