2015 in review

The WordPress.com stats helper monkeys prepared a 2015 annual report for this blog.

Here’s an excerpt:

A New York City subway train holds 1,200 people. This blog was viewed about 5,100 times in 2015. If it were a NYC subway train, it would take about 4 trips to carry that many people.


Why Teradata Manages Data Transfer The Best Way

With the introduction of the Teradata Parallel Transporter (TPT) utility, ETL consultants no longer spend long off-shift hours overseeing big data transfers. How? Because TPT is tolerant of faulty connections, even on cloud machines. Let me explain:

Our Teradata server ran on a cloud machine that was sometimes unreachable because the supposedly ‘live’ IP for the cross-organization LAN/WAN went down. What does that mean in practice? If I left a batch load utility running overnight from a Windows or Linux shell script, I would wake up in the morning only to find the logs full of error messages saying

CONNECTION RESET!!! TIMEOUT!!! DBC RESTARTED!!!

What could be more frustrating when you have just one table holding 20 GB of data? You can’t possibly hand it to a junior as an all-nighter. In the old days this meant chunking it into ten pieces of 2 GB each and loading/offloading the data by manual hard labour every day, eating up nearly every weekday and weekend.

Not Anymore!

Now, with the latest Teradata Parallel Transporter, you join the forces of legacy FastExport and FastLoad in a single pipeline. You know you have meagre spool space for a table the size of 20 GB, so you specify SpoolMode=’NoSpool’ and there you go: no manual labour, no supervising the bite-sized chunking of 2 GB x 10 pieces, no arithmetic at all. You have a pipeline with the combined power of FastExport and FastLoad, close to real-time streaming of data. For a Teradata-to-Teradata transfer this is still a better fit than TPump or the Stream operator, because the Export and Load operators plug together so well.
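To make that concrete, here is a minimal sketch of what such an Export-to-Load pipeline job could look like, saved as a TPT job script from the shell and run with tbuild. The database names, table, columns, credentials and TDPIDs below are placeholders I invented for illustration, not values from this post; adjust them to your own systems.

# Sketch only: every name and credential below is a placeholder.
cat > move_big_table.tpt <<'EOF'
DEFINE JOB move_big_table
DESCRIPTION 'Teradata-to-Teradata copy: Export operator piped into Load operator'
(
  DEFINE SCHEMA big_table_schema
  ( order_id INTEGER, order_ts TIMESTAMP(0), amount DECIMAL(18,2) );

  DEFINE OPERATOR export_op
  TYPE EXPORT
  SCHEMA big_table_schema
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'source_td',
    VARCHAR UserName     = 'src_user',
    VARCHAR UserPassword = 'src_pass',
    /* Avoid spooling the 20 GB result set before export. */
    VARCHAR SpoolMode    = 'NoSpool',
    VARCHAR SelectStmt   = 'SELECT order_id, order_ts, amount FROM src_db.big_table;'
  );

  DEFINE OPERATOR load_op
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'target_td',
    VARCHAR UserName     = 'tgt_user',
    VARCHAR UserPassword = 'tgt_pass',
    VARCHAR TargetTable  = 'tgt_db.big_table',
    VARCHAR LogTable     = 'tgt_db.big_table_log',
    VARCHAR ErrorTable1  = 'tgt_db.big_table_e1',
    VARCHAR ErrorTable2  = 'tgt_db.big_table_e2'
  );

  APPLY ('INSERT INTO tgt_db.big_table (order_id, order_ts, amount)
          VALUES (:order_id, :order_ts, :amount);')
  TO OPERATOR (load_op)
  SELECT * FROM OPERATOR (export_op);
);
EOF

# Run the job; TPT checkpoints let a failed run be restarted rather than redone from scratch.
tbuild -f move_big_table.tpt

The point of the pipeline is that the Export operator's output stream feeds the Load operator directly, so the 20 GB never has to land on disk in between.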

By the way, Unity Data Mover does the same task, but why buy that hefty software tool if you are a Teradata genius? Add this to your Teradata ETL guru tips and reclaim your place as a Professional Services Consultant @ Teradata.

Aster Getting Started in Predictive Analytics

The embedded slide deck, “Introducing Teradata Aster Discovery Platform: Getting Started” (Ahsan Nabi Khan, September 25th, 2015), covers:

  1. Title: Introducing Teradata Aster Discovery Platform: Getting Started
  2. When do you need the Aster Discovery Platform?
    - Scalable analytics: a vast array of analytic algorithms running on commodity hardware as an integrated analytics engine
    - Dig deep and fast: ad-hoc, interactive exploration of all the data within seconds or minutes
  3. Advanced analytic applications and use cases: credit, risk and fraud; packaging and advertising; buying patterns; cyber defense; fraud and crime; citizens’ feedback; call data records; service personalization; friend graphs; click streams; and opinion, sentiment and star ratings, spanning social media, telecom, commerce analysis and federal analysis
  4. Discovery Process Model

Aster Getting Started

How to integrate Hadoop and Teradata using SQL-H

Hey! I’ve been trying to do this all through September and finally got it to work. I tried the Hadoop Connector for Teradata, the Teradata Connector for Hadoop, Teradata Studio Express, Aster SQL-H, and many more cumbersome alternatives before reaching a Hadoop-Teradata integration that does not require purchasing the current version of QueryGrid. Note that without QueryGrid you cannot do cross-platform querying; here I just demonstrate bidirectional data transfer between Teradata and Hadoop.

All I needed for Teradata to integrate seamlessly with Hadoop was the following:

  1. Hadoop Sandbox 2.1 for VMware (http://hortonworks.com/hdp/downloads)
  2. Teradata Express 15 for VMware (http://downloads.teradata.com/downloads)
  3. Teradata Connector for Hadoop (TDCH) (http://downloads.teradata.com/downloads)
  4. Teradata Studio (http://downloads.teradata.com/downloads)

I didn’t need to connect Teradata Aster, because all I needed was querying and data transfer between Hadoop and Teradata.

Here is how it happened:

1. I converted the OVA file from the Hortonworks Sandbox download page into a VMX file to run in VMware Server. The conversion command is:

ovftool.exe Hortonworks_Sandbox_2.1.ova D:/HDP_2.1_Extracted/HDP_2.1_vmware

where HDP_2.1_vmware is the extracted VMDK file. The extraction took an hour on a fast server.

2. I loaded HDP_2.1_vmware.vmdk into VMware Server by adding a new virtual machine; the VMDK produced the VMX as I specified the VM configuration. I chose NAT for the network connection and enabled the USB controller option. When powering on, the VM warned that the SCSI (USB) device was not working and asked whether it should boot from IDE instead; that is the recommended option, so I chose it. The VM ran, and I could browse the Hortonworks Sandbox at http://sandbox.hortonworks.com:8000. I could also use port 50070 to access WebHDFS. I changed the password for the hue user in the user admin section of the site at http://sandbox.hortonworks.com:8000.
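As a quick sanity check of my own (not part of the official setup steps), you can confirm from any machine that reaches the sandbox that WebHDFS really is answering on port 50070:

# List the root of HDFS through the WebHDFS REST API (port 50070 on HDP 2.1).
curl "http://sandbox.hortonworks.com:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hue"

A JSON FileStatuses document in the response means WebHDFS is up and reachable.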

3. Next I needed to install Teradata Express 15 and Teradata Studio and connect the two. It worked well, and there is plenty of documentation for troubleshooting anything that comes up when connecting TD15 to Teradata Studio. The first time I could not connect to TD15, I got a “Connection Refused” error in Teradata Administrator. I simply restarted the SUSE Linux OS on which the TD15 VM resides, and then I could connect fine.
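If a full reboot feels heavy-handed, a lighter check I use (assuming the standard Teradata command-line tools are available on the TD Express VM, which is an assumption on my part) is to confirm the database is actually up and listening before blaming the client:

# On the SUSE VM: show the state of the Teradata parallel database environment.
pdestate -a

# Still on the VM: confirm something is listening on the Teradata client port (1025).
netstat -an | grep 1025

If pdestate does not report that logons are enabled, restarting the VM, as above, is the simplest fix.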

4. The last part was to install the Teradata Connector for Hadoop (TDCH) RPM inside the Hortonworks Sandbox I launched in step 2. I used PuTTY to connect to the HDP 2.1 shell: I entered the IP assigned to sandbox.hortonworks.com, connected on the default port 22, and logged in as root with hadoop as the password. Then I went to /usr/lib/, where Java 1.7, Hive, Sqoop and so on are installed; I only needed to confirm that the Java version was 1.7 or above. Using FileZilla I transferred the TDCH RPM file to /usr/lib, then ran the command to install it:

rpm -Uvh teradata-connector-1.3.2-hdp2.1.noarch.rpm

This installed the RPM, with the verbose flag (-v) showing me all the details.
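For reference, the file transfer and a quick post-install check can also be done from the command line instead of FileZilla. This is just a sketch; it assumes the RPM sits in the current directory on your workstation and that the package registers itself under the name suggested by the file name:

# Copy the TDCH RPM to the sandbox (the root password on the HDP 2.1 sandbox is "hadoop").
scp teradata-connector-1.3.2-hdp2.1.noarch.rpm root@sandbox.hortonworks.com:/usr/lib/

# After installing, confirm the package is registered and see where its files went
# (package name assumed to be teradata-connector, as the RPM file name suggests).
rpm -qa | grep -i teradata-connector
rpm -ql teradata-connector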

5. Next I needed to apply the Oozie configuration specified in the installation instructions on the Teradata Studio download page.

The NameNode was set to sandbox.hortonworks.com. The WebHDFS hostname and WebHDFS port need not be set, as they default to the NameNode host and 50070 respectively, which works.
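To double-check which NameNode and WebHDFS endpoints the cluster itself advertises (an extra sanity check of mine, not part of the official instructions), you can ask the Hadoop configuration directly from the sandbox shell:

# Ask HDFS which NameNode URI and which HTTP (WebHDFS) address it is configured with.
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.namenode.http-address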

6. Now open Teradata Studio and add a new database connection. Specify the Hadoop database credentials, including:

WebHDFS host name: sandbox.hortonworks.com

WebHDFS port: 50070

username: hue

I tested the connection. At first the ping failed, but only after a long pause, which meant it was still in the middle of processing. The Java exception read “cannot access oozie service”. So I closed the root connection in PuTTY (I had initially tried to connect with the username root), and later also closed the open hue sessions on sandbox.hortonworks.com so that the connection would not time out. After that the ping succeeded, after a pause of about 20 seconds.
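When the ping hangs like that, one extra thing worth checking (my own step, not from the Teradata instructions) is whether the Oozie service itself is reachable on its default port, 11000:

# Oozie answers on port 11000 by default; this endpoint just lists the supported API versions.
curl "http://sandbox.hortonworks.com:11000/oozie/versions"

A small JSON array in the response (such as [0, 1, 2]) means Oozie is up; a refused connection means it is not.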

7. Once both Teradata and the Hadoop Distributed File System were connected in Teradata Studio, I could transfer data to and from both databases. Done.

Can RDBMS do Analytics?


RDBMSs are relational database management systems: SQL Server, MySQL, the Oracle flavours and others used for enterprise data management. Every website that dynamically stores and retrieves data requires an RDBMS. Product-line businesses, in fact, require far more storage space and far more processing power for retrieval than a traditional data-driven web application. Think of Teradata, IBM or the MSDN library: their consultancy work demands ever-increasing processing power and storage capacity for relational databases. But the big players are thinking beyond that.

We are in the age of analytics. All the big data sprawling across the Internet cannot be crunched by traditional methods; this is where data analytics comes in. Clients need in-depth knowledge from the vast rows and columns of multidimensional stores of product sales. It is no longer about updates to size and colour information; it is about business sense, marginal and rational thinking. What style of shoes is in for summer? Which kinds of silk are cheap for upcoming events? Which strategies will cut costs under the latest production demands? In which market segments is there demand for hard disks? These are the questions market leaders get answered daily by their data analytics software, and these toolkit extensions are increasingly available in all major RDBMSs.

Traditional RDBMSs were designed to be operational, reading and updating only current transactions. But market forces have created a niche for analytical databases. These systems hold a large historical database and process the ad-hoc, heuristic queries of business analysts: ask-answer-ask-again patterns of query processing. Every update to the historical database is kept on a timeline, but that does not mean we are hoarding junk; only the relevant data from every season is filtered, extracted and loaded into a powerful decision support system. There you have the functionality to ask and answer all kinds of who-what-why questions.

The typical data analytics workload is handled by a data warehouse layer on top of a collection of integrated databases. It is not a requirement that one piece of software handle all the data storage and processing capacity; a number of standard RDBMS extract mechanisms feed the data warehouse vendor for OLAP queries. OLAP, or online analytical processing, queries include business-driven inquiries like the following (a rough sketch of how the first one might look in SQL follows the list):

“What are the top brands of hard disks in the market?”

“Where do hard disk failures push client dissatisfaction to its highest levels?”

“Why do hard disks fail especially often in particular shopping arcades?”

“Are other shopping areas particularly relevant in this scenario?”

“Where will a hard disk recycling program reduce costs most remarkably?”

“Where did the last recycling program not work?”
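To make the first of these concrete, here is a minimal sketch of such a query run against a Teradata warehouse through bteq. The TDPID, credentials, and the sales_dw.hard_disk_sales table with its brand, units_sold and sale_date columns are invented for illustration only; they do not come from any system described here.

# Sketch only: the TDPID, credentials, database and column names below are made up.
bteq <<'EOF'
.LOGON demo_tdpid/analyst,analyst_password
/* Top five hard disk brands by units sold over the last twelve months of history. */
SELECT TOP 5 brand, SUM(units_sold) AS total_units
FROM sales_dw.hard_disk_sales
WHERE sale_date >= ADD_MONTHS(CURRENT_DATE, -12)
GROUP BY brand
ORDER BY total_units DESC;
.LOGOFF
.QUIT
EOF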

11 Funniest Jokes Only Linux Users Will Understand

Funny Linux Jokes

I am not going to bore you with the old Linux one-liner jokes that you might have come across a number of times already. It’s almost 2016 and we still don’t have a decent compilation of new funny Linux jokes.

So, I took some time and searched on various social media platforms for some really witty Linux humor.

If you are an intermediate to experienced Linux user with some knowledge of networking, you’ll have no difficulty understanding the jokes. If you are a sysadmin, even better. I am not a sysadmin, but I studied computer networking in depth during my master’s, so it was easy to grasp the underlying fun.

Don’t worry: if you are a novice to Linux or have no previous knowledge of networking protocols, I have included some hints to the jokes here. I know the best jokes are the ones that need no explanation, but when geek humor is involved, a little explanation doesn’t hurt. I mean, that’s the reason there is a whole website dedicated to explaining xkcd’s geek jokes.

11 funny Linux jokes

These jokes are one-liners and do not include images. If geek humor interests you, I advise you to follow the It’s FOSS collection of Linux Humor on Google Plus. There are lots of Linux jokes in images there; I am sure you will like it.


  1. I’ll tell you a DNS joke but be advised, it could take up to 24 hours for everyone to get it.
  2. I think there is a duck in my router. It always goes NAT, NAT, NAT.
  3. So I want to dress up as a UDP packet for Halloween, but I don’t know if anyone will get it.
  4. I could tell you an ICMP joke but it would probably be repetitive.
  5. I wanted to write an IPv4 joke, but the good ones were all already exhausted.
  6. IPv4 address space walks into a bar and yells “One strong CIDR please, I’m exhausted!”
  7. Knock knock.
    Who’s there?
    SYN flood.
    SYN flood who?
    Knock knock.…
  8. Q. What is a Unix or Linux sysadmin’s favourite hangout place?
    A. Foo Bar
  9. A RAID member disk walks into a bar. Bartender asks what’s wrong?
    “Parity error.”
    “Yeah, you look a bit off.”
  10. A Linux geek started working at McDonald’s. A customer asked him for a Big Mac, and he handed over a slip of paper with FF:FF:FF:FF:FF:FF written on it.
  11. Q. What kind of doctor fixes broken websites?
    A. A URLologist.

If you did not understand any of the above jokes, you can refer to the explanations below. But before that, if you liked these Linux jokes, don’t hesitate to share them :)

Explanation

  1. When there is a change in DNS, it needs to be propagated to all other servers. This process can take up to 24 hours to complete.
  2. IP addresses are scarce. NAT (Network Address Translation) is the mechanism by which all devices connected to a router use private IP addresses while the router alone faces the real networked world. Ducks are supposed to sound like “nat, nat”.
  3. UDP (User Datagram Protocol) is a networking protocol which doesn’t guarantee delivery of data packets. Normally we use the TCP protocol to communicate over the internet; UDP is mostly used in real-time systems.
  4. Open a terminal and run the command “ping www.google.com”. You’ll understand the joke.
  5. There is a limited number of IP addresses available in IPv4. With the rapid expansion of the internet, IPv4 addresses have been exhausted. Read this for more info.
  6. CIDR was introduced to mitigate IPv4 address exhaustion.
  7. Read this to learn how SYN floods are used for DDoS attacks.
  8. Read this to learn about Foo Bar.
  9. In RAID, additional parity information is stored to detect errors. A parity error means the data and its parity no longer match, i.e. something is a bit off.
  10. Every networking device, your smartphone, computer and so on, has a MAC address, usually written in hexadecimal like 01:23:45:67:89:ab. MAC addresses determine which device receives a data packet.
    The MAC address FF:FF:FF:FF:FF:FF is used to broadcast a packet to all devices, which is why networking folks jokingly call it the big MAC.
  11. Self-explanatory.

Sources: 1-7 and 11 NixCraft; 8 unknown; 9 @SwiftOnSecurity; 10 @GoogleforWork

Data Journalism: Natural Language Processing, Predictive Analytics, Data Science
