Friday, March 21, 2014

Describe how you could use a single array to implement three stacks.

One simple approach is to divide the array into three fixed-size regions, one per stack, and keep a separate top-of-stack counter for each. Stack i owns indices [i * stack_size, (i + 1) * stack_size).

__author__ = 'nitin'

class Stack:
    def __init__(self):
        self.stack_size = 300
        # One backing array shared by all three stacks.
        self.items = [0] * self.stack_size * 3
        # Number of items currently in each stack (offsets relative to
        # each stack's own region of the array).
        self.stack_pointer = [0, 0, 0]

    def push(self, stack_num, item):
        if self.stack_pointer[stack_num] == self.stack_size:
            raise IndexError('stack %d is full' % stack_num)
        # Write at the next free slot, then advance the counter.
        index = stack_num * self.stack_size + self.stack_pointer[stack_num]
        self.stack_pointer[stack_num] += 1
        self.items[index] = item

    def pop(self, stack_num):
        if self.is_empty(stack_num):
            raise IndexError('stack %d is empty' % stack_num)
        # Retreat the counter, then read the slot it now points at.
        self.stack_pointer[stack_num] -= 1
        index = stack_num * self.stack_size + self.stack_pointer[stack_num]
        value = self.items[index]
        self.items[index] = 0
        return value

    def peek(self, stack_num):
        if self.is_empty(stack_num):
            raise IndexError('stack %d is empty' % stack_num)
        index = stack_num * self.stack_size + self.stack_pointer[stack_num] - 1
        return self.items[index]

    def is_empty(self, stack_num):
        # The counters are relative to each stack's region, so empty means 0.
        # (Comparing against stack_num * stack_size is only right for stack 0.)
        return self.stack_pointer[stack_num] == 0

if __name__ == '__main__':
    s = Stack()
    s.push(0, 1)
    s.push(1, 2)
    s.push(2, 3)
    print(s.pop(0))  # 1
    print(s.pop(1))  # 2
    print(s.pop(2))  # 3

Introduction to Amazon Redshift

A Tale of Data Warehouses
• Historically a luxury for the ‘fat cats’
• Complex
• Expensive – even before the first query!
  • Time factor
  • Staffing: DBAs, IT
• Traditional databases will never cut it
• Expectations keep rising

Enter … Amazon Redshift


Big Deal for Big Data?
• Fast, petabyte-scale data warehouse service
• Fully managed (setup, operation, scaling)
• Seamless integration with existing BI tools (Jaspersoft, MicroStrategy, Pentaho, Tableau, BusinessObjects, Cognos and more!)
• No new languages to learn
• Three simple steps (see the connection sketch below):
  • Load up your cluster with data
  • Connect your favorite query tool
  • Query away!
• There’s an API too!
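Since Redshift speaks the PostgreSQL wire protocol, any Postgres driver can serve as the "favorite query tool". A minimal sketch of steps 2 and 3 using psycopg2 – the endpoint, database, table and credentials below are hypothetical placeholders:

import psycopg2  # standard PostgreSQL driver; Redshift is wire-compatible

# Hypothetical cluster endpoint and credentials -- substitute your own.
conn = psycopg2.connect(
    host='examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com',
    port=5439,          # Redshift's default port
    dbname='dev',
    user='masteruser',
    password='secret'
)

cur = conn.cursor()
cur.execute('SELECT COUNT(*) FROM sales;')  # plain SQL, no new language
print(cur.fetchone()[0])
cur.close()
conn.close()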

Wow – what steps does it really involve?

• Simple management
  • Just pop into the AWS Management Console
  • Pick a node type with pre-allocated storage
  • Start off with a few hundred GBs and scale up to petabytes
  • < $1,000 / TB / year!
  • Ready to accept data, so load it up! (see the COPY sketch after this list)
• Key point on scaling
  • Zero downtime!
  • Storage is added automatically and performance increases dynamically as more nodes are added
  • No separate tuning required! Redshift does this for us
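Loading typically happens through Redshift's COPY command, which pulls data in parallel from S3. A hedged sketch, reusing a psycopg2 connection like the one above – the table name, bucket and credential values are made-up placeholders:

# Assumes 'conn' is a psycopg2 connection as in the earlier sketch.
cur = conn.cursor()
cur.execute("""
    COPY sales
    FROM 's3://example-bucket/sales/'
    CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
    DELIMITER ',';
""")
conn.commit()  # COPY runs inside a transaction like any other statement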

How does it work?
• Uses columnar storage (toy illustration below)
• Massively parallel processing architecture
• Queries are spread across the nodes of the cluster, since the architecture is built for horizontal scale
• Monitoring of the cluster
• Automatic backups or manual snapshots
• Easy integration with other AWS services (S3, DynamoDB, Data Pipeline, EMR, RDS)
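To see why columnar storage suits analytics, compare the two layouts below. This is a toy Python illustration of the idea, not Redshift internals: an aggregate over one column only has to touch that column's values, not every full row.

# Row-oriented: each record stored together, as an OLTP database would.
rows = [
    {'user': 'a', 'region': 'us', 'spend': 10},
    {'user': 'b', 'region': 'eu', 'spend': 25},
    {'user': 'c', 'region': 'us', 'spend': 40},
]

# Column-oriented: each column stored contiguously.
columns = {
    'user':   ['a', 'b', 'c'],
    'region': ['us', 'eu', 'us'],
    'spend':  [10, 25, 40],
}

# Row layout: SUM(spend) must walk every full record.
total = sum(r['spend'] for r in rows)

# Column layout: only the 'spend' column is read (far less I/O on disk),
# and runs of same-typed values also compress much better.
total = sum(columns['spend'])
print(total)  # 75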

Worried about security?
• Encryption is easily flipped on
• Both data and backups are encrypted
• SSL is supported for client connections (sketch below), and Redshift works with Amazon VPC (Virtual Private Cloud) as well
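Requiring SSL on the client side, for instance, is a one-parameter change to the earlier psycopg2 sketch (endpoint and credentials again hypothetical):

import psycopg2

# sslmode='require' refuses any unencrypted connection.
conn = psycopg2.connect(
    host='examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com',
    port=5439,
    dbname='dev',
    user='masteruser',
    password='secret',
    sslmode='require'
)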

This is great – but this must cost some serious $$$!

• Wrong!
• Cost effective: as low as 1/10th the cost of traditional data warehouse systems
• Zero upfront fees
• Flexible payment options
  • Pay as you go
  • 70% discount if you commit to a reserved instance
• A 2TB Amazon Redshift data warehouse cluster costs < $1 / hour!

Amazon.com’s Retail Biz Test*
• On-premises data warehouse
  • 32 nodes
  • 4.2TB of RAM
  • 1.6PB of disk
  • Cost: several million USD
• Amazon Redshift
  • 2 nodes (128GB RAM each)
  • 16TB of disk per node
  • $32,000 / year (or $3.65 / hour)
• The test
  • 2 billion rows of data and their 6 most complex queries
  • Redshift was at least 10x faster!
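A quick sanity check on those numbers: $3.65/hour × 24 × 365 ≈ $32,000/year, and spread across the 2 × 16TB = 32TB of disk that works out to roughly $1,000 per TB per year – consistent with the pricing claim earlier.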


HOWEVER … beware Big Data myths

• Technology is only part of the solution
• Besides the tooling, you still need to:
  • Devise hypotheses
  • Determine which metrics/parameters to look at
  • Ask the right kinds of questions when the data is ready to work for you
• If you’re not sure how you intend to use the data / data warehouse, focus on having your questions in place before implementing any technology or adopting a DW (NX - HMO Principle)

All great, but how have YOU guys used Redshift?

• Telecom
  • Caller data
  • Improving call routing (reducing costs)
  • Identifying carrier issues in near real time
  • Identifying customer trends, which led to the development of new system features and subsequently translated into more subscriptions
  • Better infrastructure preparedness
• Performance management & analysis
  • Logging granular server & network statistics data
    • Process, server, cluster and I/O level metrics
    • 2.5TB worth of data for a 24-hour window
  • Correlating resource trends against traffic trends (see the query sketch after this list)
    • Descriptively determining bottlenecks
    • Determining high-risk components in cases of projected high traffic
  • Proactive improvements before any issues hit us
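As an illustration of that correlation workflow, here is a hedged sketch – the tables and columns are invented for the example, not our actual schema. It joins hourly server metrics against hourly call traffic so the busiest hours, where bottlenecks surface, sort to the top:

# Hypothetical tables: server_metrics(host, ts, cpu_pct, io_wait_pct)
# and call_traffic(ts, calls). Reuses a psycopg2 connection as above.
cur = conn.cursor()
cur.execute("""
    SELECT DATE_TRUNC('hour', m.ts) AS hour,
           AVG(m.cpu_pct)           AS avg_cpu,
           AVG(m.io_wait_pct)       AS avg_io_wait,
           SUM(t.calls)             AS calls
    FROM server_metrics m
    JOIN call_traffic t
      ON DATE_TRUNC('hour', m.ts) = DATE_TRUNC('hour', t.ts)
    GROUP BY 1
    ORDER BY calls DESC   -- busiest hours first
    LIMIT 24;
""")
for hour, avg_cpu, avg_io_wait, calls in cur.fetchall():
    print(hour, avg_cpu, avg_io_wait, calls)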

What does this mean for business?

• Data warehouses are far more affordable now, especially for small to medium sized companies
  • Provide an edge to smaller businesses and entrepreneurs – potentially serving as a catalyst for small business
• Enterprises shouldn’t feel left out
  • Test quantitative hypotheses faster, based on actual data
  • Teams get quality computing in a fraction of the time
• Growth in data-driven/centered businesses
  • Encourages competition
• Bottom line: Redshift will do the grunt work, allowing companies to focus on strategic use of the technology and let their data do the work for them
