Friday, March 21, 2014

Describe how you could use a single array to implement three stacks.

One simple approach is to divide the array into three fixed-size regions, one per stack, and keep a separate top-of-stack counter for each. Stack i owns indices [i * stack_size, (i + 1) * stack_size).

__author__ = 'nitin'

class Stack:
    def __init__(self):
        self.stack_size = 300
        # One backing array shared by all three stacks.
        self.items = [0] * self.stack_size * 3
        # Number of items currently in each stack (offsets relative to
        # each stack's own region of the array).
        self.stack_pointer = [0, 0, 0]

    def push(self, stack_num, item):
        if self.stack_pointer[stack_num] == self.stack_size:
            raise IndexError('stack %d is full' % stack_num)
        # Write at the next free slot, then advance the counter.
        index = stack_num * self.stack_size + self.stack_pointer[stack_num]
        self.stack_pointer[stack_num] += 1
        self.items[index] = item

    def pop(self, stack_num):
        if self.is_empty(stack_num):
            raise IndexError('stack %d is empty' % stack_num)
        # Retreat the counter, then read the slot it now points at.
        self.stack_pointer[stack_num] -= 1
        index = stack_num * self.stack_size + self.stack_pointer[stack_num]
        value = self.items[index]
        self.items[index] = 0
        return value

    def peek(self, stack_num):
        if self.is_empty(stack_num):
            raise IndexError('stack %d is empty' % stack_num)
        index = stack_num * self.stack_size + self.stack_pointer[stack_num] - 1
        return self.items[index]

    def is_empty(self, stack_num):
        # The counters are relative to each stack's region, so empty means 0.
        # (Comparing against stack_num * stack_size is only right for stack 0.)
        return self.stack_pointer[stack_num] == 0

if __name__ == '__main__':
    s = Stack()
    s.push(0, 1)
    s.push(1, 2)
    s.push(2, 3)
    print(s.pop(0))  # 1
    print(s.pop(1))  # 2
    print(s.pop(2))  # 3

Introduction to Amazon Redshift

A Tale of Data Warehouses
• Historically a luxury for the ‘fat cats’
• Complex
• Expensive – even before the first query!
  • Time factor
  • Staffing: DBAs, IT
• Traditional databases will never cut it
• Expectations keep rising

Enter … Amazon Redshift


Big Deal for Big Data?
• Fast, petabyte-scale data warehouse service
• Fully managed (setup, operation, scaling)
• Seamless integration with existing BI tools (Jaspersoft, MicroStrategy, Pentaho, Tableau, BusinessObjects, Cognos and more!)
• No new languages to learn
• Three simple steps (see the connection sketch below):
  • Load up your cluster with data
  • Connect your favorite query tool
  • Query away!
• There’s an API too!
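Since Redshift speaks the PostgreSQL wire protocol, any Postgres driver can serve as the "favorite query tool". A minimal sketch of steps 2 and 3 using psycopg2 – the endpoint, database, table and credentials below are hypothetical placeholders:

import psycopg2  # standard PostgreSQL driver; Redshift is wire-compatible

# Hypothetical cluster endpoint and credentials -- substitute your own.
conn = psycopg2.connect(
    host='examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com',
    port=5439,          # Redshift's default port
    dbname='dev',
    user='masteruser',
    password='secret'
)

cur = conn.cursor()
cur.execute('SELECT COUNT(*) FROM sales;')  # plain SQL, no new language
print(cur.fetchone()[0])
cur.close()
conn.close()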

Wow – what steps does it really involve?

• Simple management
  • Just pop into the AWS Management Console
  • Pick a node type with pre-allocated storage
  • Start off with a few hundred GBs and scale up to petabytes
  • < $1,000 / TB / year!
  • Ready to accept data, so load it up! (see the COPY sketch after this list)
• Key point on scaling
  • Zero downtime!
  • Storage is added automatically and performance increases dynamically as more nodes are added
  • No separate tuning required! Redshift does this for us
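Loading typically happens through Redshift's COPY command, which pulls data in parallel from S3. A hedged sketch, reusing a psycopg2 connection like the one above – the table name, bucket and credential values are made-up placeholders:

# Assumes 'conn' is a psycopg2 connection as in the earlier sketch.
cur = conn.cursor()
cur.execute("""
    COPY sales
    FROM 's3://example-bucket/sales/'
    CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
    DELIMITER ',';
""")
conn.commit()  # COPY runs inside a transaction like any other statement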

How does it work?
• Uses columnar storage (toy illustration below)
• Massively parallel processing architecture
• Queries are spread across the nodes of the cluster, since the architecture is built for horizontal scale
• Monitoring of the cluster
• Automatic backups or manual snapshots
• Easy integration with other AWS services (S3, DynamoDB, Data Pipeline, EMR, RDS)
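To see why columnar storage suits analytics, compare the two layouts below. This is a toy Python illustration of the idea, not Redshift internals: an aggregate over one column only has to touch that column's values, not every full row.

# Row-oriented: each record stored together, as an OLTP database would.
rows = [
    {'user': 'a', 'region': 'us', 'spend': 10},
    {'user': 'b', 'region': 'eu', 'spend': 25},
    {'user': 'c', 'region': 'us', 'spend': 40},
]

# Column-oriented: each column stored contiguously.
columns = {
    'user':   ['a', 'b', 'c'],
    'region': ['us', 'eu', 'us'],
    'spend':  [10, 25, 40],
}

# Row layout: SUM(spend) must walk every full record.
total = sum(r['spend'] for r in rows)

# Column layout: only the 'spend' column is read (far less I/O on disk),
# and runs of same-typed values also compress much better.
total = sum(columns['spend'])
print(total)  # 75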

Worried about security?
• Encryption is easily flipped on
• Both data and backups are encrypted
• SSL is supported for client connections (sketch below), and Redshift works with Amazon VPC (Virtual Private Cloud) as well
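Requiring SSL on the client side, for instance, is a one-parameter change to the earlier psycopg2 sketch (endpoint and credentials again hypothetical):

import psycopg2

# sslmode='require' refuses any unencrypted connection.
conn = psycopg2.connect(
    host='examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com',
    port=5439,
    dbname='dev',
    user='masteruser',
    password='secret',
    sslmode='require'
)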

This is great – but this must cost some serious $$$!

• Wrong!
• Cost effective: as low as 1/10th the cost of traditional data warehouse systems
• Zero upfront fees
• Flexible payment options
  • Pay as you go
  • 70% discount if you commit to a reserved instance
• A 2TB Amazon Redshift data warehouse cluster costs < $1 / hour!

Amazon.com’s Retail Biz Test*
• On-premises data warehouse
  • 32 nodes
  • 4.2TB of RAM
  • 1.6PB of disk
  • Cost: several million USD
• Amazon Redshift
  • 2 nodes (128GB RAM each)
  • 16TB of disk per node
  • $32,000 / year (or $3.65 / hour)
• The test
  • 2 billion rows of data and their 6 most complex queries
  • Redshift was at least 10x faster!
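A quick sanity check on those numbers: $3.65/hour × 24 × 365 ≈ $32,000/year, and spread across the 2 × 16TB = 32TB of disk that works out to roughly $1,000 per TB per year – consistent with the pricing claim earlier.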


HOWEVER … beware Big Data myths

• Technology is only part of the solution
• Besides the tooling, you still need to:
  • Devise hypotheses
  • Determine which metrics/parameters to look at
  • Ask the right kinds of questions when the data is ready to work for you
• If you’re not sure how you intend to use the data / data warehouse, focus on having your questions in place before implementing any technology or adopting a DW (NX - HMO Principle)

All great, but how have YOU guys used Redshift?

• Telecom
  • Caller data
  • Improving call routing (reducing costs)
  • Identifying carrier issues in near real time
  • Identifying customer trends, which led to the development of new system features and subsequently translated into more subscriptions
  • Better infrastructure preparedness
• Performance management & analysis
  • Logging granular server & network statistics data
    • Process, server, cluster and I/O level metrics
    • 2.5TB worth of data for a 24-hour window
  • Correlating resource trends against traffic trends (see the query sketch after this list)
    • Descriptively determining bottlenecks
    • Determining high-risk components in cases of projected high traffic
  • Proactive improvements before any issues hit us
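As an illustration of that correlation workflow, here is a hedged sketch – the tables and columns are invented for the example, not our actual schema. It joins hourly server metrics against hourly call traffic so the busiest hours, where bottlenecks surface, sort to the top:

# Hypothetical tables: server_metrics(host, ts, cpu_pct, io_wait_pct)
# and call_traffic(ts, calls). Reuses a psycopg2 connection as above.
cur = conn.cursor()
cur.execute("""
    SELECT DATE_TRUNC('hour', m.ts) AS hour,
           AVG(m.cpu_pct)           AS avg_cpu,
           AVG(m.io_wait_pct)       AS avg_io_wait,
           SUM(t.calls)             AS calls
    FROM server_metrics m
    JOIN call_traffic t
      ON DATE_TRUNC('hour', m.ts) = DATE_TRUNC('hour', t.ts)
    GROUP BY 1
    ORDER BY calls DESC   -- busiest hours first
    LIMIT 24;
""")
for hour, avg_cpu, avg_io_wait, calls in cur.fetchall():
    print(hour, avg_cpu, avg_io_wait, calls)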

What does this mean for business?

• Data warehouses are far more affordable now, especially for small to medium sized companies
  • Provide an edge to smaller businesses and entrepreneurs – potentially serving as a catalyst for small business
• Enterprises shouldn’t feel left out
  • Test quantitative hypotheses faster, based on actual data
  • Teams get quality computing in a fraction of the time
• Growth in data-driven/centered businesses
  • Encourages competition
• Bottom line: Redshift will do the grunt work, allowing companies to focus on strategic use of the technology and let their data do the work for them
