CCA-410 Certified Administrator Apache Hadoop CDH4 Exam Set 2

What additional capability does Ganglia provide for monitoring a Hadoop cluster?


Options are :

  • Ability to monitor NameNode memory usage.
  • Ability to monitor the amount of free space on HDFS.
  • Ability to monitor processor utilization.
  • Ability to monitor free task slots.
  • Ability to monitor number of files in HDFS.

Answer : Ability to monitor NameNode memory usage.

Each slave node in your cluster has four 2 TB hard drives installed (4 x 2 TB). You set the dfs.datanode.du.reserved parameter to 100 GB on each slave node. How does this alter HDFS block storage?


Options are :

  • All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node
  • 100 GB on each hard drive may not be used to store HDFS blocks
  • 25 GB on each hard drive may not be used to store HDFS blocks
  • A maximum of 100 GB on each hard drive may be used to store HDFS blocks

Answer : 100 GB on each hard drive may not be used to store HDFS blocks
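This reservation is configured per volume in hdfs-site.xml. A minimal sketch (the value is in bytes; 100 GB = 100 x 1024^3 = 107374182400):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>107374182400</value>
    </property>

Because the reservation applies to each of the node's four volumes, a total of 400 GB per slave node is withheld from HDFS block storage.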

Your cluster has 9 slave nodes. The cluster block size is set to 128 MB and the replication factor is set to three. How will the Hadoop framework distribute block writes into HDFS from a Reducer outputting a 300 MB file?


Options are :

  • The 9 blocks will be written to 3 nodes, such that each of the three gets one copy of each block
  • The 9 blocks will be written randomly to the nodes; some may receive multiple blocks, some may receive none
  • All 9 nodes will each receive exactly one block
  • Reducers don't write blocks into HDFS
  • The node on which the Reducer is running will receive one copy of each block. The other replicas will be placed on other nodes in the cluster

Answer : The 9 blocks will be written randomly to the nodes; some may receive multiple blocks, some may receive none
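As a check of the arithmetic: a 300 MB file divided into 128 MB blocks yields ceil(300 / 128) = 3 blocks, and with a replication factor of three that makes 3 x 3 = 9 block replicas to be placed across the 9 nodes.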

What rule governs the formatting of the underlying filesystem on the disks in your Hadoop cluster's slave nodes?


Options are :

  • They must all use the same filesystem, but this does not need to be the same filesystem as the one used by the NameNode
  • They must all be left as unformatted raw disk; Hadoop formats them automatically
  • They must all be left as unformatted raw disk; Hadoop uses raw, unformatted disk for HDFS
  • They can each use a different filesystem
  • They must all use the same filesystem as the NameNode

Answer : They must all use the same filesystem as the NameNode

Under which scenario would it be most appropriate to consider using faster (e.g., 10 Gigabit) Ethernet as the network fabric for your Hadoop cluster?


Options are :

  • When the typical workload consumes a large amount of input data, relative to the entire capacity of HDFS.
  • When the typical workload consists of processor-intensive tasks.
  • When the typical workload generates a large amount of intermediate data, on the order of the input data itself.
  • When the typical workload generates a large amount of output data, significantly larger than the amount of intermediate data.

Answer : When the typical workload generates a large amount of intermediate data, on the order of the input data itself.

Your Hadoop cluster has 12 slave nodes, a block size set to 64 MB, and a replication factor of three. Which best describes how the Hadoop framework distributes block writes into HDFS from a Reducer outputting a 150 MB file?


Options are :

  • Reducers don't write blocks into HDFS
  • The Reducer will generate twelve blocks and write them to slave nodes nearest the node on which the Reducer runs.
  • The slave node on which the Reducer runs gets the first copy of every block written. Other block replicas will be placed on other nodes.
  • The Reducer will generate nine blocks and write them randomly to nodes throughout the cluster.

Answer : The slave node on which the Reducer runs gets the first copy of every block written. Other block replicas will be placed on other nodes.
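To see where the replicas of a given file actually landed, you can run fsck against it. A hedged sketch (the path is a hypothetical Reducer output file):

    $ hadoop fsck /user/jdoe/output/part-r-00000 -files -blocks -locations

This prints each block of the file together with the DataNodes holding its replicas.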

You install Cloudera Manager on a cluster where each host has 1 GB of RAM. All of the services show their status as Concerning. However, all submitted jobs complete without error. Why is Cloudera Manager showing the Concerning status for the services?


Options are :

  • A slave node's disk ran out of space.
  • The slave nodes are swapping.
  • A DataNode service instance has crashed.
  • The slave nodes haven't sent a heartbeat in 60 minutes.

Answer : The slave nodes are swapping.

You have a cluster of 32 slave nodes and 3 master nodes running MapReduce v1 (MRv1). You execute the command: $ hadoop fsck / Which four cluster conditions does running this command report?


Options are :

  • A,D,E,F
  • Under-replicated blocks
  • Number of dead DataNodes
  • The current state of the filesystem returned from scanning individual blocks on each DataNode
  • The location of every block
  • Configured capacity of your cluster
  • The current state of the filesystem according to the NameNode
  • Blocks replicated improperly or that don't satisfy your cluster's replica placement policy (e.g., too many replicas of a block on the same node)
  • Number of DataNodes

Answer : A,D,E,F
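For context, here is a hedged sketch of the kind of summary hadoop fsck / prints (all values are illustrative):

    $ hadoop fsck /
     Total size:    1982347190 B
     Total blocks:  152 (avg. block size 13041757 B)
     Minimally replicated blocks:  152 (100.0 %)
     Under-replicated blocks:      0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:   3
     Corrupt blocks:               0
     Missing replicas:             0 (0.0 %)
     Number of data-nodes:         32
     Number of racks:              2
    The filesystem under path '/' is HEALTHY

Note that fsck reports the state of the filesystem from the NameNode's metadata; it does not scan the blocks stored on individual DataNodes.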

In the context of configuring a Hadoop cluster for HDFS High Availability (HA), fencing refers to:


Options are :

  • Isolating the standby NameNode from write access to the fsimage and edits files.
  • Isolating a failed NameNode from write access to the fsimage and edits files so that it cannot resume write operations if it recovers.
  • Isolating the cluster's master daemon to limit write access only to authorized clients.
  • Isolating both HA NameNodes to prevent a client application from killing the NameNode daemons.

Answer : Isolating a failed NameNode from write access to the fsimage and edits files so that it cannot resume write operations if it recovers.
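Fencing methods are configured in hdfs-site.xml. A minimal sketch, assuming SSH-based fencing with a key at a hypothetical path:

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hdfs/.ssh/id_rsa</value>
    </property>

The sshfence method logs in to the failed NameNode's host and kills the process so it cannot continue writing as the active NameNode.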

Which three distcp features can you utilize on a Hadoop cluster?


Options are :

  • B,D,E
  • Use distcp to copy files only between two or more clusters; you cannot use distcp to copy data between directories inside the same cluster.
  • Use distcp to copy physical blocks from the source to the target destination in your cluster.
  • Use distcp to run an internal MapReduce job to copy files.
  • Use distcp to copy data between directories inside the same cluster.
  • Use distcp to copy HBase table files.

Answer : B,D,E
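Two hedged usage sketches (the hostnames nn1 and nn2 and all paths are hypothetical):

    # Copy between two clusters
    $ hadoop distcp hdfs://nn1:8020/source/dir hdfs://nn2:8020/dest/dir

    # Copy between directories inside the same cluster
    $ hadoop distcp /data/input /data/backup

Either form launches a MapReduce job whose map tasks perform the actual file copies.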

You set up your Hadoop cluster using NameNode Federation. One NameNode manages the /users namespace and one NameNode manages the /data namespace. What happens when a client tries to write a file to /reports/myreport.txt?


Options are :

  • The file successfully writes to /users/reports/myreports/myreport.txt.
  • The file write fails silently; no file is written, no error is reported.
  • The file successfully writes to /reports/myreport.txt. The metadata for the file is managed by the first NameNode to which the client connects.
  • The client throws an exception.

Answer : The file successfully writes to /reports/myreport.txt. The metadata for the file is managed by the first NameNode to which the client connects.

How is the number of Mappers for a MapReduce job determined on a cluster running MapReduce v1 (MRv1)?


Options are :

  • The number of Mappers is equal to the number of input splits calculated by the client submitting the job
  • The JobTracker chooses the number based on the number of available nodes
  • The number of Mappers is calculated by the NameNode based on the number of HDFS blocks in the files
  • The developer specifies the number in the job configuration

Answer : The number of Mappers is calculated by the NameNode based on the number of HDFS blocks in the files

You observe that the number of spilled records from your map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?


Options are :

  • For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O.
  • Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
  • Decrease the io.sort.mb value below 100 MB.
  • Increase the io.sort.mb value as high as you can, as close to 1 GB as possible.

Answer : Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
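The property lives in mapred-site.xml on MRv1. A minimal sketch that raises the sort buffer above the 100 MB in the question (the 200 MB value is only an illustrative next step, not a recommendation):

    <property>
      <name>io.sort.mb</name>
      <value>200</value>
    </property>

After each adjustment, compare the Spilled Records counter against the Map output records counter on the job's page in the JobTracker web UI to see whether the extra spills have been eliminated.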

Which four functions do scheduling algorithms perform on a Hadoop cluster?


Options are :

  • Allow short jobs to complete even when large, long jobs (consuming a lot of resources) are running
  • Ensure data locality by ordering map tasks so that they run in data-local map slots
  • A,B,E,F
  • Reduce the total amount of computation necessary to complete a job
  • Run jobs at periodic times of the day
  • Support the implementation of service-level agreements for multiple cluster users
  • Allow multiple users to share clusters in a predictable, policy-guided manner
  • Reduce job latencies in an environment with multiple jobs of different sizes

Answer : A,B,E,F

You set the value of dfs.block.size to 64 MB in hdfs-site.xml on a client machine, but you set the same property to 128 MB on your cluster's NameNode. What happens when the client writes a file to HDFS?


Options are :

  • The file will be written successfully with a block size of 64 MB, but clients attempting to read the file will fail because the NameNode believes the blocks to be 128 MB in size
  • A block size of 64 MB will be used
  • A block size of 128 MB will be used
  • An exception will be thrown when the client attempts to write the file, because the values are different

Answer : A block size of 64 MB will be used
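Block size is a client-side setting, which is why the client's value wins. A minimal sketch of the client's hdfs-site.xml (64 MB = 64 x 1024 x 1024 = 67108864 bytes):

    <property>
      <name>dfs.block.size</name>
      <value>67108864</value>
    </property>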

Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?


Options are :

  • Completely Fair Scheduler (CFS)
  • Fair Scheduler
  • Capacity Scheduler
  • FIFO Scheduler

Answer : Fair Scheduler
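On MRv1 (as in CDH4's MapReduce) the Fair Scheduler is enabled in mapred-site.xml. A minimal sketch:

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>

Pool definitions, weights, and minimum shares can then be supplied in a separate allocations file.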

You've configured your cluster with HDFS Federation. One NameNode manages the /data namespace and another NameNode manages the /reports namespace. How do you configure a client machine to access both the /data and the /reports directories on the cluster?


Options are :

  • Configure the client to mount the /data namespace. As long as a single namespace is mounted and the client participates in the cluster, HDFS grants that client access to all files in the cluster.
  • Configure the client to mount both namespaces by specifying the appropriate properties in core-site.xml
  • You cannot configure a client to access both directories in the current implementation of HDFS Federation.
  • You don't need to configure any parameters on the client machine. Access is controlled by the NameNodes managing the namespaces.

Answer : Configure the client to mount both namespaces by specifying the appropriate properties in core-site.xml
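One way to do this is with a client-side mount table (ViewFs). A hedged sketch for the client's core-site.xml, assuming the two NameNodes run on hypothetical hosts namenode1 and namenode2:

    <property>
      <name>fs.defaultFS</name>
      <value>viewfs:///</value>
    </property>
    <property>
      <name>fs.viewfs.mounttable.default.link./data</name>
      <value>hdfs://namenode1:8020/data</value>
    </property>
    <property>
      <name>fs.viewfs.mounttable.default.link./reports</name>
      <value>hdfs://namenode2:8020/reports</value>
    </property>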

You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine the available HDFS space in your cluster?


Options are :

  • Run hadoop dfsadmin -report and locate the DFS Remaining value.
  • Connect to http://mynamenode:50070/ and locate the DFS Remaining value.
  • Run hadoop dfsadmin -setSpaceQuota and subtract HDFS Used from Configured Capacity.
  • Run hadoop fs -du / and locate the DFS Remaining value.
  • B,C

Answer : B,C
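A hedged sketch of the command-line route (the byte figure is illustrative):

    $ hadoop dfsadmin -report | grep "DFS Remaining"
    DFS Remaining: 18942738432 (17.64 GB)

The same DFS Remaining figure appears on the NameNode web UI at http://mynamenode:50070/.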

On a cluster running MapReduce v1 (MRv1), the value of the mapred.tasktracker.map.tasks.maximum configuration parameter in the mapred-site.xml file should be set to:


Options are :

  • The maximum number of Map tasks which can run simultaneously on an individual node.
  • Half the number of the maximum number of Reduce tasks which can run simultaneously on an individual node.
  • The same value on each slave node.
  • The maximum number of Map tasks which can run on the cluster as a whole.
  • Half the number of the maximum number of Reduce tasks which can run on the cluster as a whole.

Answer : The maximum number of Map tasks which can run simultaneously on an individual node.
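A minimal sketch of the setting in mapred-site.xml (the value 8 is illustrative; in practice it is sized to the node's cores and memory):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>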

The failure of which daemon makes HDFS unavailable on a cluster running MapReduce v1 (MRv1)?


Options are :

  • DataNode
  • Secondary NameNode
  • Application Manager
  • NameNode
  • Resource Manager
  • Node Manager

Answer : NameNode

Your cluster has nodes in seven racks, and you have provided a rack topology script. What is Hadoop's block placement policy, assuming a block replication factor of three?


Options are :

  • All three replicas of the block are written to nodes on the same rack
  • Because there are seven racks, the block is written to a node in each rack
  • One copy of the block is written to a node in each of three racks
  • One copy of the block is written to a node in one rack; two copies are written to two nodes in a different rack

Answer : One copy of the block is written to a node in one rack; two copies are written to two nodes in a different rack
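The rack topology script named by the question is simply an executable, registered via the topology.script.file.name property in core-site.xml, that maps node addresses to rack IDs. A hedged sketch with hypothetical subnets and rack names:

    #!/bin/bash
    # Print one rack path per host/IP argument that Hadoop passes in.
    for node in "$@"; do
      case $node in
        10.1.1.*) echo "/rack1" ;;
        10.1.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done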

Which three file actions can you execute on a file already written into HDFS?


Options are :

  • C,D,E
  • You can update the file's contents
  • You can index the file
  • You can delete the file
  • You can rename the file
  • You can move the file

Answer : C,D,E
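Hedged examples of the permitted actions (all paths are hypothetical):

    $ hadoop fs -mv /user/jdoe/report.txt /archive/report.txt   # rename/move
    $ hadoop fs -rm /archive/report.txt                         # delete

HDFS files are write-once, which is why updating a file's contents in place is not among the permitted actions.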

You configure your cluster with HDFS High Availability (HA) using quorum-based storage. You do not implement HDFS Federation. What is the maximum number of NameNode daemons you should run on your cluster in order to avoid a split-brain scenario with your NameNodes?


Options are :

  • Two active NameNodes and one Standby NameNode
  • One active NameNode and one Standby NameNode
  • Two active NameNodes and two Standby NameNodes
  • Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy.

Answer : One active NameNode and one Standby NameNode
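With quorum-based storage, the active and standby NameNodes share their edit log through a quorum of JournalNodes. A hedged sketch for hdfs-site.xml (the hosts jn1, jn2, jn3 and the nameservice name mycluster are hypothetical):

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
    </property>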

What is the recommended disk configuration for slave nodes in your Hadoop cluster with 6 x 2 TB hard drives?


Options are :

  • RAID 5
  • RAID 10
  • RAID 1+0
  • JBOD

Answer : JBOD
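With JBOD, each drive is mounted separately and listed as its own DataNode storage directory. A hedged sketch for hdfs-site.xml (the mount points are hypothetical):

    <property>
      <name>dfs.data.dir</name>
      <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn,/data/5/dfs/dn,/data/6/dfs/dn</value>
    </property>

HDFS round-robins block writes across these directories, and block replication already provides the redundancy that RAID mirroring would otherwise supply.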

When planning a Hadoop cluster, what general rule governs the hardware requirements of master nodes versus slave nodes?


Options are :

  • The master nodes require more memory and no disk drives
  • The master nodes require more memory and greater disk capacity than the slave nodes
  • The master nodes require less memory and fewer disk drives than the slave nodes
  • The master and slave nodes should have the same hardware configuration
  • The master nodes require more memory but less disk capacity

Answer : The master nodes require more memory but less disk capacity

Your cluster implements HDFS High Availability (HA). Your two NameNodes are named hadoop01 and hadoop02. What occurs when you execute the command: sudo -u hdfs hdfs haadmin -failover hadoop01 hadoop02


Options are :

  • hadoop01 becomes inactive and hadoop02 becomes the active NameNode
  • hadoop01 is fenced, and hadoop02 becomes the active NameNode
  • hadoop02 is fenced, and hadoop01 becomes the active NameNode
  • hadoop02 becomes the standby NameNode and hadoop01 becomes the active NameNode

Answer : hadoop01 is fenced, and hadoop02 becomes the active NameNode
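You can confirm the outcome afterwards; a hedged sketch:

    $ sudo -u hdfs hdfs haadmin -getServiceState hadoop01
    standby
    $ sudo -u hdfs hdfs haadmin -getServiceState hadoop02
    active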

Where does a MapReduce job store the intermediate data output from Mappers?


Options are :

  • On the underlying filesystem of the local disk of the machine on which the Mapper ran.
  • In HDFS, in the job's output directory.
  • On the underlying filesystem of the local disk of the machine on which the JobTracker ran.
  • On the underlying filesystem of the local disk of the machine on which the Reducer ran.
  • In HDFS, in a temporary directory defined by mapred.tmp.dir.

Answer : On the underlying filesystem of the local disk of the machine on which the Mapper ran.
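On MRv1 the directories used for this intermediate data are controlled by mapred.local.dir in mapred-site.xml. A hedged sketch with hypothetical mount points:

    <property>
      <name>mapred.local.dir</name>
      <value>/data/1/mapred/local,/data/2/mapred/local</value>
    </property>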

You are running a Hadoop cluster with a NameNode on host mynamenode, a Secondary NameNode on host mysecondarynamenode, and several DataNodes. Which best describes how you determine when the last checkpoint happened?


Options are :

  • Connect to the web UI of the NameNode (http://mynamenode:50070/) and look at the Last Checkpoint information
  • Execute hdfs dfsadmin -report on the command line and look at the Last Checkpoint information.
  • Execute hdfs dfsadmin -saveNamespace on the command line, which returns the last checkpoint value from the fstime file.
  • Connect to the web UI of the Secondary NameNode (http://mysecondarynamenode:50090) and look at the Last Checkpoint information

Answer : Connect to the web UI of the Secondary NameNode (http://mysecondarynamenode:50090) and look at the Last Checkpoint information

Your cluster runs MapReduce v1 (MRv1). What determines where a client application's blocks are written into HDFS?


Options are :

  • The client sends the data to the NameNode, which then writes the blocks to the DataNodes
  • The client queries the NameNode, which returns information on which DataNodes to use. The client then writes to those DataNodes.
  • The client writes immediately to DataNodes at random
  • The client writes immediately to DataNodes based on the cluster's rack locality settings

Answer : The client queries the NameNode, which returns information on which DataNodes to use. The client then writes to those DataNodes.

In HDFS, you view a file with rw-r--r-- set as its permissions. What does this tell you about the file?


Options are :

  • The file’s contents can be modified by the owner, but no-one else
  • As a Filesystem in Userspace (FUSE), HDFS files are available to all users on a cluster regardless of their underlying POSIX permissions.
  • The file cannot be run as a MapReduce job
  • The file cannot be deleted by anyone
  • The file cannot be deleted by anyone but the owner

Answer : The file cannot be deleted by anyone but the owner
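A hedged illustration of viewing and changing such permissions (the path, owner, and group are hypothetical):

    $ hadoop fs -ls /user/jdoe/data.txt
    -rw-r--r--   3 jdoe supergroup   1048576 2013-05-01 12:00 /user/jdoe/data.txt
    $ hadoop fs -chmod 664 /user/jdoe/data.txt   # grant group write access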
