
Big Data Storage Architecture

[Neuschwanstein Castle, Germany]



- Big Data Storage Architecture

Big data storage is a storage infrastructure designed specifically to store, manage and retrieve massive amounts of data, or big data. It organizes that data so that applications and services working on big data can easily access, use and process it. Big data storage can also scale flexibly as requirements grow.

Big data storage primarily supports storage and input/output (I/O) operations on storage systems that hold a very large number of data files and objects. A typical big data storage architecture is made up of a redundant and scalable supply of direct-attached storage (DAS) pools, scale-out or clustered network-attached storage (NAS), or an infrastructure based on object storage. The storage infrastructure is connected to compute server nodes that enable quick processing and retrieval of large quantities of data. Moreover, most big data storage architectures have native support for big data platforms such as Hadoop and for NoSQL databases such as Cassandra.
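A redundant, scale-out pool needs a rule for deciding which nodes hold each object. The sketch below uses rendezvous (highest-random-weight) hashing to place several copies of every object. This is one common placement technique, chosen here only for illustration. It is not necessarily what any particular product uses. The node names, object key and replica count are all hypothetical.

```python
import hashlib

def _weight(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, object key) pair."""
    digest = hashlib.sha256(f"{node}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def place_object(key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Choose `replicas` distinct nodes to hold copies of the object `key`.

    Every node is scored against the key and the highest-scoring nodes win,
    so any client that knows the node list can recompute an object's
    location without consulting a lookup table.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda node: _weight(node, key), reverse=True)
    return ranked[:replicas]

# Hypothetical four-node cluster storing three copies of each object.
cluster = ["node-a", "node-b", "node-c", "node-d"]
print(place_object("sensor-data/2021/06/21.parquet", cluster))

# Scaling out: after a node is added, an object either keeps its placement
# or moves exactly one copy to the new node.
print(place_object("sensor-data/2021/06/21.parquet", cluster + ["node-e"]))
```

The scheme moves very little data when a node joins. Only objects whose top-ranked set now includes the new node are affected, and each of those moves only one copy.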


- Direct Attached Storage (DAS)

Direct-attached storage is data storage that is connected directly to a computer such as a PC or server, as opposed to storage that is connected to a computer over a network. Having no network does not mean there is no interface connection, however. DAS can connect to a server through many different interfaces, such as IDE/ATA, SATA, eSATA, SAS, SCSI and Fibre Channel (FC), often by way of a host bus adapter (HBA). Some of these interfaces, such as FC, are also used in networked storage.

Direct-attached storage (DAS) is computer storage that is connected to one computer and not accessible to other computers. For an individual computer user, a hard drive or solid-state drive (SSD) is the usual form of direct-attached storage. In the enterprise, individual disk drives in a server are called direct-attached storage, as are groups of drives that are external to the server but are directly attached through Small Computer System Interface (SCSI), Serial Advanced Technology Attachment (SATA), Serial-Attached SCSI (SAS), Fibre Channel (FC) or iSCSI.

DAS can be deployed as disks, either hard disk drives (HDDs) or SSDs, inside a server chassis. It can also be deployed as one or more external storage enclosures connected directly to a card plugged into the server's internal bus. In the simplest case, DAS is an individual drive in a desktop or laptop computer.


- Network Attached Storage (NAS)

A NAS system is a storage device connected to a network that lets authorized network users and heterogeneous clients store and retrieve data from a centralized location. NAS systems are flexible and scale out, meaning that as you need additional storage, you can add to what you already have. NAS is like having a private cloud in the office. It can be faster and less expensive than a public cloud service while providing many of the same benefits on site, giving you complete control over your data.

[RedHat]: Network-attached storage (NAS) is a file-level storage architecture that makes stored data more accessible to networked devices. NAS is one of the three main storage architectures, along with storage area networks (SAN) and direct-attached storage (DAS). NAS gives networks a single access point for storage with built-in security, management, and fault-tolerant capabilities.


- Storage Area Networks (SANs)

A Storage Area Network (SAN) is a specialized, high-speed network that provides block-level network access to storage. SANs are typically composed of hosts, switches, storage elements, and storage devices that are interconnected using a variety of technologies, topologies, and protocols. SANs may also span multiple sites. A SAN presents storage devices to a host such that the storage appears to be locally attached. This simplified presentation of storage to a host is accomplished through the use of different types of virtualization.

SANs are commonly based on Fibre Channel (FC) technology, which uses the Fibre Channel Protocol (FCP) for open systems and proprietary variants for mainframes. Fibre Channel over Ethernet (FCoE) makes it possible to move FC traffic across existing high-speed Ethernet infrastructures and to converge storage and IP protocols onto a single cable. Other technologies can also be used. Internet Small Computer System Interface (iSCSI) is common in small and medium-sized organizations as a less expensive alternative to FC. InfiniBand is common in high-performance computing environments. Gateways can also move data between different SAN technologies.

SANs are primarily used to make storage devices, such as disk arrays and tape libraries, accessible to servers in a way that makes the devices appear to the operating system as locally attached. A SAN is typically a dedicated network of storage devices that other devices cannot reach through the local area network (LAN), so LAN traffic does not interfere with storage data transfers. A SAN does not provide file abstraction, only block-level operations. However, file systems built on top of SANs do provide file-level access. These are known as shared-disk file systems.
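Block-level access means the host addresses storage only by block number. There are no file names at this layer. The sketch below mimics that interface in Python, with a temporary file standing in for a SAN logical unit (LUN). The 512-byte block size and the helper names `read_block` and `write_block` are assumptions made for illustration. On a real system the host would open an actual block device, which usually requires administrator privileges.

```python
import tempfile

BLOCK_SIZE = 512  # bytes per logical block (assumed; 4096 is also common)

def write_block(dev, lba: int, data: bytes) -> None:
    """Write one block at logical block address `lba`, zero-padding short data."""
    if len(data) > BLOCK_SIZE:
        raise ValueError("data does not fit in one block")
    dev.seek(lba * BLOCK_SIZE)
    dev.write(data.ljust(BLOCK_SIZE, b"\x00"))

def read_block(dev, lba: int) -> bytes:
    """Read exactly one block at logical block address `lba`."""
    dev.seek(lba * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)

# A temporary file (opened in binary read/write mode) stands in for a
# hypothetical 1,024-block LUN of 512 KiB.
with tempfile.TemporaryFile() as lun:
    lun.truncate(1024 * BLOCK_SIZE)
    write_block(lun, 7, b"hello, block 7")
    block = read_block(lun, 7)

print(len(block))             # 512
print(block.rstrip(b"\x00"))  # b'hello, block 7'
```

A file system layered on top keeps its own metadata, which maps file names to block addresses like these.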


- NAS vs. SAN vs. DAS: Which Is Right for You?

Deciding which is right for your business depends on the following factors.

Key Criteria to Consider:

  • Capacity: How much data do you need to store?
  • Scalability: How much data will you need to store 5 to 10 years from now?
  • Reliability: Can your business survive without its data, files and applications? What would downtime do to your business?
  • Backup and Recovery: Where will you back up files, and how often? What would happen if you lost files?
  • Performance: How many employees need to share, access or collaborate on files, from where (remote or in-house), and how often?
  • Budget: How much do you have to spend?
  • IT Staff and Resources: Do you have a dedicated IT staff person to manage your storage?
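
A rough projection can help answer the capacity and scalability questions. If data grows at a constant annual rate g, the capacity needed after n years is C_n = C_0 (1 + g)^n. The sketch below applies that formula. The 10 TB starting point and 30% growth rate are made-up example figures, not recommendations, and real growth is rarely this steady.

```python
def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Project storage needs assuming a constant compound annual growth rate.

    annual_growth is a fraction, e.g. 0.30 for 30% per year.
    """
    if current_tb < 0 or annual_growth < -1 or years < 0:
        raise ValueError("invalid input")
    return current_tb * (1 + annual_growth) ** years

# Hypothetical example: 10 TB today, growing 30% per year.
for horizon in (5, 10):
    need = projected_capacity_tb(10.0, 0.30, horizon)
    print(f"{horizon} years: {need:.1f} TB")
# prints "5 years: 37.1 TB", then "10 years: 137.9 TB"
```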



[More to come ...]

